2024-09-16 12:25:21,768 INFO [train.py:1266] (1/2) Training started
2024-09-16 12:25:21,768 INFO [train.py:1276] (1/2) Device: cuda:1
2024-09-16 12:25:21,770 INFO [train.py:1307] (1/2) Using dtype=torch.float16
2024-09-16 12:25:21,770 INFO [train.py:1308] (1/2) Use AMP=True
2024-09-16 12:25:21,770 INFO [train.py:1310] (1/2) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'ignore_id': -1, 'label_smoothing': 0.1, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '9f6206b565b833d71e19b4411493d04d99f0a308', 'k2-git-date': 'Thu Mar 28 09:46:54 2024', 'lhotse-version': '1.27.0', 'torch-version': '2.2.2+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': 'cr-ctc', 'icefall-git-sha1': '07d6b123-dirty', 'icefall-git-date': 'Wed Sep 4 19:33:41 2024', 'icefall-path': '/zw/mnt/yaozengwei/workspace/icefall_cr_ctc', 'k2-path': '/root/anaconda3/envs/python3.10/lib/python3.10/site-packages/k2/__init__.py', 'lhotse-path': '/root/anaconda3/envs/python3.10/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'NGK_zengwei'}, 'world_size': 2, 'master_port': 12341, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.1, 'cr_loss_scale': 0.02, 'time_mask_ratio': 2.5, 'cr_loss_masked_scale': 1.0, 'attention_decoder_loss_scale': 0.9, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'use_bf16': False, 'num_encoder_layers': '2,2,4,5,4,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1536,2048,1536,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,512,768,512,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,320,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'attention_decoder_dim': 512, 'attention_decoder_num_layers': 6, 'attention_decoder_attention_dim': 512, 'attention_decoder_num_heads': 8, 'attention_decoder_feedforward_dim': 2048, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': False, 'use_ctc': True, 'use_attention_decoder': True, 'use_cr_ctc': True, 'full_libri': True, 'mini_libri': False, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1200, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'blank_id': 0, 'sos_id': 1, 'eos_id': 1, 'vocab_size': 500, 'dtype': torch.float16, 'use_autocast': True}
2024-09-16 12:25:21,771 INFO [train.py:1312] (1/2) About to create model
2024-09-16 12:25:22,546 INFO [train.py:1316] (1/2) Number of model parameters: 174319650
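[Editor's note] The per-batch loss values reported later in this log are a weighted sum of the component losses, using the scales from the config dump above (ctc_loss_scale=0.1, attention_decoder_loss_scale=0.9, cr_loss_scale=0.02). This checks out numerically against the batch-0 entry below. A minimal sketch (the helper name is illustrative, not the actual train.py code):

```python
# Sketch of how the logged total loss relates to its components, using the
# scales from the config dump above. combined_loss is a hypothetical helper.
def combined_loss(ctc_loss, attn_decoder_loss, cr_loss,
                  ctc_scale=0.1, attn_scale=0.9, cr_scale=0.02):
    return ctc_scale * ctc_loss + attn_scale * attn_decoder_loss + cr_scale * cr_loss

# Batch-0 entry below: ctc_loss=4.745, attn_decoder_loss=8.573, cr_loss=0.5637
# 0.1 * 4.745 + 0.9 * 8.573 + 0.02 * 0.5637 = 8.2015, logged as loss=8.202
print(combined_loss(4.745, 8.573, 0.5637))  # ~8.2015
```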
2024-09-16 12:25:22,546 INFO [train.py:752] (1/2) num_frame_masks: 25.0, max_frames_mask_fraction: 0.375
2024-09-16 12:25:24,271 INFO [train.py:1338] (1/2) Using DDP
2024-09-16 12:25:26,292 INFO [asr_datamodule.py:436] (1/2) About to get the shuffled train-clean-100, train-clean-360 and train-other-500 cuts
2024-09-16 12:25:26,295 INFO [asr_datamodule.py:232] (1/2) Enable MUSAN
2024-09-16 12:25:26,295 INFO [asr_datamodule.py:233] (1/2) About to get Musan cuts
2024-09-16 12:25:28,332 INFO [asr_datamodule.py:279] (1/2) Disable SpecAugment
2024-09-16 12:25:28,332 INFO [asr_datamodule.py:281] (1/2) About to create train dataset
2024-09-16 12:25:28,332 INFO [asr_datamodule.py:308] (1/2) Using DynamicBucketingSampler.
2024-09-16 12:25:29,137 INFO [asr_datamodule.py:325] (1/2) About to create train dataloader
2024-09-16 12:25:29,138 INFO [asr_datamodule.py:453] (1/2) About to get dev-clean cuts
2024-09-16 12:25:29,140 INFO [asr_datamodule.py:460] (1/2) About to get dev-other cuts
2024-09-16 12:25:29,141 INFO [asr_datamodule.py:356] (1/2) About to create dev dataset
2024-09-16 12:25:29,338 INFO [asr_datamodule.py:373] (1/2) About to create dev dataloader
2024-09-16 12:25:29,338 INFO [train.py:1545] (1/2) Sanity check -- see if any of the batches in epoch 1 would cause OOM.
2024-09-16 12:28:13,904 INFO [train.py:1576] (1/2) Maximum memory allocated so far is 47142MB
2024-09-16 12:28:15,781 INFO [train.py:1576] (1/2) Maximum memory allocated so far is 47224MB
2024-09-16 12:28:17,892 INFO [train.py:1576] (1/2) Maximum memory allocated so far is 47710MB
2024-09-16 12:28:19,143 INFO [scaling.py:1024] (1/2) Whitening: name=None, num_groups=1, num_channels=512, metric=116.89 vs. limit=7.5
2024-09-16 12:28:20,063 INFO [train.py:1576] (1/2) Maximum memory allocated so far is 48343MB
2024-09-16 12:28:22,108 INFO [scaling.py:1024] (1/2) Whitening: name=None, num_groups=4, num_channels=128, metric=9.14 vs. limit=3.0
2024-09-16 12:28:22,391 INFO [train.py:1576] (1/2) Maximum memory allocated so far is 48343MB
2024-09-16 12:28:24,634 INFO [train.py:1576] (1/2) Maximum memory allocated so far is 48343MB
2024-09-16 12:28:54,377 INFO [train.py:1198] (1/2) Epoch 1, batch 0, loss[loss=8.202, ctc_loss=4.745, cr_loss=0.5637, attn_decoder_loss=8.573, over 29616.00 frames. ], tot_loss[loss=8.202, ctc_loss=4.745, cr_loss=0.5637, attn_decoder_loss=8.573, over 29616.00 frames. ], batch size: 73, lr: 2.25e-02, grad_scale: 2.0
2024-09-16 12:28:54,378 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-16 12:29:13,592 INFO [train.py:1230] (1/2) Epoch 1, validation: loss=8.234, ctc_loss=4.87, cr_loss=1.182e-15, attn_decoder_loss=8.607, over 944034.00 frames.
2024-09-16 12:29:13,593 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 48575MB
2024-09-16 12:29:14,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.69 vs. limit=5.0
2024-09-16 12:29:14,503 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.59 vs. limit=7.5
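[Editor's note] The sanity-check pass above pushes worst-case batches through the model and logs peak device memory after each one. A rough sketch of the reporting side, assuming standard PyTorch CUDA memory statistics (the helper name is ours; the real logic lives in icefall's train.py):

```python
import torch

# Hypothetical helper mirroring the "Maximum memory allocated so far" lines
# above; torch.cuda.max_memory_allocated is the standard PyTorch API for this.
def report_peak_memory(device: torch.device) -> None:
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {peak_mb}MB")
```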
2024-09-16 12:29:15,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=0.0, ans=0.1
2024-09-16 12:29:17,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=0.0, ans=0.5
2024-09-16 12:29:24,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=0.0, ans=0.5
2024-09-16 12:29:31,005 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.329e+03 2.556e+03 3.012e+03 3.068e+03 4.530e+03, threshold=1.205e+04, percent-clipped=0.0
2024-09-16 12:29:31,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=40.0, ans=0.498125
2024-09-16 12:29:44,996 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=108.63 vs. limit=7.515
2024-09-16 12:29:51,255 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.743e+03 2.098e+03 2.580e+03 3.037e+03 5.426e+03, threshold=1.032e+04, percent-clipped=0.0
2024-09-16 12:29:57,810 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.37 vs. limit=7.53
2024-09-16 12:30:05,326 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.19 vs. limit=4.032
2024-09-16 12:30:15,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=120.0, ans=0.8958
2024-09-16 12:30:25,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=120.0, ans=0.09925
2024-09-16 12:30:28,473 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.091e+02 1.328e+03 1.895e+03 2.580e+03 5.426e+03, threshold=7.580e+03, percent-clipped=0.0
2024-09-16 12:30:28,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=160.0, ans=0.20240000000000002
2024-09-16 12:30:28,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=160.0, ans=0.0495
2024-09-16 12:30:30,088 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=16.52 vs. limit=7.56
2024-09-16 12:30:37,474 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=31.42 vs. limit=7.56
2024-09-16 12:30:47,304 INFO [train.py:1198] (1/2) Epoch 1, batch 50, loss[loss=1.808, ctc_loss=1.139, cr_loss=0.1645, attn_decoder_loss=1.878, over 29441.00 frames. ], tot_loss[loss=3.656, ctc_loss=2.001, cr_loss=0.2576, attn_decoder_loss=3.834, over 1266883.16 frames. ], batch size: 70, lr: 2.48e-02, grad_scale: 2.0
2024-09-16 12:30:57,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=200.0, ans=0.1925
2024-09-16 12:31:00,007 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.04 vs. limit=3.03
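[Editor's note] The recurring ScheduledFloat entries come from scaling.py, which varies module hyperparameters piecewise-linearly over batch_count. The logged ans values above are consistent with linear interpolation between breakpoints: the balancer prob goes from 0.5 at batch_count=0.0 to 0.498125 at 40.0, matching a (0, 0.5) → (8000, 0.125) schedule, and the whitening limits grow the same way (e.g. a whitening_limit entry later reads ans=7.68 at batch_count=480.0). A minimal sketch, with breakpoints inferred from this log rather than read from the recipe:

```python
# Sketch of a piecewise-linear schedule like scaling.py's ScheduledFloat.
# The breakpoints are inferred from the logged values, not taken from the code.
def scheduled_float(batch_count: float,
                    points=((0.0, 0.5), (8000.0, 0.125))) -> float:
    (x0, y0), (x1, y1) = points[0], points[-1]
    if batch_count <= x0:
        return y0
    if batch_count >= x1:
        return y1
    t = (batch_count - x0) / (x1 - x0)  # linear interpolation between breakpoints
    return y0 + t * (y1 - y0)

print(scheduled_float(40.0))    # 0.498125, as logged at batch_count=40.0
print(scheduled_float(1120.0))  # 0.4475, as logged at batch_count=1120.0
```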
2024-09-16 12:31:07,607 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=24.84 vs. limit=7.59
2024-09-16 12:31:10,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=240.0, ans=0.04925
2024-09-16 12:31:11,182 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=20.91 vs. limit=7.59
2024-09-16 12:31:11,629 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=29.65 vs. limit=7.68
2024-09-16 12:31:12,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=240.0, ans=0.2476
2024-09-16 12:31:18,667 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=194.75 vs. limit=7.59
2024-09-16 12:31:26,598 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=19.49 vs. limit=5.07
2024-09-16 12:31:26,645 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=20.56 vs. limit=5.07
2024-09-16 12:31:31,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=280.0, ans=0.486875
2024-09-16 12:31:36,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.97 vs. limit=7.605
2024-09-16 12:31:47,474 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.93 vs. limit=7.74
2024-09-16 12:32:00,597 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=23.07 vs. limit=7.62
2024-09-16 12:32:02,705 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.85 vs. limit=7.77
2024-09-16 12:32:04,321 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=12.43 vs. limit=5.18
2024-09-16 12:32:05,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=360.0, ans=0.2464
2024-09-16 12:32:09,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=360.0, ans=0.00544
2024-09-16 12:32:11,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=360.0, ans=0.7536
2024-09-16 12:32:18,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=360.0, ans=0.455
2024-09-16 12:32:22,337 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 4.399e+02 6.271e+02 8.906e+02 1.633e+03 5.426e+03, threshold=1.781e+03, percent-clipped=0.0
2024-09-16 12:32:22,360 INFO [train.py:1198] (1/2) Epoch 1, batch 100, loss[loss=1.164, ctc_loss=1.146, cr_loss=0.144, attn_decoder_loss=1.163, over 29531.00 frames. ], tot_loss[loss=2.448, ctc_loss=1.559, cr_loss=0.1868, attn_decoder_loss=2.543, over 2252420.33 frames. ], batch size: 76, lr: 2.70e-02, grad_scale: 4.0
2024-09-16 12:32:29,016 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=25.52 vs. limit=7.8
2024-09-16 12:32:34,388 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=69.45 vs. limit=5.2
2024-09-16 12:32:40,227 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.56 vs. limit=5.22
2024-09-16 12:32:47,747 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=45.77 vs. limit=5.22
2024-09-16 12:32:57,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.min_positive, batch_count=440.0, ans=0.048625
2024-09-16 12:32:57,697 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=15.84 vs. limit=7.665
2024-09-16 12:33:01,891 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=7.86
2024-09-16 12:33:02,020 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=124.81 vs. limit=5.24
2024-09-16 12:33:13,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=480.0, ans=7.68
2024-09-16 12:33:16,171 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=73.62 vs. limit=7.68
2024-09-16 12:33:24,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=13.71 vs. limit=5.13
2024-09-16 12:33:26,234 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=186.24 vs. limit=7.695
2024-09-16 12:33:29,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.19 vs. limit=5.13
2024-09-16 12:33:29,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=520.0, ans=7.695
2024-09-16 12:33:35,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.46 vs. limit=7.89
2024-09-16 12:33:42,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=560.0, ans=0.0965
2024-09-16 12:33:43,029 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=11.76 vs. limit=5.14
2024-09-16 12:33:48,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=560.0, ans=7.92
2024-09-16 12:33:51,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=560.0, ans=0.2944
2024-09-16 12:33:55,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=560.0, ans=0.43
2024-09-16 12:33:55,369 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.40 vs. limit=4.224
2024-09-16 12:33:57,370 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.46 vs. limit=5.15
2024-09-16 12:33:58,496 INFO [train.py:1198] (1/2) Epoch 1, batch 150, loss[loss=0.9927, ctc_loss=1.104, cr_loss=0.1141, attn_decoder_loss=0.9778, over 29389.00 frames. ], tot_loss[loss=1.878, ctc_loss=1.396, cr_loss=0.1601, attn_decoder_loss=1.928, over 3047358.29 frames. ], batch size: 70, lr: 2.93e-02, grad_scale: 4.0
2024-09-16 12:34:08,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=186.05 vs. limit=7.725
2024-09-16 12:34:12,484 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=29.67 vs. limit=7.95
2024-09-16 12:34:15,819 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=10.63 vs. limit=5.3
2024-09-16 12:34:19,953 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.96 vs. limit=7.74
2024-09-16 12:34:21,581 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=99.27 vs. limit=7.74
2024-09-16 12:34:27,298 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=12.05 vs. limit=7.74
2024-09-16 12:34:29,523 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=165.26 vs. limit=7.74
2024-09-16 12:34:31,339 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.12 vs. limit=4.256
2024-09-16 12:34:34,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.min_positive, batch_count=640.0, ans=0.048
2024-09-16 12:34:42,176 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.95 vs. limit=8.01
2024-09-16 12:34:42,288 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=26.97 vs. limit=7.755
2024-09-16 12:34:47,800 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.16 vs. limit=5.17
2024-09-16 12:34:47,985 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.48 vs. limit=8.01
2024-09-16 12:34:51,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=33.57 vs. limit=7.755
2024-09-16 12:34:51,949 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.86 vs. limit=8.01
2024-09-16 12:35:03,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.06 vs. limit=8.04
2024-09-16 12:35:07,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=720.0, ans=0.0838
2024-09-16 12:35:07,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=720.0, ans=0.8748
2024-09-16 12:35:10,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=28.19 vs. limit=8.04
2024-09-16 12:35:20,261 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=8.07
2024-09-16 12:35:22,339 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=146.22 vs. limit=7.785
2024-09-16 12:35:33,417 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.84 vs. limit=7.785
2024-09-16 12:35:36,603 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.861e+02 2.379e+02 2.686e+02 3.220e+02 5.129e+02, threshold=5.373e+02, percent-clipped=0.0
2024-09-16 12:35:36,626 INFO [train.py:1198] (1/2) Epoch 1, batch 200, loss[loss=1.047, ctc_loss=1.222, cr_loss=0.1246, attn_decoder_loss=1.024, over 27198.00 frames. ], tot_loss[loss=1.577, ctc_loss=1.316, cr_loss=0.1467, attn_decoder_loss=1.603, over 3658843.15 frames. ], batch size: 124, lr: 3.15e-02, grad_scale: 8.0
2024-09-16 12:35:39,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=23.60 vs. limit=7.8
2024-09-16 12:35:45,327 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.95 vs. limit=5.2
2024-09-16 12:35:47,035 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=17.84 vs. limit=7.8
2024-09-16 12:35:49,250 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=136.17 vs. limit=5.4
2024-09-16 12:36:02,610 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=16.04 vs. limit=5.21
2024-09-16 12:36:07,656 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=98.50 vs. limit=7.815
2024-09-16 12:36:09,964 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=54.95 vs. limit=7.815
2024-09-16 12:36:15,386 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=93.61 vs. limit=5.44
2024-09-16 12:36:19,723 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.77 vs. limit=4.352
2024-09-16 12:36:30,591 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=181.72 vs. limit=7.83
2024-09-16 12:36:34,559 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=98.41 vs. limit=7.845
2024-09-16 12:36:42,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=42.86 vs. limit=7.845
2024-09-16 12:36:44,111 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.94 vs. limit=8.19
2024-09-16 12:36:49,650 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=19.54 vs. limit=7.845
2024-09-16 12:36:52,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.12 vs. limit=8.22
2024-09-16 12:36:55,832 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=27.69 vs. limit=7.86
2024-09-16 12:36:59,518 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.88 vs. limit=5.24
2024-09-16 12:37:11,721 INFO [train.py:1198] (1/2) Epoch 1, batch 250, loss[loss=1.036, ctc_loss=1.222, cr_loss=0.1226, attn_decoder_loss=1.013, over 29354.00 frames. ], tot_loss[loss=1.397, ctc_loss=1.271, cr_loss=0.1401, attn_decoder_loss=1.408, over 4140934.70 frames. ], batch size: 100, lr: 3.38e-02, grad_scale: 8.0
2024-09-16 12:37:12,769 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=7.70 vs. limit=4.4
2024-09-16 12:37:14,686 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.93 vs. limit=7.875
2024-09-16 12:37:20,622 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=56.59 vs. limit=5.5
2024-09-16 12:37:25,889 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.06 vs. limit=8.25
2024-09-16 12:37:35,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.69 vs. limit=8.28
2024-09-16 12:37:42,580 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=22.58 vs. limit=5.52
2024-09-16 12:37:46,488 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.39 vs. limit=7.89
2024-09-16 12:37:54,079 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=22.02 vs. limit=7.905
2024-09-16 12:38:01,914 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.68 vs. limit=3.162
2024-09-16 12:38:07,193 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=64.20 vs. limit=7.905
2024-09-16 12:38:09,073 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=111.23 vs. limit=7.92
2024-09-16 12:38:14,543 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.11 vs. limit=8.34
2024-09-16 12:38:15,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1120.0, ans=0.4475
2024-09-16 12:38:15,749 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=42.19 vs. limit=8.34
2024-09-16 12:38:19,414 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=56.07 vs. limit=7.92
2024-09-16 12:38:24,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1120.0, ans=0.4475
2024-09-16 12:38:25,633 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.66 vs. limit=8.34
2024-09-16 12:38:43,602 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=117.70 vs. limit=7.935
2024-09-16 12:38:45,788 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=182.69 vs. limit=7.935
2024-09-16 12:38:48,400 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.367e+02 1.660e+02 1.791e+02 1.982e+02 5.267e+02, threshold=3.582e+02, percent-clipped=0.0
2024-09-16 12:38:48,422 INFO [train.py:1198] (1/2) Epoch 1, batch 300, loss[loss=0.9973, ctc_loss=1.216, cr_loss=0.1478, attn_decoder_loss=0.9697, over 29561.00 frames. ], tot_loss[loss=1.273, ctc_loss=1.239, cr_loss=0.1387, attn_decoder_loss=1.273, over 4509441.95 frames. ], batch size: 92, lr: 3.60e-02, grad_scale: 8.0
2024-09-16 12:38:51,085 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=79.84 vs. limit=7.95
2024-09-16 12:38:55,567 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=53.69 vs. limit=7.95
2024-09-16 12:38:59,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.86 vs. limit=5.3
2024-09-16 12:39:03,030 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.10 vs. limit=5.3
2024-09-16 12:39:05,266 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.95 vs. limit=8.4
2024-09-16 12:39:16,196 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.09 vs. limit=7.965
2024-09-16 12:39:22,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1240.0, ans=0.21860000000000002
2024-09-16 12:39:22,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1240.0, ans=0.441875
2024-09-16 12:39:26,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=1280.0, ans=0.178
2024-09-16 12:39:41,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1280.0, ans=0.2872
2024-09-16 12:39:42,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=22.55 vs. limit=7.98
2024-09-16 12:39:57,144 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=17.48 vs. limit=7.995
2024-09-16 12:40:01,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1320.0, ans=0.438125
2024-09-16 12:40:02,860 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=8.49
2024-09-16 12:40:08,972 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=32.31 vs. limit=8.01
2024-09-16 12:40:09,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1360.0, ans=0.8524
2024-09-16 12:40:14,183 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.43 vs. limit=8.01
2024-09-16 12:40:16,576 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.03 vs. limit=8.52
2024-09-16 12:40:24,353 INFO [train.py:1198] (1/2) Epoch 1, batch 350, loss[loss=0.8701, ctc_loss=1.057, cr_loss=0.173, attn_decoder_loss=0.8455, over 29365.00 frames. ], tot_loss[loss=1.186, ctc_loss=1.217, cr_loss=0.1438, attn_decoder_loss=1.18, over 4794592.24 frames. ], batch size: 71, lr: 3.83e-02, grad_scale: 8.0
2024-09-16 12:40:24,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=1400.0, ans=0.5
2024-09-16 12:40:33,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1400.0, ans=0.434375
2024-09-16 12:40:38,536 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=14.83 vs. limit=5.35
2024-09-16 12:40:54,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.72 vs. limit=8.04
2024-09-16 12:40:55,350 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=155.97 vs. limit=8.04
2024-09-16 12:41:04,091 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.44 vs. limit=8.61
2024-09-16 12:41:05,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1480.0, ans=0.09075000000000001
2024-09-16 12:41:10,018 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.31 vs. limit=8.055
2024-09-16 12:41:22,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1520.0, ans=0.42875
2024-09-16 12:41:39,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1560.0, ans=0.22340000000000002
2024-09-16 12:41:39,889 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.62 vs. limit=8.085
2024-09-16 12:41:42,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1560.0, ans=0.06490000000000001
2024-09-16 12:41:45,598 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.97 vs. limit=8.085
2024-09-16 12:41:47,697 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.19 vs. limit=8.67
2024-09-16 12:41:54,500 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=25.85 vs. limit=8.085
2024-09-16 12:41:57,757 INFO [train.py:1198] (1/2) Epoch 1, batch 400, loss[loss=0.939, ctc_loss=1.148, cr_loss=0.2013, attn_decoder_loss=0.9113, over 29723.00 frames. ], tot_loss[loss=1.118, ctc_loss=1.196, cr_loss=0.156, attn_decoder_loss=1.106, over 5024260.41 frames. ], batch size: 82, lr: 4.05e-02, grad_scale: 8.0
2024-09-16 12:41:59,565 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.379e+02 1.617e+02 1.838e+02 2.123e+02 1.289e+03, threshold=3.677e+02, percent-clipped=4.0
2024-09-16 12:42:14,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1600.0, ans=0.09000000000000001
2024-09-16 12:42:15,657 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.38 vs. limit=8.7
2024-09-16 12:42:16,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1640.0, ans=0.423125
2024-09-16 12:42:26,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=1640.0, ans=0.1385
2024-09-16 12:42:36,089 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.33 vs. limit=8.76
2024-09-16 12:42:36,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1680.0, ans=0.0622
2024-09-16 12:42:41,770 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=20.29 vs. limit=8.13
2024-09-16 12:42:50,876 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.13 vs. limit=8.13
2024-09-16 12:42:57,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1720.0, ans=6.075
2024-09-16 12:42:59,563 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=6.25 vs. limit=4.688
2024-09-16 12:43:03,745 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.12 vs. limit=8.145
2024-09-16 12:43:29,088 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=45.74 vs. limit=8.16
2024-09-16 12:43:33,846 INFO [train.py:1198] (1/2) Epoch 1, batch 450, loss[loss=0.9004, ctc_loss=1.122, cr_loss=0.281, attn_decoder_loss=0.8696, over 29687.00 frames. ], tot_loss[loss=1.065, ctc_loss=1.174, cr_loss=0.1714, attn_decoder_loss=1.049, over 5187486.67 frames. ], batch size: 83, lr: 4.28e-02, grad_scale: 8.0
2024-09-16 12:43:41,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1800.0, ans=0.282
2024-09-16 12:44:00,243 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=31.25 vs. limit=8.19
2024-09-16 12:44:03,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1840.0, ans=0.41375
2024-09-16 12:44:04,101 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.98 vs. limit=8.879999999999999
2024-09-16 12:44:09,536 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.98 vs. limit=8.91
2024-09-16 12:44:14,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=1880.0, ans=0.14425
2024-09-16 12:44:16,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1880.0, ans=6.175
2024-09-16 12:44:29,745 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=101.12 vs. limit=8.22
2024-09-16 12:44:34,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1920.0, ans=0.26
2024-09-16 12:44:44,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=8.94
2024-09-16 12:44:51,176 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=61.13 vs. limit=8.235
2024-09-16 12:44:58,988 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=17.68 vs. limit=8.235
2024-09-16 12:45:01,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1960.0, ans=0.2804
2024-09-16 12:45:03,712 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.76 vs. limit=9.0
2024-09-16 12:45:04,717 INFO [train.py:1198] (1/2) Epoch 1, batch 500, loss[loss=0.9344, ctc_loss=1.145, cr_loss=0.2492, attn_decoder_loss=0.9055, over 29440.00 frames. ], tot_loss[loss=1.021, ctc_loss=1.152, cr_loss=0.1904, attn_decoder_loss=1.003, over 5329203.13 frames. ], batch size: 94, lr: 4.49e-02, grad_scale: 8.0
2024-09-16 12:45:06,532 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.448e+02 1.612e+02 2.007e+02 3.487e+02, threshold=3.225e+02, percent-clipped=0.0
2024-09-16 12:45:11,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.58 vs. limit=8.25
2024-09-16 12:45:11,691 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=19.74 vs. limit=8.25
2024-09-16 12:45:13,489 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.10 vs. limit=5.5
2024-09-16 12:45:17,474 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.77 vs. limit=4.4
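[Editor's note] The lr values in the batch summaries (2.25e-02 at batch 0, rising roughly linearly to a plateau of 4.49e-02 around batch 500) are consistent with icefall's Eden-style schedule using base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 from the config dump, combined with a 500-batch linear warmup from a factor of 0.5. A sketch of that formula; the warmup length is inferred from the log, not read from optim.py:

```python
# Sketch of an Eden-style LR schedule that reproduces the logged lr values.
# base_lr / lr_batches / lr_epochs come from the config dump above; the
# 500-batch warmup (factor 0.5 -> 1.0) is inferred from the log.
def eden_lr(batch: int, epoch: float, base_lr=0.045,
            lr_batches=7500.0, lr_epochs=3.5, warmup_batches=500.0) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    warmup = 0.5 + 0.5 * min(batch / warmup_batches, 1.0)
    return base_lr * batch_factor * epoch_factor * warmup

print(eden_lr(0, 0))    # 0.0225, logged as lr: 2.25e-02 at batch 0
print(eden_lr(500, 0))  # ~0.04495, logged as lr: 4.49e-02 at batch 500
                        # (the real schedule also folds in a small epoch factor)
```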
2024-09-16 12:45:30,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2040.0, ans=0.404375
2024-09-16 12:45:42,566 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.63 vs. limit=8.28
2024-09-16 12:45:43,847 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=23.93 vs. limit=8.28
2024-09-16 12:45:44,384 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.74 vs. limit=8.28
2024-09-16 12:45:45,581 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=36.98 vs. limit=8.28
2024-09-16 12:45:53,149 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=75.23 vs. limit=8.28
2024-09-16 12:46:10,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2120.0, ans=0.1205
2024-09-16 12:46:13,427 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=20.05 vs. limit=8.295
2024-09-16 12:46:19,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2160.0, ans=0.39875
2024-09-16 12:46:20,326 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.80 vs. limit=5.54
2024-09-16 12:46:25,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=2160.0, ans=0.39875
2024-09-16 12:46:31,257 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=32.70 vs. limit=8.31
2024-09-16 12:46:34,953 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=77.55 vs. limit=8.325
2024-09-16 12:46:35,772 INFO [train.py:1198] (1/2) Epoch 1, batch 550, loss[loss=0.8929, ctc_loss=1.072, cr_loss=0.3298, attn_decoder_loss=0.8657, over 28897.00 frames. ], tot_loss[loss=0.9889, ctc_loss=1.13, cr_loss=0.211, attn_decoder_loss=0.9685, over 5422479.21 frames. ], batch size: 104, lr: 4.49e-02, grad_scale: 8.0
2024-09-16 12:46:38,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2200.0, ans=0.22499999999999998
2024-09-16 12:46:44,056 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=29.44 vs. limit=8.325
2024-09-16 12:46:46,287 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=20.21 vs. limit=8.325
2024-09-16 12:46:51,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2200.0, ans=0.396875
2024-09-16 12:46:59,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=2240.0, ans=8.34
2024-09-16 12:47:26,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2280.0, ans=0.2772
2024-09-16 12:47:34,460 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=9.24
2024-09-16 12:47:43,374 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.60 vs. limit=9.24
2024-09-16 12:47:52,592 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.83 vs. limit=9.27
2024-09-16 12:48:03,415 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.09 vs. limit=4.944
2024-09-16 12:48:04,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2360.0, ans=0.2764
2024-09-16 12:48:05,115 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.21 vs. limit=8.385
2024-09-16 12:48:12,274 INFO [train.py:1198] (1/2) Epoch 1, batch 600, loss[loss=0.9101, ctc_loss=1.052, cr_loss=0.3234, attn_decoder_loss=0.8871, over 29327.00 frames. ], tot_loss[loss=0.9609, ctc_loss=1.105, cr_loss=0.2342, attn_decoder_loss=0.9397, over 5509506.32 frames. ], batch size: 100, lr: 4.49e-02, grad_scale: 8.0
2024-09-16 12:48:12,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2400.0, ans=0.3875
2024-09-16 12:48:14,063 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.236e+02 1.461e+02 1.874e+02 1.065e+03, threshold=2.921e+02, percent-clipped=6.0
2024-09-16 12:48:21,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=2400.0, ans=0.046
2024-09-16 12:48:31,034 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=28.44 vs. limit=8.415
2024-09-16 12:48:39,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2440.0, ans=0.385625
2024-09-16 12:48:44,979 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.00 vs. limit=8.415
2024-09-16 12:48:48,729 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.37 vs. limit=4.992
2024-09-16 12:48:48,965 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.62 vs. limit=8.43
2024-09-16 12:48:49,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=2480.0, ans=8.43
2024-09-16 12:49:03,096 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.07 vs. limit=8.43
2024-09-16 12:49:03,220 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.97 vs. limit=8.43
2024-09-16 12:49:04,733 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.13 vs. limit=9.39
2024-09-16 12:49:10,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.83 vs. limit=8.445
2024-09-16 12:49:16,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2520.0, ans=0.1055
2024-09-16 12:49:19,158 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.28 vs. limit=9.39
2024-09-16 12:49:19,290 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=8.445
2024-09-16 12:49:29,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2560.0, ans=0.38
2024-09-16 12:49:34,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2560.0, ans=0.38
2024-09-16 12:49:38,819 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.98 vs. limit=5.024
2024-09-16 12:49:41,293 INFO [train.py:1198] (1/2) Epoch 1, batch 650, loss[loss=0.8594, ctc_loss=0.9786, cr_loss=0.3531, attn_decoder_loss=0.8383, over 29748.00 frames. ], tot_loss[loss=0.932, ctc_loss=1.071, cr_loss=0.2569, attn_decoder_loss=0.9108, over 5586870.07 frames. ], batch size: 81, lr: 4.49e-02, grad_scale: 8.0
2024-09-16 12:49:46,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2600.0, ans=0.809
2024-09-16 12:49:53,040 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.12 vs. limit=9.45
2024-09-16 12:50:08,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.12 vs. limit=6.32
2024-09-16 12:50:10,798 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.61 vs. limit=5.056
2024-09-16 12:50:12,517 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=44.71 vs. limit=8.49
2024-09-16 12:50:14,856 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=3.95 vs. limit=4.5280000000000005
2024-09-16 12:50:18,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2680.0, ans=0.374375
2024-09-16 12:50:26,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2680.0, ans=0.374375
2024-09-16 12:50:27,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2680.0, ans=0.374375
2024-09-16 12:50:29,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=2680.0, ans=0.09949999999999999
2024-09-16 12:50:39,505 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.18 vs. limit=8.52
2024-09-16 12:50:54,934 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.75 vs. limit=9.57
2024-09-16 12:51:06,418 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=11.80 vs. limit=9.57
2024-09-16 12:51:06,607 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.52 vs. limit=9.57
2024-09-16 12:51:08,659 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.77 vs. limit=4.552
2024-09-16 12:51:12,610 INFO [train.py:1198] (1/2) Epoch 1, batch 700, loss[loss=0.7594, ctc_loss=0.8418, cr_loss=0.327, attn_decoder_loss=0.743, over 29566.00 frames. ], tot_loss[loss=0.9094, ctc_loss=1.043, cr_loss=0.2789, attn_decoder_loss=0.8884, over 5637769.27 frames. ], batch size: 76, lr: 4.49e-02, grad_scale: 8.0
2024-09-16 12:51:14,379 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.893e+01 1.304e+02 1.539e+02 2.330e+02 9.417e+02, threshold=3.077e+02, percent-clipped=6.0
2024-09-16 12:51:17,435 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.20 vs. limit=8.55
2024-09-16 12:51:30,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2840.0, ans=0.2216
2024-09-16 12:51:35,258 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.47 vs. limit=5.71
2024-09-16 12:51:39,851 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.34 vs. limit=9.629999999999999
2024-09-16 12:51:49,119 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=18.12 vs. limit=8.58
2024-09-16 12:51:52,898 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.00 vs. limit=8.58
2024-09-16 12:51:57,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2880.0, ans=0.365
2024-09-16 12:52:05,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2920.0, ans=0.36312500000000003
2024-09-16 12:52:10,289 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.53 vs. limit=8.595
2024-09-16 12:52:18,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2920.0, ans=0.36312500000000003
2024-09-16 12:52:18,651 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.12 vs. limit=8.595
2024-09-16 12:52:22,217 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.82 vs. limit=8.595
2024-09-16 12:52:28,451 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.33 vs. limit=5.74
2024-09-16 12:52:33,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=2960.0, ans=0.13
2024-09-16 12:52:41,069 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.64 vs. limit=9.72
2024-09-16 12:52:43,662 INFO [train.py:1198] (1/2) Epoch 1, batch 750, loss[loss=0.7579, ctc_loss=0.8974, cr_loss=0.3135, attn_decoder_loss=0.7355, over 29698.00 frames. ], tot_loss[loss=0.8801, ctc_loss=1.01, cr_loss=0.2922, attn_decoder_loss=0.8591, over 5676368.83 frames. ], batch size: 82, lr: 4.49e-02, grad_scale: 8.0
2024-09-16 12:52:44,362 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.51 vs. limit=5.75
2024-09-16 12:52:44,715 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=11.26 vs. limit=9.75
2024-09-16 12:52:49,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3000.0, ans=0.245
2024-09-16 12:52:51,828 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=8.625
2024-09-16 12:53:00,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3000.0, ans=0.040625
2024-09-16 12:53:22,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3080.0, ans=0.08449999999999999
2024-09-16 12:53:43,195 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=21.32 vs. limit=6.5600000000000005
2024-09-16 12:53:44,978 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=3.468
limit=3.468 2024-09-16 12:53:49,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3120.0, ans=0.2688 2024-09-16 12:53:50,198 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.71 vs. limit=6.5600000000000005 2024-09-16 12:53:55,236 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=10.68 vs. limit=9.870000000000001 2024-09-16 12:54:12,619 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.50 vs. limit=5.8 2024-09-16 12:54:14,212 INFO [train.py:1198] (1/2) Epoch 1, batch 800, loss[loss=0.6653, ctc_loss=0.7983, cr_loss=0.3214, attn_decoder_loss=0.6434, over 29608.00 frames. ], tot_loss[loss=0.8458, ctc_loss=0.9755, cr_loss=0.3007, attn_decoder_loss=0.8247, over 5706530.28 frames. ], batch size: 73, lr: 4.49e-02, grad_scale: 16.0 2024-09-16 12:54:15,975 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.656e+02 2.537e+02 3.189e+02 4.432e+02 8.958e+02, threshold=6.378e+02, percent-clipped=52.0 2024-09-16 12:54:25,460 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.72 vs. limit=8.7 2024-09-16 12:54:42,939 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.55 vs. limit=8.715 2024-09-16 12:54:47,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3280.0, ans=0.2172 2024-09-16 12:54:51,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3280.0, ans=0.34625 2024-09-16 12:54:54,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3280.0, ans=0.2672 2024-09-16 12:54:58,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3280.0, ans=0.34625 2024-09-16 12:55:38,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3360.0, ans=0.3425 2024-09-16 12:55:40,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=3360.0, ans=0.017439999999999997 2024-09-16 12:55:41,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3400.0, ans=0.340625 2024-09-16 12:55:43,347 INFO [train.py:1198] (1/2) Epoch 1, batch 850, loss[loss=0.7134, ctc_loss=0.8627, cr_loss=0.3776, attn_decoder_loss=0.6884, over 29710.00 frames. ], tot_loss[loss=0.8067, ctc_loss=0.9395, cr_loss=0.3063, attn_decoder_loss=0.7852, over 5736035.44 frames. ], batch size: 89, lr: 4.49e-02, grad_scale: 16.0 2024-09-16 12:55:51,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=50.71 vs. limit=10.05 2024-09-16 12:55:58,096 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.97 vs. 
limit=10.05 2024-09-16 12:56:00,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3440.0, ans=0.2656 2024-09-16 12:56:07,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3440.0, ans=0.022599999999999995 2024-09-16 12:56:09,886 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.78 vs. limit=6.72 2024-09-16 12:56:18,356 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=10.11 2024-09-16 12:56:22,136 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.71 vs. limit=8.805 2024-09-16 12:56:24,075 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.31 vs. limit=10.11 2024-09-16 12:56:27,501 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.10 vs. limit=10.11 2024-09-16 12:56:30,405 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.31 vs. limit=3.5220000000000002 2024-09-16 12:56:30,863 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=10.11 2024-09-16 12:56:52,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=3560.0, ans=0.333125 2024-09-16 12:57:07,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3600.0, ans=0.03875000000000001 2024-09-16 12:57:08,688 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.27 vs. limit=5.9 2024-09-16 12:57:08,916 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.30 vs. limit=5.9 2024-09-16 12:57:11,500 INFO [train.py:1198] (1/2) Epoch 1, batch 900, loss[loss=0.5746, ctc_loss=0.7215, cr_loss=0.3425, attn_decoder_loss=0.5507, over 29594.00 frames. ], tot_loss[loss=0.7685, ctc_loss=0.9048, cr_loss=0.3125, attn_decoder_loss=0.7464, over 5741083.57 frames. 
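Each train.py loss line above decomposes the objective into its components, and the printed totals are consistent with a fixed weighted sum: 0.1 * ctc_loss + 0.9 * attn_decoder_loss + 0.02 * cr_loss (the ctc / attention-decoder / cr scales from this run's configuration). For the batch-900 totals just above: 0.1 * 0.9048 + 0.9 * 0.7464 + 0.02 * 0.3125 = 0.7685, which is the printed loss. A minimal sketch of that combination (the function name and signature are illustrative, not taken from train.py):

    import torch

    # Loss scales from this run's configuration.
    CTC_SCALE, ATTN_DECODER_SCALE, CR_SCALE = 0.1, 0.9, 0.02

    def combine_losses(ctc_loss: torch.Tensor,
                       attn_decoder_loss: torch.Tensor,
                       cr_loss: torch.Tensor) -> torch.Tensor:
        # Weighted sum; reproduces the logged totals, e.g.
        # 0.1 * 0.9048 + 0.9 * 0.7464 + 0.02 * 0.3125 = 0.7685 (batch 900).
        return (CTC_SCALE * ctc_loss
                + ATTN_DECODER_SCALE * attn_decoder_loss
                + CR_SCALE * cr_loss)
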
], batch size: 73, lr: 4.48e-02, grad_scale: 16.0 2024-09-16 12:57:13,150 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.606e+02 2.694e+02 3.422e+02 4.565e+02 1.517e+03, threshold=6.845e+02, percent-clipped=7.0 2024-09-16 12:57:15,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3600.0, ans=0.33125 2024-09-16 12:57:20,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3600.0, ans=0.214 2024-09-16 12:57:28,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3640.0, ans=0.329375 2024-09-16 12:57:36,479 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.26 vs. limit=8.865 2024-09-16 12:57:55,084 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.92 vs. limit=8.879999999999999 2024-09-16 12:58:17,356 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.22 vs. limit=8.895 2024-09-16 12:58:23,852 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.14 vs. limit=8.91 2024-09-16 12:58:26,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3760.0, ans=0.015399999999999997 2024-09-16 12:58:31,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3760.0, ans=0.015399999999999997 2024-09-16 12:58:36,753 INFO [train.py:1198] (1/2) Epoch 1, batch 950, loss[loss=0.5325, ctc_loss=0.6755, cr_loss=0.3531, attn_decoder_loss=0.5088, over 29511.00 frames. ], tot_loss[loss=0.7311, ctc_loss=0.8712, cr_loss=0.318, attn_decoder_loss=0.7085, over 5742693.85 frames. ], batch size: 74, lr: 4.48e-02, grad_scale: 16.0 2024-09-16 12:58:42,837 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=93.57 vs. limit=10.35 2024-09-16 12:58:43,155 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.90 vs. limit=8.925 2024-09-16 12:58:45,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3800.0, ans=0.014499999999999985 2024-09-16 12:58:49,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=3800.0, ans=0.014499999999999985 2024-09-16 12:58:53,490 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.70 vs. limit=10.379999999999999 2024-09-16 12:59:03,220 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.92 vs. 
limit=5.96 2024-09-16 12:59:12,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3880.0, ans=0.7642 2024-09-16 12:59:52,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3960.0, ans=0.0050000000000000044 2024-09-16 12:59:54,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3960.0, ans=0.31437499999999996 2024-09-16 13:00:00,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3960.0, ans=0.16337000000000002 2024-09-16 13:00:04,837 INFO [train.py:1198] (1/2) Epoch 1, batch 1000, loss[loss=0.5231, ctc_loss=0.6536, cr_loss=0.3227, attn_decoder_loss=0.5015, over 29487.00 frames. ], tot_loss[loss=0.6962, ctc_loss=0.8375, cr_loss=0.3274, attn_decoder_loss=0.6732, over 5736289.50 frames. ], batch size: 77, lr: 4.48e-02, grad_scale: 8.0 2024-09-16 13:00:08,131 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.514e+02 2.266e+02 2.878e+02 3.816e+02 1.272e+03, threshold=5.756e+02, percent-clipped=5.0 2024-09-16 13:00:17,483 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=3.6 2024-09-16 13:00:29,502 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=9.015 2024-09-16 13:00:33,218 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=9.97 vs. limit=9.015 2024-09-16 13:00:51,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=4080.0, ans=0.037250000000000005 2024-09-16 13:00:52,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=4080.0, ans=0.009982608695652173 2024-09-16 13:00:56,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=4120.0, ans=0.025 2024-09-16 13:00:57,062 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.97 vs. limit=9.045 2024-09-16 13:01:04,549 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:01:16,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4160.0, ans=0.07400000000000001 2024-09-16 13:01:21,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4160.0, ans=0.7544 2024-09-16 13:01:31,753 INFO [train.py:1198] (1/2) Epoch 1, batch 1050, loss[loss=0.5569, ctc_loss=0.7004, cr_loss=0.3932, attn_decoder_loss=0.5322, over 29692.00 frames. ], tot_loss[loss=0.6601, ctc_loss=0.8003, cr_loss=0.3366, attn_decoder_loss=0.637, over 5744605.50 frames. 
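The ScheduledFloat entries from scaling.py print a parameter value ("ans") that drifts with batch_count; the logged values are consistent with piecewise-linear interpolation between (batch_count, value) breakpoints. A sketch under that assumption; the breakpoints (0, 0.5) and (8000, 0.125) are inferred from the log rather than read from scaling.py, e.g. they reproduce ans=0.31437499999999996 at batch_count=3960 for the hidden_balancer.prob entry just above:

    import numpy as np

    class ScheduledFloatSketch:
        # Piecewise-linear schedule over the training batch count.
        # Values are held constant beyond the last breakpoint.
        def __init__(self, *points):
            self.xs = [x for x, _ in points]
            self.ys = [y for _, y in points]

        def value(self, batch_count: float) -> float:
            return float(np.interp(batch_count, self.xs, self.ys))

    prob = ScheduledFloatSketch((0.0, 0.5), (8000.0, 0.125))
    print(prob.value(3960.0))  # 0.314375, matching the log entry above
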
], batch size: 85, lr: 4.48e-02, grad_scale: 8.0 2024-09-16 13:01:49,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4240.0, ans=0.7516 2024-09-16 13:02:08,660 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.13 vs. limit=7.140000000000001 2024-09-16 13:02:18,723 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.25 vs. limit=9.105 2024-09-16 13:02:22,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=4320.0, ans=0.7488 2024-09-16 13:02:29,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4320.0, ans=0.2975 2024-09-16 13:02:30,057 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.68 vs. limit=10.74 2024-09-16 13:02:34,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4320.0, ans=0.7488 2024-09-16 13:02:37,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4360.0, ans=0.0485 2024-09-16 13:02:55,799 INFO [train.py:1198] (1/2) Epoch 1, batch 1100, loss[loss=0.5542, ctc_loss=0.6697, cr_loss=0.3214, attn_decoder_loss=0.5342, over 29456.00 frames. ], tot_loss[loss=0.6286, ctc_loss=0.7654, cr_loss=0.3465, attn_decoder_loss=0.6056, over 5757496.03 frames. ], batch size: 78, lr: 4.48e-02, grad_scale: 8.0 2024-09-16 13:02:59,004 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.372e+02 1.990e+02 2.415e+02 3.242e+02 8.137e+02, threshold=4.830e+02, percent-clipped=5.0 2024-09-16 13:03:06,803 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.68 vs. limit=10.8 2024-09-16 13:03:20,181 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=55.49 vs. limit=10.83 2024-09-16 13:03:31,997 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=107.79 vs. limit=10.86 2024-09-16 13:03:41,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4480.0, ans=0.009895652173913043 2024-09-16 13:03:45,760 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.09 vs. limit=6.13 2024-09-16 13:03:55,968 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=8.30 vs. limit=9.195 2024-09-16 13:03:59,428 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.52 vs. limit=7.26 2024-09-16 13:04:21,895 INFO [train.py:1198] (1/2) Epoch 1, batch 1150, loss[loss=0.4815, ctc_loss=0.5987, cr_loss=0.3819, attn_decoder_loss=0.46, over 29444.00 frames. ], tot_loss[loss=0.6012, ctc_loss=0.7336, cr_loss=0.3545, attn_decoder_loss=0.5786, over 5754175.58 frames. 
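The Whitening messages from scaling.py fire when a whiteness metric of a layer's activations exceeds a scheduled limit (the limits themselves grow with batch count across the log). One standard whiteness measure, and a plausible reading of the printed metric, is d * ||C||_F^2 / tr(C)^2 per group of d channels, where C is the channel covariance: it equals 1 when C is a multiple of the identity and grows as the spectrum spreads. A sketch under that assumption, not the literal scaling.py code:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # x: (num_frames, num_channels). Returns a scalar >= 1 that equals 1
        # only when each group's channel covariance is a multiple of I.
        n, c = x.shape
        cpg = c // num_groups                                # channels per group
        xg = x.reshape(n, num_groups, cpg).transpose(0, 1)   # (groups, n, cpg)
        covar = xg.transpose(1, 2) @ xg / n                  # (groups, cpg, cpg)
        trace = covar.diagonal(dim1=1, dim2=2).sum(-1)       # (groups,)
        frob_sq = (covar ** 2).sum(dim=(1, 2))               # (groups,)
        return (cpg * frob_sq / trace.clamp(min=1e-20) ** 2).mean()
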
], batch size: 78, lr: 4.47e-02, grad_scale: 8.0 2024-09-16 13:04:24,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4600.0, ans=0.009869565217391305 2024-09-16 13:04:40,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4640.0, ans=0.04733333333333334 2024-09-16 13:04:41,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.98 vs. limit=6.16 2024-09-16 13:05:01,366 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.43 vs. limit=11.01 2024-09-16 13:05:03,094 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=11.01 2024-09-16 13:05:06,488 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=11.57 vs. limit=11.01 2024-09-16 13:05:20,921 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.95 vs. limit=5.888 2024-09-16 13:05:47,981 INFO [train.py:1198] (1/2) Epoch 1, batch 1200, loss[loss=0.5262, ctc_loss=0.6367, cr_loss=0.37, attn_decoder_loss=0.5057, over 29691.00 frames. ], tot_loss[loss=0.5784, ctc_loss=0.7061, cr_loss=0.3629, attn_decoder_loss=0.5561, over 5746542.15 frames. ], batch size: 85, lr: 4.47e-02, grad_scale: 16.0 2024-09-16 13:05:51,298 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.465e+02 1.904e+02 2.227e+02 2.860e+02 9.470e+02, threshold=4.454e+02, percent-clipped=3.0 2024-09-16 13:06:05,544 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.28 vs. limit=11.129999999999999 2024-09-16 13:06:13,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4840.0, ans=0.273125 2024-09-16 13:06:27,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=4880.0, ans=7.4399999999999995 2024-09-16 13:06:28,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4880.0, ans=0.009808695652173913 2024-09-16 13:06:33,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=4880.0, ans=0.1 2024-09-16 13:06:35,639 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.15 vs. limit=9.33 2024-09-16 13:06:55,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=4960.0, ans=6.24 2024-09-16 13:06:58,720 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.98 vs. 
limit=9.36 2024-09-16 13:07:07,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4960.0, ans=0.26749999999999996 2024-09-16 13:07:10,530 INFO [train.py:1198] (1/2) Epoch 1, batch 1250, loss[loss=0.5045, ctc_loss=0.6007, cr_loss=0.4467, attn_decoder_loss=0.4839, over 29495.00 frames. ], tot_loss[loss=0.5576, ctc_loss=0.6793, cr_loss=0.3722, attn_decoder_loss=0.5358, over 5774089.14 frames. ], batch size: 92, lr: 4.47e-02, grad_scale: 16.0 2024-09-16 13:07:18,070 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.79 vs. limit=11.25 2024-09-16 13:07:24,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5000.0, ans=0.265625 2024-09-16 13:07:35,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5040.0, ans=0.26375000000000004 2024-09-16 13:07:40,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5040.0, ans=0.2496 2024-09-16 13:07:44,504 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.41 vs. limit=9.405 2024-09-16 13:07:45,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=5080.0, ans=0.2762 2024-09-16 13:07:45,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=5080.0, ans=11.31 2024-09-16 13:07:59,783 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.27 vs. limit=6.27 2024-09-16 13:08:05,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=5120.0, ans=0.26 2024-09-16 13:08:08,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=5120.0, ans=0.04533333333333334 2024-09-16 13:08:31,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=5160.0, ans=0.258125 2024-09-16 13:08:34,868 INFO [train.py:1198] (1/2) Epoch 1, batch 1300, loss[loss=0.4924, ctc_loss=0.5938, cr_loss=0.4348, attn_decoder_loss=0.4714, over 28309.00 frames. ], tot_loss[loss=0.5372, ctc_loss=0.6517, cr_loss=0.378, attn_decoder_loss=0.5161, over 5778963.42 frames. ], batch size: 111, lr: 4.47e-02, grad_scale: 16.0 2024-09-16 13:08:37,362 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.22 vs. limit=6.3 2024-09-16 13:08:38,070 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.446e+02 1.796e+02 2.066e+02 2.551e+02 7.251e+02, threshold=4.131e+02, percent-clipped=4.0 2024-09-16 13:08:42,471 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.40 vs. 
limit=6.3 2024-09-16 13:09:18,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=5280.0, ans=0.2525 2024-09-16 13:09:19,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=5280.0, ans=0.2525 2024-09-16 13:09:44,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5360.0, ans=0.24875000000000003 2024-09-16 13:09:45,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=5360.0, ans=0.044333333333333336 2024-09-16 13:09:59,473 INFO [train.py:1198] (1/2) Epoch 1, batch 1350, loss[loss=0.4423, ctc_loss=0.5136, cr_loss=0.413, attn_decoder_loss=0.4252, over 29768.00 frames. ], tot_loss[loss=0.5193, ctc_loss=0.627, cr_loss=0.3838, attn_decoder_loss=0.4988, over 5795618.54 frames. ], batch size: 81, lr: 4.46e-02, grad_scale: 16.0 2024-09-16 13:10:02,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=5400.0, ans=0.04416666666666667 2024-09-16 13:10:03,295 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=11.55 2024-09-16 13:10:06,657 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.73 vs. limit=7.7 2024-09-16 13:10:16,502 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.65 vs. limit=11.58 2024-09-16 13:10:53,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=5520.0, ans=0.04949747468305833 2024-09-16 13:10:56,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=5520.0, ans=0.24125000000000002 2024-09-16 13:10:59,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=5520.0, ans=0.7068000000000001 2024-09-16 13:11:08,375 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.78 vs. limit=3.834 2024-09-16 13:11:21,352 INFO [train.py:1198] (1/2) Epoch 1, batch 1400, loss[loss=0.4029, ctc_loss=0.4551, cr_loss=0.3809, attn_decoder_loss=0.3887, over 29609.00 frames. ], tot_loss[loss=0.5048, ctc_loss=0.6054, cr_loss=0.3899, attn_decoder_loss=0.4849, over 5806538.94 frames. ], batch size: 69, lr: 4.46e-02, grad_scale: 16.0 2024-09-16 13:11:24,540 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.351e+02 1.700e+02 1.984e+02 2.487e+02 6.195e+02, threshold=3.968e+02, percent-clipped=5.0 2024-09-16 13:11:27,109 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.32 vs. 
limit=11.7 2024-09-16 13:11:34,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=5600.0, ans=0.8059999999999999 2024-09-16 13:11:34,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=5600.0, ans=0.025 2024-09-16 13:11:42,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=5640.0, ans=0.23562499999999997 2024-09-16 13:11:44,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5640.0, ans=0.23562499999999997 2024-09-16 13:11:45,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5640.0, ans=0.04316666666666667 2024-09-16 13:11:46,581 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.02 vs. limit=9.615 2024-09-16 13:11:57,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=5680.0, ans=9.629999999999999 2024-09-16 13:12:05,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5680.0, ans=0.23375 2024-09-16 13:12:37,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5760.0, ans=0.2424 2024-09-16 13:12:42,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5800.0, ans=0.22812500000000002 2024-09-16 13:12:44,063 INFO [train.py:1198] (1/2) Epoch 1, batch 1450, loss[loss=0.4753, ctc_loss=0.5469, cr_loss=0.4441, attn_decoder_loss=0.4575, over 29433.00 frames. ], tot_loss[loss=0.4929, ctc_loss=0.5869, cr_loss=0.3951, attn_decoder_loss=0.4736, over 5805067.71 frames. ], batch size: 94, lr: 4.46e-02, grad_scale: 16.0 2024-09-16 13:12:50,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=5800.0, ans=0.06375 2024-09-16 13:13:18,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=11.91 2024-09-16 13:13:42,635 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=9.72 2024-09-16 13:14:06,651 INFO [train.py:1198] (1/2) Epoch 1, batch 1500, loss[loss=0.4375, ctc_loss=0.4964, cr_loss=0.3905, attn_decoder_loss=0.4223, over 29636.00 frames. ], tot_loss[loss=0.4822, ctc_loss=0.5701, cr_loss=0.4005, attn_decoder_loss=0.4636, over 5805817.46 frames. ], batch size: 86, lr: 4.46e-02, grad_scale: 16.0 2024-09-16 13:14:09,809 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.205e+02 1.657e+02 1.840e+02 2.318e+02 6.248e+02, threshold=3.680e+02, percent-clipped=4.0 2024-09-16 13:14:25,059 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.55 vs. 
limit=9.765 2024-09-16 13:14:27,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=6040.0, ans=0.21687499999999998 2024-09-16 13:14:28,663 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.68 vs. limit=6.416 2024-09-16 13:14:32,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=6040.0, ans=12.030000000000001 2024-09-16 13:14:43,351 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.61 vs. limit=6.52 2024-09-16 13:14:58,002 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.67 vs. limit=3.918 2024-09-16 13:15:05,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=6120.0, ans=0.6858 2024-09-16 13:15:12,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=6160.0, ans=0.2384 2024-09-16 13:15:20,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=6160.0, ans=0.21125 2024-09-16 13:15:27,846 INFO [train.py:1198] (1/2) Epoch 1, batch 1550, loss[loss=0.4431, ctc_loss=0.5033, cr_loss=0.4189, attn_decoder_loss=0.4271, over 29487.00 frames. ], tot_loss[loss=0.4731, ctc_loss=0.5551, cr_loss=0.4039, attn_decoder_loss=0.455, over 5781683.42 frames. ], batch size: 90, lr: 4.45e-02, grad_scale: 16.0 2024-09-16 13:15:44,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=6240.0, ans=0.20750000000000002 2024-09-16 13:16:02,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=6280.0, ans=0.009504347826086957 2024-09-16 13:16:07,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=6280.0, ans=0.205625 2024-09-16 13:16:20,520 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.68 vs. limit=9.870000000000001 2024-09-16 13:16:26,951 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.07 vs. limit=6.58 2024-09-16 13:16:29,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=6320.0, ans=0.025 2024-09-16 13:16:36,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=6360.0, ans=0.20187500000000003 2024-09-16 13:16:50,876 INFO [train.py:1198] (1/2) Epoch 1, batch 1600, loss[loss=0.4347, ctc_loss=0.4913, cr_loss=0.4494, attn_decoder_loss=0.4184, over 29660.00 frames. ], tot_loss[loss=0.4642, ctc_loss=0.5404, cr_loss=0.4075, attn_decoder_loss=0.4467, over 5764191.38 frames. 
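The optim.py WARNING lines report five grad-norm quantiles (min, 25%, median, 75%, max) over a recent window of batches, and in every warning above the threshold equals Clipping_scale times the median, e.g. 2.0 * 1.840e+02 = 3.680e+02 a few entries back; percent-clipped is the share of recent batches whose norm exceeded it. A sketch of that bookkeeping (the window handling is assumed, not read from optim.py):

    import torch

    def clipping_stats(recent_grad_norms, clipping_scale: float = 2.0):
        # Quantiles of recent gradient norms; the clip threshold is
        # clipping_scale * median, matching the logged warnings.
        norms = torch.as_tensor(recent_grad_norms, dtype=torch.float32)
        q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]
        percent_clipped = 100.0 * (norms > threshold).float().mean()
        return q, threshold, percent_clipped

    def clip_gradients(params, threshold: torch.Tensor):
        # Rescale all gradients when their global norm exceeds the threshold.
        grads = [p.grad for p in params if p.grad is not None]
        total = torch.norm(torch.stack([g.norm() for g in grads]))
        if total > threshold:
            for g in grads:
                g.mul_(threshold / total)
        return total
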
], batch size: 85, lr: 4.45e-02, grad_scale: 32.0 2024-09-16 13:16:51,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=6400.0, ans=0.060000000000000005 2024-09-16 13:16:52,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=6400.0, ans=0.2 2024-09-16 13:16:53,977 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.353e+02 1.789e+02 2.003e+02 2.671e+02 7.111e+02, threshold=4.005e+02, percent-clipped=7.0 2024-09-16 13:17:10,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=6440.0, ans=0.03983333333333334 2024-09-16 13:17:14,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=6440.0, ans=0.0 2024-09-16 13:17:18,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=6440.0, ans=0.23559999999999998 2024-09-16 13:17:38,070 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.84 vs. limit=6.6080000000000005 2024-09-16 13:18:03,180 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.82 vs. limit=9.96 2024-09-16 13:18:07,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=6560.0, ans=0.8156 2024-09-16 13:18:13,094 INFO [train.py:1198] (1/2) Epoch 1, batch 1650, loss[loss=0.4584, ctc_loss=0.5195, cr_loss=0.4559, attn_decoder_loss=0.4414, over 29716.00 frames. ], tot_loss[loss=0.4557, ctc_loss=0.5259, cr_loss=0.4101, attn_decoder_loss=0.4387, over 5758841.60 frames. ], batch size: 89, lr: 4.45e-02, grad_scale: 32.0 2024-09-16 13:18:34,375 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:18:55,646 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.75 vs. limit=12.51 2024-09-16 13:19:12,804 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:19:32,655 INFO [train.py:1198] (1/2) Epoch 1, batch 1700, loss[loss=0.3721, ctc_loss=0.4059, cr_loss=0.3889, attn_decoder_loss=0.3597, over 29594.00 frames. ], tot_loss[loss=0.4472, ctc_loss=0.511, cr_loss=0.4136, attn_decoder_loss=0.431, over 5779740.81 frames. ], batch size: 69, lr: 4.44e-02, grad_scale: 16.0 2024-09-16 13:19:37,456 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.203e+02 1.510e+02 1.749e+02 2.059e+02 5.300e+02, threshold=3.498e+02, percent-clipped=2.0 2024-09-16 13:19:41,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=6800.0, ans=0.18125000000000002 2024-09-16 13:19:44,709 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.89 vs. 
limit=10.05 2024-09-16 13:19:49,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=6840.0, ans=0.179375 2024-09-16 13:20:10,870 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.59 vs. limit=10.08 2024-09-16 13:20:21,606 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.90 vs. limit=8.46 2024-09-16 13:20:23,353 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.51 vs. limit=10.095 2024-09-16 13:20:26,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=6920.0, ans=0.03783333333333334 2024-09-16 13:20:34,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.29 vs. limit=12.690000000000001 2024-09-16 13:20:41,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=6960.0, ans=0.17375000000000002 2024-09-16 13:20:47,199 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.75 vs. limit=12.719999999999999 2024-09-16 13:20:54,500 INFO [train.py:1198] (1/2) Epoch 1, batch 1750, loss[loss=0.3574, ctc_loss=0.3785, cr_loss=0.3536, attn_decoder_loss=0.3471, over 29372.00 frames. ], tot_loss[loss=0.4398, ctc_loss=0.4974, cr_loss=0.416, attn_decoder_loss=0.4241, over 5788763.85 frames. ], batch size: 67, lr: 4.44e-02, grad_scale: 16.0 2024-09-16 13:20:57,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=7000.0, ans=0.037500000000000006 2024-09-16 13:21:01,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=7000.0, ans=0.0 2024-09-16 13:21:13,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=7040.0, ans=0.16999999999999998 2024-09-16 13:21:23,993 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=8.83 vs. limit=6.816 2024-09-16 13:21:26,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=7080.0, ans=0.16812500000000002 2024-09-16 13:21:44,534 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.92 vs. limit=12.84 2024-09-16 13:21:55,563 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.83 vs. limit=6.848 2024-09-16 13:22:00,390 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.98 vs. 
limit=4.074 2024-09-16 13:22:01,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=7160.0, ans=0.164375 2024-09-16 13:22:16,327 INFO [train.py:1198] (1/2) Epoch 1, batch 1800, loss[loss=0.4329, ctc_loss=0.4635, cr_loss=0.4553, attn_decoder_loss=0.4194, over 29691.00 frames. ], tot_loss[loss=0.4348, ctc_loss=0.4879, cr_loss=0.4193, attn_decoder_loss=0.4196, over 5790892.75 frames. ], batch size: 83, lr: 4.44e-02, grad_scale: 16.0 2024-09-16 13:22:21,099 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.247e+02 1.571e+02 1.759e+02 2.049e+02 3.849e+02, threshold=3.518e+02, percent-clipped=1.0 2024-09-16 13:22:40,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=7240.0, ans=0.2276 2024-09-16 13:22:56,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=7280.0, ans=0.2272 2024-09-16 13:22:57,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=7280.0, ans=0.2272 2024-09-16 13:23:05,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=7320.0, ans=0.025 2024-09-16 13:23:35,118 INFO [train.py:1198] (1/2) Epoch 1, batch 1850, loss[loss=0.4232, ctc_loss=0.4505, cr_loss=0.4654, attn_decoder_loss=0.4099, over 29636.00 frames. ], tot_loss[loss=0.4289, ctc_loss=0.4767, cr_loss=0.4213, attn_decoder_loss=0.4143, over 5795186.17 frames. ], batch size: 86, lr: 4.43e-02, grad_scale: 16.0 2024-09-16 13:23:41,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=7400.0, ans=0.009260869565217392 2024-09-16 13:23:46,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=7400.0, ans=0.153125 2024-09-16 13:23:49,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=7440.0, ans=0.15125 2024-09-16 13:23:53,071 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.87 vs. limit=13.08 2024-09-16 13:23:54,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=7440.0, ans=0.025 2024-09-16 13:24:21,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=7480.0, ans=0.14937499999999998 2024-09-16 13:24:33,713 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=10.32 2024-09-16 13:24:44,644 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.18 vs. limit=13.17 2024-09-16 13:24:54,517 INFO [train.py:1198] (1/2) Epoch 1, batch 1900, loss[loss=0.4093, ctc_loss=0.434, cr_loss=0.4424, attn_decoder_loss=0.3967, over 29697.00 frames. ], tot_loss[loss=0.425, ctc_loss=0.4687, cr_loss=0.4244, attn_decoder_loss=0.4107, over 5803239.03 frames. 
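The cr_loss component in every loss line is the consistency-regularization term of CR-CTC (this run trains with use_cr_ctc=True): each utterance is forwarded twice with different time masking, and the two CTC output distributions are pulled toward each other. A plausible sketch of that term as a symmetric KL with stop-gradient targets; the exact masking, reduction, and detach placement in this recipe may differ:

    import torch
    import torch.nn.functional as F

    def consistency_loss(log_probs_a: torch.Tensor,
                         log_probs_b: torch.Tensor) -> torch.Tensor:
        # log_probs_*: (T, N, V) log-softmax CTC outputs for two augmented
        # views of the same batch. Symmetric KL, each side's target detached.
        kl_ab = F.kl_div(log_probs_a, log_probs_b.detach(),
                         log_target=True, reduction="batchmean")
        kl_ba = F.kl_div(log_probs_b, log_probs_a.detach(),
                         log_target=True, reduction="batchmean")
        return 0.5 * (kl_ab + kl_ba)
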
], batch size: 89, lr: 4.43e-02, grad_scale: 16.0 2024-09-16 13:24:56,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=7600.0, ans=0.025 2024-09-16 13:24:59,248 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.162e+02 1.597e+02 1.785e+02 2.217e+02 4.479e+02, threshold=3.571e+02, percent-clipped=3.0 2024-09-16 13:25:10,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=7640.0, ans=0.07 2024-09-16 13:25:45,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_positive, batch_count=7720.0, ans=0.051750000000000004 2024-09-16 13:26:07,636 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=10.41 2024-09-16 13:26:14,694 INFO [train.py:1198] (1/2) Epoch 1, batch 1950, loss[loss=0.3983, ctc_loss=0.4292, cr_loss=0.4451, attn_decoder_loss=0.385, over 29453.00 frames. ], tot_loss[loss=0.4219, ctc_loss=0.461, cr_loss=0.428, attn_decoder_loss=0.408, over 5818422.84 frames. ], batch size: 78, lr: 4.43e-02, grad_scale: 16.0 2024-09-16 13:26:21,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=7800.0, ans=0.222 2024-09-16 13:26:23,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=7800.0, ans=0.034166666666666665 2024-09-16 13:26:24,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=7800.0, ans=0.034166666666666665 2024-09-16 13:26:25,416 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.97 vs. limit=10.425 2024-09-16 13:26:29,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=7840.0, ans=0.1325 2024-09-16 13:26:58,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=7880.0, ans=0.009156521739130435 2024-09-16 13:27:23,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=7960.0, ans=0.12687500000000002 2024-09-16 13:27:26,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=7960.0, ans=0.025 2024-09-16 13:27:28,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=7960.0, ans=0.07 2024-09-16 13:27:33,405 INFO [train.py:1198] (1/2) Epoch 1, batch 2000, loss[loss=0.345, ctc_loss=0.3565, cr_loss=0.3486, attn_decoder_loss=0.336, over 29339.00 frames. ], tot_loss[loss=0.419, ctc_loss=0.4544, cr_loss=0.4304, attn_decoder_loss=0.4055, over 5796423.13 frames. 
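The grad_scale value printed with each tot_loss entry (8.0, then 16.0, then 32.0 as training stabilizes) is the dynamic loss scale of fp16 mixed-precision training: the scaler multiplies the loss before backward, doubles the scale after a run of overflow-free steps, and halves it when gradients overflow. A generic PyTorch sketch of that loop, not the icefall trainer itself:

    import torch
    from torch.cuda.amp import GradScaler, autocast

    scaler = GradScaler(init_scale=8.0)  # the log starts near grad_scale: 8.0

    def train_step(model, optimizer, batch, loss_fn):
        optimizer.zero_grad(set_to_none=True)
        with autocast(dtype=torch.float16):
            loss = loss_fn(model, batch)
        scaler.scale(loss).backward()  # backward through the scaled loss
        scaler.step(optimizer)         # unscales grads; skips step on inf/nan
        scaler.update()                # grow or shrink the scale dynamically
        return loss.detach()
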
], batch size: 67, lr: 4.42e-02, grad_scale: 32.0 2024-09-16 13:27:38,130 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.451e+02 1.684e+02 2.248e+02 3.741e+02, threshold=3.368e+02, percent-clipped=1.0 2024-09-16 13:27:44,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=8000.0, ans=0.03333333333333334 2024-09-16 13:27:53,947 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.20 vs. limit=4.2059999999999995 2024-09-16 13:28:01,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=8040.0, ans=0.125 2024-09-16 13:28:16,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=8080.0, ans=0.8308 2024-09-16 13:28:24,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8120.0, ans=0.2188 2024-09-16 13:28:24,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=8120.0, ans=0.125 2024-09-16 13:28:28,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=8120.0, ans=0.125 2024-09-16 13:28:30,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=8120.0, ans=0.125 2024-09-16 13:28:31,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.72 vs. limit=7.029999999999999 2024-09-16 13:28:52,988 INFO [train.py:1198] (1/2) Epoch 1, batch 2050, loss[loss=0.3787, ctc_loss=0.3994, cr_loss=0.4053, attn_decoder_loss=0.3674, over 29415.00 frames. ], tot_loss[loss=0.4147, ctc_loss=0.4466, cr_loss=0.4303, attn_decoder_loss=0.4016, over 5788353.58 frames. ], batch size: 70, lr: 4.42e-02, grad_scale: 16.0 2024-09-16 13:28:55,303 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.24 vs. limit=4.23 2024-09-16 13:29:05,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=8200.0, ans=0.125 2024-09-16 13:29:12,359 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.89 vs. limit=10.59 2024-09-16 13:29:13,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=8240.0, ans=0.0 2024-09-16 13:29:13,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=8240.0, ans=0.21760000000000002 2024-09-16 13:29:21,684 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.35 vs. 
limit=10.59 2024-09-16 13:29:27,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=8280.0, ans=0.125 2024-09-16 13:29:45,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=8320.0, ans=0.2168 2024-09-16 13:29:59,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=8360.0, ans=0.125 2024-09-16 13:30:12,263 INFO [train.py:1198] (1/2) Epoch 1, batch 2100, loss[loss=0.3778, ctc_loss=0.3777, cr_loss=0.4142, attn_decoder_loss=0.3686, over 29748.00 frames. ], tot_loss[loss=0.4105, ctc_loss=0.4389, cr_loss=0.431, attn_decoder_loss=0.3978, over 5800463.16 frames. ], batch size: 81, lr: 4.42e-02, grad_scale: 16.0 2024-09-16 13:30:18,321 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.222e+02 1.523e+02 1.725e+02 2.064e+02 6.365e+02, threshold=3.449e+02, percent-clipped=2.0 2024-09-16 13:30:20,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=8400.0, ans=0.125 2024-09-16 13:30:40,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=8440.0, ans=0.2156 2024-09-16 13:30:47,306 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.61 vs. limit=10.68 2024-09-16 13:30:53,294 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.39 vs. limit=4.272 2024-09-16 13:30:57,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=8520.0, ans=0.03116666666666667 2024-09-16 13:31:08,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.70 vs. limit=13.89 2024-09-16 13:31:23,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=8560.0, ans=0.6004 2024-09-16 13:31:29,556 INFO [train.py:1198] (1/2) Epoch 1, batch 2150, loss[loss=0.368, ctc_loss=0.3858, cr_loss=0.3878, attn_decoder_loss=0.3574, over 29465.00 frames. ], tot_loss[loss=0.407, ctc_loss=0.4324, cr_loss=0.4326, attn_decoder_loss=0.3946, over 5814510.90 frames. ], batch size: 78, lr: 4.41e-02, grad_scale: 16.0 2024-09-16 13:31:37,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=8600.0, ans=0.599 2024-09-16 13:31:39,657 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.00 vs. limit=10.725 2024-09-16 13:31:50,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=8640.0, ans=0.125 2024-09-16 13:31:55,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8640.0, ans=0.2136 2024-09-16 13:32:00,587 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=8.96 vs. 
limit=10.74 2024-09-16 13:32:12,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=8680.0, ans=0.008982608695652174 2024-09-16 13:32:49,789 INFO [train.py:1198] (1/2) Epoch 1, batch 2200, loss[loss=0.4015, ctc_loss=0.4106, cr_loss=0.4098, attn_decoder_loss=0.3914, over 29627.00 frames. ], tot_loss[loss=0.4049, ctc_loss=0.4273, cr_loss=0.4332, attn_decoder_loss=0.3927, over 5811659.69 frames. ], batch size: 86, lr: 4.41e-02, grad_scale: 16.0 2024-09-16 13:32:55,854 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.205e+02 1.455e+02 1.695e+02 2.050e+02 4.766e+02, threshold=3.390e+02, percent-clipped=3.0 2024-09-16 13:33:09,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=8840.0, ans=0.008947826086956523 2024-09-16 13:33:23,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=8880.0, ans=0.0 2024-09-16 13:33:37,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=8920.0, ans=0.07 2024-09-16 13:33:45,955 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.81 vs. limit=7.23 2024-09-16 13:33:56,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=8960.0, ans=0.008921739130434782 2024-09-16 13:34:02,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=8960.0, ans=0.025 2024-09-16 13:34:05,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=9000.0, ans=0.02916666666666667 2024-09-16 13:34:09,143 INFO [train.py:1198] (1/2) Epoch 1, batch 2250, loss[loss=0.3939, ctc_loss=0.3955, cr_loss=0.4315, attn_decoder_loss=0.3841, over 29736.00 frames. ], tot_loss[loss=0.4022, ctc_loss=0.4215, cr_loss=0.4337, attn_decoder_loss=0.3904, over 5811802.14 frames. ], batch size: 82, lr: 4.40e-02, grad_scale: 16.0 2024-09-16 13:34:09,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=9000.0, ans=0.025 2024-09-16 13:34:37,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=9040.0, ans=0.025 2024-09-16 13:34:40,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=9080.0, ans=0.008895652173913044 2024-09-16 13:34:57,574 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.29 vs. limit=10.92 2024-09-16 13:35:15,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=9160.0, ans=0.0 2024-09-16 13:35:26,221 INFO [train.py:1198] (1/2) Epoch 1, batch 2300, loss[loss=0.3458, ctc_loss=0.3446, cr_loss=0.4192, attn_decoder_loss=0.3366, over 29284.00 frames. ], tot_loss[loss=0.398, ctc_loss=0.4146, cr_loss=0.4332, attn_decoder_loss=0.3866, over 5800132.23 frames. ], batch size: 71, lr: 4.40e-02, grad_scale: 16.0 2024-09-16 13:35:28,966 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.30 vs. 
limit=14.4 2024-09-16 13:35:32,249 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.236e+02 1.494e+02 1.712e+02 1.992e+02 4.170e+02, threshold=3.424e+02, percent-clipped=4.0 2024-09-16 13:35:36,274 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.87 vs. limit=9.6 2024-09-16 13:35:37,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=9200.0, ans=0.125 2024-09-16 13:35:51,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=9240.0, ans=0.5766 2024-09-16 13:35:59,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=9280.0, ans=0.008852173913043479 2024-09-16 13:36:03,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=9280.0, ans=0.008852173913043479 2024-09-16 13:36:12,520 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.21 vs. limit=7.32 2024-09-16 13:36:22,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten.whitening_limit, batch_count=9320.0, ans=14.49 2024-09-16 13:36:45,228 INFO [train.py:1198] (1/2) Epoch 1, batch 2350, loss[loss=0.4032, ctc_loss=0.4086, cr_loss=0.435, attn_decoder_loss=0.393, over 29678.00 frames. ], tot_loss[loss=0.3965, ctc_loss=0.4107, cr_loss=0.4345, attn_decoder_loss=0.3852, over 5804717.46 frames. ], batch size: 83, lr: 4.40e-02, grad_scale: 16.0 2024-09-16 13:36:45,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=9400.0, ans=0.008826086956521739 2024-09-16 13:36:47,717 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.53 vs. limit=14.55 2024-09-16 13:36:49,418 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.00 vs. limit=14.55 2024-09-16 13:36:50,708 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=6.40 vs. limit=7.76 2024-09-16 13:36:57,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=9400.0, ans=0.20600000000000002 2024-09-16 13:37:28,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=9480.0, ans=0.125 2024-09-16 13:37:35,118 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.29 vs. limit=14.64 2024-09-16 13:37:46,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=9560.0, ans=0.125 2024-09-16 13:37:55,192 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.94 vs. 
limit=14.67 2024-09-16 13:37:59,475 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:38:02,853 INFO [train.py:1198] (1/2) Epoch 1, batch 2400, loss[loss=0.3491, ctc_loss=0.3398, cr_loss=0.4278, attn_decoder_loss=0.3406, over 29511.00 frames. ], tot_loss[loss=0.3949, ctc_loss=0.4067, cr_loss=0.4354, attn_decoder_loss=0.3839, over 5808501.15 frames. ], batch size: 76, lr: 4.39e-02, grad_scale: 32.0 2024-09-16 13:38:10,878 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.192e+02 1.445e+02 1.624e+02 1.930e+02 3.418e+02, threshold=3.248e+02, percent-clipped=0.0 2024-09-16 13:38:12,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=9600.0, ans=0.5640000000000001 2024-09-16 13:38:17,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=9600.0, ans=0.125 2024-09-16 13:38:20,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=9640.0, ans=0.2036 2024-09-16 13:38:28,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9640.0, ans=0.2036 2024-09-16 13:38:31,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=9640.0, ans=0.026500000000000003 2024-09-16 13:38:34,687 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=5.992e-02 2024-09-16 13:38:42,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=9680.0, ans=0.0 2024-09-16 13:38:47,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=9680.0, ans=0.07 2024-09-16 13:38:47,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=9680.0, ans=0.125 2024-09-16 13:38:51,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=9720.0, ans=0.5598000000000001 2024-09-16 13:39:05,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=9760.0, ans=0.5584 2024-09-16 13:39:07,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=9760.0, ans=0.026000000000000002 2024-09-16 13:39:14,981 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:39:17,407 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.63 vs. limit=7.4399999999999995 2024-09-16 13:39:22,242 INFO [train.py:1198] (1/2) Epoch 1, batch 2450, loss[loss=0.3912, ctc_loss=0.392, cr_loss=0.4398, attn_decoder_loss=0.3813, over 29707.00 frames. ], tot_loss[loss=0.3946, ctc_loss=0.4049, cr_loss=0.437, attn_decoder_loss=0.3838, over 5785548.26 frames. 
], batch size: 82, lr: 4.39e-02, grad_scale: 16.0 2024-09-16 13:39:24,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9800.0, ans=0.202 2024-09-16 13:39:36,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=9800.0, ans=0.125 2024-09-16 13:40:05,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=9880.0, ans=0.125 2024-09-16 13:40:09,940 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.31 vs. limit=5.0 2024-09-16 13:40:13,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=9920.0, ans=0.04949747468305833 2024-09-16 13:40:15,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=9920.0, ans=0.025 2024-09-16 13:40:19,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=9920.0, ans=0.125 2024-09-16 13:40:35,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=9960.0, ans=0.125 2024-09-16 13:40:41,273 INFO [train.py:1198] (1/2) Epoch 1, batch 2500, loss[loss=0.4077, ctc_loss=0.4112, cr_loss=0.4596, attn_decoder_loss=0.3971, over 29644.00 frames. ], tot_loss[loss=0.3918, ctc_loss=0.3993, cr_loss=0.4375, attn_decoder_loss=0.3813, over 5795354.24 frames. ], batch size: 86, lr: 4.38e-02, grad_scale: 16.0 2024-09-16 13:40:48,951 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.428e+02 1.613e+02 1.938e+02 4.379e+02, threshold=3.227e+02, percent-clipped=3.0 2024-09-16 13:40:49,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=10000.0, ans=0.025 2024-09-16 13:40:51,505 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.16 vs. limit=8.0 2024-09-16 13:41:05,432 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.03 vs. limit=15.030000000000001 2024-09-16 13:41:14,880 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.71 vs. limit=15.059999999999999 2024-09-16 13:41:31,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=10120.0, ans=0.5458000000000001 2024-09-16 13:41:34,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=10120.0, ans=0.0 2024-09-16 13:41:49,609 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.16 vs. limit=11.31 2024-09-16 13:41:54,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=10160.0, ans=0.5444 2024-09-16 13:42:01,547 INFO [train.py:1198] (1/2) Epoch 1, batch 2550, loss[loss=0.342, ctc_loss=0.3328, cr_loss=0.4006, attn_decoder_loss=0.3342, over 29326.00 frames. 
], tot_loss[loss=0.3897, ctc_loss=0.3952, cr_loss=0.4382, attn_decoder_loss=0.3793, over 5798349.02 frames. ], batch size: 67, lr: 4.38e-02, grad_scale: 16.0 2024-09-16 13:42:01,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=10200.0, ans=0.198 2024-09-16 13:42:08,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=10200.0, ans=0.198 2024-09-16 13:42:36,377 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.82 vs. limit=11.355 2024-09-16 13:42:49,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=10320.0, ans=0.125 2024-09-16 13:43:02,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=10360.0, ans=0.023500000000000004 2024-09-16 13:43:19,972 INFO [train.py:1198] (1/2) Epoch 1, batch 2600, loss[loss=0.3541, ctc_loss=0.337, cr_loss=0.4327, attn_decoder_loss=0.3464, over 29443.00 frames. ], tot_loss[loss=0.3893, ctc_loss=0.3933, cr_loss=0.4397, attn_decoder_loss=0.3791, over 5794721.19 frames. ], batch size: 78, lr: 4.37e-02, grad_scale: 16.0 2024-09-16 13:43:25,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=10400.0, ans=0.025 2024-09-16 13:43:29,539 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.186e+02 1.430e+02 1.543e+02 1.954e+02 3.702e+02, threshold=3.087e+02, percent-clipped=5.0 2024-09-16 13:43:37,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=10440.0, ans=0.125 2024-09-16 13:43:37,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=10440.0, ans=0.125 2024-09-16 13:43:41,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=10440.0, ans=0.02316666666666667 2024-09-16 13:43:48,212 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.45 vs. limit=11.415 2024-09-16 13:43:52,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=10480.0, ans=0.19519999999999998 2024-09-16 13:43:58,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=10480.0, ans=0.09899494936611666 2024-09-16 13:44:08,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=10520.0, ans=0.022833333333333337 2024-09-16 13:44:23,868 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.78 vs. 
limit=15.42 2024-09-16 13:44:26,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=10560.0, ans=0.125 2024-09-16 13:44:26,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=10560.0, ans=0.125 2024-09-16 13:44:27,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=10560.0, ans=0.025 2024-09-16 13:44:33,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=10560.0, ans=0.125 2024-09-16 13:44:37,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=10600.0, ans=0.022500000000000003 2024-09-16 13:44:38,239 INFO [train.py:1198] (1/2) Epoch 1, batch 2650, loss[loss=0.4177, ctc_loss=0.4145, cr_loss=0.4524, attn_decoder_loss=0.408, over 29269.00 frames. ], tot_loss[loss=0.3884, ctc_loss=0.3903, cr_loss=0.4405, attn_decoder_loss=0.3784, over 5801941.26 frames. ], batch size: 100, lr: 4.37e-02, grad_scale: 16.0 2024-09-16 13:45:39,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=10760.0, ans=0.125 2024-09-16 13:45:57,792 INFO [train.py:1198] (1/2) Epoch 1, batch 2700, loss[loss=0.3899, ctc_loss=0.3733, cr_loss=0.4592, attn_decoder_loss=0.3815, over 29535.00 frames. ], tot_loss[loss=0.3867, ctc_loss=0.3866, cr_loss=0.4411, attn_decoder_loss=0.3769, over 5797882.24 frames. ], batch size: 87, lr: 4.36e-02, grad_scale: 16.0 2024-09-16 13:46:01,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=10800.0, ans=0.008521739130434783 2024-09-16 13:46:04,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=10800.0, ans=0.02166666666666667 2024-09-16 13:46:05,443 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.205e+02 1.417e+02 1.675e+02 2.035e+02 4.386e+02, threshold=3.351e+02, percent-clipped=4.0 2024-09-16 13:46:11,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=10840.0, ans=0.0 2024-09-16 13:46:11,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=10840.0, ans=0.125 2024-09-16 13:46:18,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=10840.0, ans=0.1916 2024-09-16 13:46:24,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=10840.0, ans=0.125 2024-09-16 13:46:33,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=10880.0, ans=0.125 2024-09-16 13:47:17,505 INFO [train.py:1198] (1/2) Epoch 1, batch 2750, loss[loss=0.3595, ctc_loss=0.355, cr_loss=0.4235, attn_decoder_loss=0.3506, over 29525.00 frames. ], tot_loss[loss=0.3837, ctc_loss=0.3819, cr_loss=0.44, attn_decoder_loss=0.3741, over 5796808.42 frames. 
], batch size: 75, lr: 4.36e-02, grad_scale: 16.0 2024-09-16 13:47:39,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=11040.0, ans=0.125 2024-09-16 13:47:42,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=11040.0, ans=0.5136000000000001 2024-09-16 13:47:43,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=11040.0, ans=0.5136000000000001 2024-09-16 13:47:48,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=11080.0, ans=0.008460869565217391 2024-09-16 13:48:00,911 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.48 vs. limit=11.655000000000001 2024-09-16 13:48:25,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11160.0, ans=0.18839999999999998 2024-09-16 13:48:25,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=11160.0, ans=0.125 2024-09-16 13:48:32,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=11160.0, ans=0.0 2024-09-16 13:48:35,668 INFO [train.py:1198] (1/2) Epoch 1, batch 2800, loss[loss=0.4385, ctc_loss=0.4721, cr_loss=0.4677, attn_decoder_loss=0.4244, over 20196.00 frames. ], tot_loss[loss=0.3825, ctc_loss=0.3795, cr_loss=0.4404, attn_decoder_loss=0.373, over 5778641.68 frames. ], batch size: 209, lr: 4.36e-02, grad_scale: 32.0 2024-09-16 13:48:38,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=11200.0, ans=0.125 2024-09-16 13:48:43,100 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.389e+02 1.617e+02 2.129e+02 5.220e+02, threshold=3.235e+02, percent-clipped=5.0 2024-09-16 13:48:43,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=11200.0, ans=0.008434782608695653 2024-09-16 13:48:58,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=11240.0, ans=0.125 2024-09-16 13:49:14,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.max_abs, batch_count=11280.0, ans=10.0 2024-09-16 13:49:32,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11320.0, ans=0.1868 2024-09-16 13:49:54,509 INFO [train.py:1198] (1/2) Epoch 1, batch 2850, loss[loss=0.3667, ctc_loss=0.3613, cr_loss=0.4395, attn_decoder_loss=0.3575, over 29484.00 frames. ], tot_loss[loss=0.382, ctc_loss=0.3782, cr_loss=0.4413, attn_decoder_loss=0.3726, over 5764924.36 frames. 
], batch size: 77, lr: 4.35e-02, grad_scale: 32.0 2024-09-16 13:49:59,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=11400.0, ans=0.186 2024-09-16 13:50:10,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=11440.0, ans=0.019000000000000003 2024-09-16 13:50:14,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=11440.0, ans=0.008382608695652174 2024-09-16 13:50:22,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=11440.0, ans=0.019000000000000003 2024-09-16 13:51:13,789 INFO [train.py:1198] (1/2) Epoch 1, batch 2900, loss[loss=0.369, ctc_loss=0.3457, cr_loss=0.4496, attn_decoder_loss=0.3616, over 29429.00 frames. ], tot_loss[loss=0.3816, ctc_loss=0.376, cr_loss=0.4425, attn_decoder_loss=0.3724, over 5789534.44 frames. ], batch size: 79, lr: 4.35e-02, grad_scale: 16.0 2024-09-16 13:51:22,929 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.352e+02 1.492e+02 1.728e+02 4.022e+02, threshold=2.985e+02, percent-clipped=1.0 2024-09-16 13:51:42,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=11640.0, ans=0.125 2024-09-16 13:51:47,257 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.63 vs. limit=11.879999999999999 2024-09-16 13:52:01,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=11720.0, ans=0.09899494936611666 2024-09-16 13:52:18,084 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.72 vs. limit=7.9399999999999995 2024-09-16 13:52:22,335 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.56 vs. limit=8.704 2024-09-16 13:52:26,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11760.0, ans=0.1824 2024-09-16 13:52:30,701 INFO [train.py:1198] (1/2) Epoch 1, batch 2950, loss[loss=0.343, ctc_loss=0.3164, cr_loss=0.4204, attn_decoder_loss=0.3366, over 29498.00 frames. ], tot_loss[loss=0.3782, ctc_loss=0.3709, cr_loss=0.4405, attn_decoder_loss=0.3692, over 5783701.76 frames. ], batch size: 75, lr: 4.34e-02, grad_scale: 16.0 2024-09-16 13:52:51,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=11840.0, ans=0.125 2024-09-16 13:52:52,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=11840.0, ans=0.125 2024-09-16 13:52:58,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=11840.0, ans=0.125 2024-09-16 13:53:01,083 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.01 vs. 
limit=8.751999999999999 2024-09-16 13:53:05,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=11880.0, ans=0.125 2024-09-16 13:53:05,527 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.12 vs. limit=16.41 2024-09-16 13:53:11,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.70 vs. limit=16.41 2024-09-16 13:53:29,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=11920.0, ans=0.125 2024-09-16 13:53:32,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=11960.0, ans=0.8695999999999999 2024-09-16 13:53:50,808 INFO [train.py:1198] (1/2) Epoch 1, batch 3000, loss[loss=0.393, ctc_loss=0.3849, cr_loss=0.4603, attn_decoder_loss=0.3837, over 29752.00 frames. ], tot_loss[loss=0.3772, ctc_loss=0.3689, cr_loss=0.441, attn_decoder_loss=0.3684, over 5784310.75 frames. ], batch size: 81, lr: 4.34e-02, grad_scale: 16.0 2024-09-16 13:53:50,808 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 13:54:09,119 INFO [train.py:1230] (1/2) Epoch 1, validation: loss=0.2655, ctc_loss=0.1548, cr_loss=4.113e-15, attn_decoder_loss=0.2778, over 944034.00 frames. 2024-09-16 13:54:09,119 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-16 13:54:12,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=12000.0, ans=0.125 2024-09-16 13:54:18,439 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.470e+02 1.654e+02 2.017e+02 3.240e+02, threshold=3.308e+02, percent-clipped=3.0 2024-09-16 13:54:36,397 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.95 vs. limit=4.806 2024-09-16 13:54:39,297 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.78 vs. limit=16.560000000000002 2024-09-16 13:54:48,706 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:54:58,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=12120.0, ans=0.008234782608695652 2024-09-16 13:55:04,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=12120.0, ans=0.008234782608695652 2024-09-16 13:55:28,452 INFO [train.py:1198] (1/2) Epoch 1, batch 3050, loss[loss=0.3466, ctc_loss=0.3257, cr_loss=0.4404, attn_decoder_loss=0.3391, over 29527.00 frames. ], tot_loss[loss=0.3772, ctc_loss=0.3678, cr_loss=0.442, attn_decoder_loss=0.3684, over 5777895.37 frames. ], batch size: 76, lr: 4.33e-02, grad_scale: 16.0 2024-09-16 13:55:38,437 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.73 vs. 
limit=11.1 2024-09-16 13:55:42,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=12240.0, ans=0.125 2024-09-16 13:56:31,270 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.06 vs. limit=12.135 2024-09-16 13:56:45,427 INFO [train.py:1198] (1/2) Epoch 1, batch 3100, loss[loss=0.3795, ctc_loss=0.362, cr_loss=0.4763, attn_decoder_loss=0.3708, over 29202.00 frames. ], tot_loss[loss=0.3753, ctc_loss=0.3645, cr_loss=0.4422, attn_decoder_loss=0.3667, over 5780076.59 frames. ], batch size: 100, lr: 4.33e-02, grad_scale: 16.0 2024-09-16 13:56:54,617 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.036e+02 1.315e+02 1.501e+02 1.811e+02 4.491e+02, threshold=3.002e+02, percent-clipped=4.0 2024-09-16 13:56:56,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=12400.0, ans=12.15 2024-09-16 13:57:04,631 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.23 vs. limit=8.11 2024-09-16 13:57:05,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=12440.0, ans=0.3866 2024-09-16 13:57:42,060 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:57:43,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=12520.0, ans=0.46180000000000004 2024-09-16 13:58:04,871 INFO [train.py:1198] (1/2) Epoch 1, batch 3150, loss[loss=0.394, ctc_loss=0.3867, cr_loss=0.4438, attn_decoder_loss=0.3849, over 28814.00 frames. ], tot_loss[loss=0.3741, ctc_loss=0.3622, cr_loss=0.4422, attn_decoder_loss=0.3655, over 5786374.93 frames. ], batch size: 104, lr: 4.32e-02, grad_scale: 16.0 2024-09-16 13:58:23,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=12640.0, ans=0.125 2024-09-16 13:58:28,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=12640.0, ans=0.17359999999999998 2024-09-16 13:58:38,478 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.73 vs. limit=17.009999999999998 2024-09-16 13:58:39,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=12680.0, ans=0.125 2024-09-16 13:58:44,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=12680.0, ans=0.013833333333333336 2024-09-16 13:58:45,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=12680.0, ans=0.013833333333333336 2024-09-16 13:58:52,331 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.32 vs. 
limit=12.27 2024-09-16 13:58:53,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=12720.0, ans=0.0 2024-09-16 13:58:54,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=12720.0, ans=0.125 2024-09-16 13:59:10,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=12760.0, ans=0.125 2024-09-16 13:59:14,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=12760.0, ans=0.125 2024-09-16 13:59:24,638 INFO [train.py:1198] (1/2) Epoch 1, batch 3200, loss[loss=0.3606, ctc_loss=0.3317, cr_loss=0.4158, attn_decoder_loss=0.3545, over 29402.00 frames. ], tot_loss[loss=0.3723, ctc_loss=0.3594, cr_loss=0.4417, attn_decoder_loss=0.3639, over 5796361.84 frames. ], batch size: 79, lr: 4.32e-02, grad_scale: 32.0 2024-09-16 13:59:33,918 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.352e+02 1.572e+02 1.941e+02 4.814e+02, threshold=3.143e+02, percent-clipped=7.0 2024-09-16 13:59:46,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=12840.0, ans=0.125 2024-09-16 14:00:19,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=12920.0, ans=0.125 2024-09-16 14:00:30,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=12960.0, ans=0.125 2024-09-16 14:00:42,056 INFO [train.py:1198] (1/2) Epoch 1, batch 3250, loss[loss=0.3831, ctc_loss=0.3547, cr_loss=0.4679, attn_decoder_loss=0.3758, over 29701.00 frames. ], tot_loss[loss=0.3707, ctc_loss=0.3561, cr_loss=0.4417, attn_decoder_loss=0.3625, over 5802480.60 frames. ], batch size: 84, lr: 4.31e-02, grad_scale: 32.0 2024-09-16 14:01:40,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=13120.0, ans=0.125 2024-09-16 14:01:51,485 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.66 vs. limit=4.974 2024-09-16 14:01:55,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=13160.0, ans=0.125 2024-09-16 14:02:00,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=13200.0, ans=0.882 2024-09-16 14:02:00,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=13200.0, ans=0.125 2024-09-16 14:02:01,645 INFO [train.py:1198] (1/2) Epoch 1, batch 3300, loss[loss=0.3929, ctc_loss=0.3844, cr_loss=0.4515, attn_decoder_loss=0.3838, over 28266.00 frames. ], tot_loss[loss=0.3691, ctc_loss=0.3544, cr_loss=0.4409, attn_decoder_loss=0.361, over 5798925.91 frames. ], batch size: 111, lr: 4.31e-02, grad_scale: 16.0 2024-09-16 14:02:02,759 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.74 vs. 
limit=17.4 2024-09-16 14:02:10,134 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.63 vs. limit=17.4 2024-09-16 14:02:12,356 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.074e+02 1.388e+02 1.553e+02 1.864e+02 4.414e+02, threshold=3.106e+02, percent-clipped=4.0 2024-09-16 14:02:23,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=13240.0, ans=0.007991304347826087 2024-09-16 14:02:42,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=13280.0, ans=0.025 2024-09-16 14:02:44,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=13280.0, ans=0.125 2024-09-16 14:03:20,276 INFO [train.py:1198] (1/2) Epoch 1, batch 3350, loss[loss=0.3992, ctc_loss=0.3782, cr_loss=0.4881, attn_decoder_loss=0.3907, over 28805.00 frames. ], tot_loss[loss=0.3703, ctc_loss=0.3552, cr_loss=0.4418, attn_decoder_loss=0.3622, over 5774653.90 frames. ], batch size: 104, lr: 4.30e-02, grad_scale: 16.0 2024-09-16 14:03:28,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=13400.0, ans=0.125 2024-09-16 14:03:36,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=13440.0, ans=0.0 2024-09-16 14:04:38,391 INFO [train.py:1198] (1/2) Epoch 1, batch 3400, loss[loss=0.3301, ctc_loss=0.3145, cr_loss=0.4295, attn_decoder_loss=0.3223, over 29342.00 frames. ], tot_loss[loss=0.3687, ctc_loss=0.3526, cr_loss=0.4416, attn_decoder_loss=0.3606, over 5768301.56 frames. ], batch size: 67, lr: 4.29e-02, grad_scale: 16.0 2024-09-16 14:04:48,760 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.73 vs. limit=17.7 2024-09-16 14:04:49,183 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.397e+02 1.601e+02 1.904e+02 5.092e+02, threshold=3.203e+02, percent-clipped=2.0 2024-09-16 14:04:57,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=13640.0, ans=0.1136 2024-09-16 14:04:58,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=13640.0, ans=0.125 2024-09-16 14:05:01,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=13640.0, ans=0.09899494936611666 2024-09-16 14:05:04,035 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.18 vs. limit=17.73 2024-09-16 14:05:15,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=13680.0, ans=0.125 2024-09-16 14:05:57,559 INFO [train.py:1198] (1/2) Epoch 1, batch 3450, loss[loss=0.3896, ctc_loss=0.3775, cr_loss=0.4516, attn_decoder_loss=0.3809, over 28336.00 frames. ], tot_loss[loss=0.3684, ctc_loss=0.3511, cr_loss=0.4421, attn_decoder_loss=0.3605, over 5776272.13 frames. 
], batch size: 111, lr: 4.29e-02, grad_scale: 16.0 2024-09-16 14:06:08,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=13800.0, ans=0.125 2024-09-16 14:06:17,587 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.27 vs. limit=17.880000000000003 2024-09-16 14:06:27,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=13840.0, ans=0.125 2024-09-16 14:06:41,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=13880.0, ans=0.125 2024-09-16 14:06:44,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=13920.0, ans=0.09899494936611666 2024-09-16 14:07:03,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=13960.0, ans=0.025 2024-09-16 14:07:15,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=14000.0, ans=0.125 2024-09-16 14:07:16,933 INFO [train.py:1198] (1/2) Epoch 1, batch 3500, loss[loss=0.3277, ctc_loss=0.3018, cr_loss=0.4183, attn_decoder_loss=0.3213, over 29336.00 frames. ], tot_loss[loss=0.3666, ctc_loss=0.3482, cr_loss=0.4412, attn_decoder_loss=0.3589, over 5778045.95 frames. ], batch size: 71, lr: 4.28e-02, grad_scale: 16.0 2024-09-16 14:07:27,728 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.349e+02 1.530e+02 1.819e+02 5.462e+02, threshold=3.060e+02, percent-clipped=1.0 2024-09-16 14:07:47,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=14080.0, ans=0.008 2024-09-16 14:08:01,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=14120.0, ans=0.125 2024-09-16 14:08:27,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=14160.0, ans=0.1584 2024-09-16 14:08:32,825 INFO [train.py:1198] (1/2) Epoch 1, batch 3550, loss[loss=0.3757, ctc_loss=0.3545, cr_loss=0.4522, attn_decoder_loss=0.368, over 29746.00 frames. ], tot_loss[loss=0.3654, ctc_loss=0.346, cr_loss=0.4412, attn_decoder_loss=0.3578, over 5784597.27 frames. 
], batch size: 89, lr: 4.28e-02, grad_scale: 16.0 2024-09-16 14:08:33,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=14200.0, ans=0.125 2024-09-16 14:08:42,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=14200.0, ans=0.125 2024-09-16 14:08:48,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=14240.0, ans=0.125 2024-09-16 14:08:54,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=14240.0, ans=0.007773913043478261 2024-09-16 14:08:54,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=14240.0, ans=0.0 2024-09-16 14:08:54,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=14240.0, ans=0.0073333333333333375 2024-09-16 14:08:55,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=14240.0, ans=0.05 2024-09-16 14:09:02,466 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.77 vs. limit=18.21 2024-09-16 14:09:16,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=14320.0, ans=0.125 2024-09-16 14:09:49,088 INFO [train.py:1198] (1/2) Epoch 1, batch 3600, loss[loss=0.3551, ctc_loss=0.3425, cr_loss=0.4266, attn_decoder_loss=0.347, over 29529.00 frames. ], tot_loss[loss=0.3653, ctc_loss=0.3454, cr_loss=0.4416, attn_decoder_loss=0.3577, over 5793546.64 frames. ], batch size: 77, lr: 4.27e-02, grad_scale: 32.0 2024-09-16 14:09:52,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=14400.0, ans=0.125 2024-09-16 14:09:59,802 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.344e+02 1.491e+02 1.790e+02 3.419e+02, threshold=2.982e+02, percent-clipped=2.0 2024-09-16 14:10:07,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=14440.0, ans=0.125 2024-09-16 14:10:43,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=14520.0, ans=0.39180000000000004 2024-09-16 14:10:45,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=14520.0, ans=0.39180000000000004 2024-09-16 14:10:56,975 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.46 vs. limit=12.96 2024-09-16 14:11:06,524 INFO [train.py:1198] (1/2) Epoch 1, batch 3650, loss[loss=0.3773, ctc_loss=0.3472, cr_loss=0.4695, attn_decoder_loss=0.3702, over 29531.00 frames. ], tot_loss[loss=0.3634, ctc_loss=0.3425, cr_loss=0.4403, attn_decoder_loss=0.3559, over 5794741.81 frames. ], batch size: 90, lr: 4.27e-02, grad_scale: 32.0 2024-09-16 14:11:07,327 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.73 vs. 
limit=12.975 2024-09-16 14:11:16,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=14600.0, ans=0.0076956521739130436 2024-09-16 14:11:20,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=14640.0, ans=0.007686956521739131 2024-09-16 14:11:34,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=14640.0, ans=0.38760000000000006 2024-09-16 14:11:48,988 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.58 vs. limit=13.004999999999999 2024-09-16 14:12:16,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=14760.0, ans=0.1524 2024-09-16 14:12:22,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=14800.0, ans=0.125 2024-09-16 14:12:24,269 INFO [train.py:1198] (1/2) Epoch 1, batch 3700, loss[loss=0.3527, ctc_loss=0.315, cr_loss=0.4384, attn_decoder_loss=0.3471, over 29709.00 frames. ], tot_loss[loss=0.3628, ctc_loss=0.3407, cr_loss=0.4412, attn_decoder_loss=0.3554, over 5804682.75 frames. ], batch size: 84, lr: 4.26e-02, grad_scale: 32.0 2024-09-16 14:12:30,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=14800.0, ans=0.125 2024-09-16 14:12:34,993 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.317e+02 1.543e+02 1.858e+02 5.259e+02, threshold=3.086e+02, percent-clipped=2.0 2024-09-16 14:12:36,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=14800.0, ans=0.15200000000000002 2024-09-16 14:12:41,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=14840.0, ans=0.125 2024-09-16 14:12:48,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=14840.0, ans=0.125 2024-09-16 14:12:50,314 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-16 14:13:02,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=14880.0, ans=0.125 2024-09-16 14:13:06,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=14880.0, ans=0.125 2024-09-16 14:13:17,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=14920.0, ans=0.004500000000000004 2024-09-16 14:13:20,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=14920.0, ans=0.0 2024-09-16 14:13:34,620 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.64 vs. limit=13.11 2024-09-16 14:13:40,151 INFO [train.py:1198] (1/2) Epoch 1, batch 3750, loss[loss=0.3246, ctc_loss=0.3043, cr_loss=0.3873, attn_decoder_loss=0.3182, over 29314.00 frames. ], tot_loss[loss=0.3618, ctc_loss=0.339, cr_loss=0.441, attn_decoder_loss=0.3546, over 5807432.80 frames. 
], batch size: 67, lr: 4.26e-02, grad_scale: 32.0 2024-09-16 14:13:55,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=15040.0, ans=0.125 2024-09-16 14:14:06,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=15040.0, ans=0.0040000000000000036 2024-09-16 14:14:17,272 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.26 vs. limit=18.810000000000002 2024-09-16 14:14:21,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=15080.0, ans=0.025 2024-09-16 14:14:29,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.91 vs. limit=13.17 2024-09-16 14:14:30,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=15120.0, ans=0.125 2024-09-16 14:14:35,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=15120.0, ans=0.05 2024-09-16 14:14:40,146 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=21.61 vs. limit=13.184999999999999 2024-09-16 14:14:45,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=15160.0, ans=0.125 2024-09-16 14:14:54,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=15200.0, ans=0.125 2024-09-16 14:14:56,686 INFO [train.py:1198] (1/2) Epoch 1, batch 3800, loss[loss=0.3596, ctc_loss=0.3236, cr_loss=0.4534, attn_decoder_loss=0.3535, over 29621.00 frames. ], tot_loss[loss=0.3607, ctc_loss=0.3374, cr_loss=0.4406, attn_decoder_loss=0.3536, over 5798637.30 frames. ], batch size: 86, lr: 4.25e-02, grad_scale: 32.0 2024-09-16 14:15:07,356 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.372e+02 1.609e+02 1.860e+02 5.053e+02, threshold=3.218e+02, percent-clipped=1.0 2024-09-16 14:15:12,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=15240.0, ans=0.003166666666666672 2024-09-16 14:15:30,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=15280.0, ans=0.125 2024-09-16 14:15:50,820 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.43 vs. limit=13.245000000000001 2024-09-16 14:16:12,331 INFO [train.py:1198] (1/2) Epoch 1, batch 3850, loss[loss=0.3911, ctc_loss=0.3696, cr_loss=0.4564, attn_decoder_loss=0.3834, over 29305.00 frames. ], tot_loss[loss=0.3601, ctc_loss=0.3358, cr_loss=0.4405, attn_decoder_loss=0.3531, over 5812588.15 frames. ], batch size: 100, lr: 4.24e-02, grad_scale: 32.0 2024-09-16 14:16:29,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=15440.0, ans=0.1456 2024-09-16 14:16:31,106 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.22 vs. 
limit=19.08 2024-09-16 14:16:39,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=15440.0, ans=0.125 2024-09-16 14:16:58,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=15520.0, ans=0.3568 2024-09-16 14:17:15,495 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=13.335 2024-09-16 14:17:22,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=15560.0, ans=0.125 2024-09-16 14:17:26,167 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.05 vs. limit=13.335 2024-09-16 14:17:30,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=15600.0, ans=0.354 2024-09-16 14:17:31,258 INFO [train.py:1198] (1/2) Epoch 1, batch 3900, loss[loss=0.3826, ctc_loss=0.3477, cr_loss=0.4768, attn_decoder_loss=0.3759, over 29624.00 frames. ], tot_loss[loss=0.36, ctc_loss=0.3347, cr_loss=0.4404, attn_decoder_loss=0.3531, over 5816519.14 frames. ], batch size: 86, lr: 4.24e-02, grad_scale: 32.0 2024-09-16 14:17:33,699 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.37 vs. limit=13.35 2024-09-16 14:17:41,709 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.343e+02 1.512e+02 1.794e+02 6.576e+02, threshold=3.024e+02, percent-clipped=3.0 2024-09-16 14:17:47,047 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.75 vs. limit=19.23 2024-09-16 14:18:13,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=15680.0, ans=0.125 2024-09-16 14:18:16,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=15720.0, ans=0.125 2024-09-16 14:18:19,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=15720.0, ans=0.00116666666666667 2024-09-16 14:18:28,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=15720.0, ans=0.125 2024-09-16 14:18:30,668 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.39 vs. limit=13.41 2024-09-16 14:18:31,979 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.15 vs. 
limit=10.304 2024-09-16 14:18:33,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=15760.0, ans=0.125 2024-09-16 14:18:39,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=15760.0, ans=0.125 2024-09-16 14:18:42,121 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-16 14:18:46,496 INFO [train.py:1198] (1/2) Epoch 1, batch 3950, loss[loss=0.3807, ctc_loss=0.3475, cr_loss=0.472, attn_decoder_loss=0.3739, over 29526.00 frames. ], tot_loss[loss=0.3592, ctc_loss=0.3331, cr_loss=0.4403, attn_decoder_loss=0.3523, over 5836065.60 frames. ], batch size: 97, lr: 4.23e-02, grad_scale: 32.0 2024-09-16 14:18:46,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=15800.0, ans=0.125 2024-09-16 14:19:27,649 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.85 vs. limit=13.455 2024-09-16 14:19:45,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=15960.0, ans=0.125 2024-09-16 14:19:52,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=15960.0, ans=0.14040000000000002 2024-09-16 14:20:09,358 INFO [train.py:1198] (1/2) Epoch 1, batch 4000, loss[loss=0.3437, ctc_loss=0.3135, cr_loss=0.4249, attn_decoder_loss=0.3376, over 29517.00 frames. ], tot_loss[loss=0.3592, ctc_loss=0.333, cr_loss=0.4411, attn_decoder_loss=0.3523, over 5814234.21 frames. ], batch size: 74, lr: 4.23e-02, grad_scale: 32.0 2024-09-16 14:20:19,714 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.414e+02 1.598e+02 1.942e+02 7.205e+02, threshold=3.195e+02, percent-clipped=1.0 2024-09-16 14:20:55,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=16120.0, ans=0.125 2024-09-16 14:20:57,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=16120.0, ans=0.025 2024-09-16 14:21:05,644 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.98 vs. limit=19.59 2024-09-16 14:21:24,263 INFO [train.py:1198] (1/2) Epoch 1, batch 4050, loss[loss=0.4181, ctc_loss=0.428, cr_loss=0.4519, attn_decoder_loss=0.4069, over 20715.00 frames. ], tot_loss[loss=0.3588, ctc_loss=0.332, cr_loss=0.4398, attn_decoder_loss=0.352, over 5798691.78 frames. ], batch size: 209, lr: 4.22e-02, grad_scale: 32.0 2024-09-16 14:21:28,056 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.79 vs. 
limit=10.48 2024-09-16 14:22:20,093 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 14:22:27,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=16360.0, ans=13.18 2024-09-16 14:22:33,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=16360.0, ans=0.125 2024-09-16 14:22:40,685 INFO [train.py:1198] (1/2) Epoch 1, batch 4100, loss[loss=0.3559, ctc_loss=0.3231, cr_loss=0.4021, attn_decoder_loss=0.3506, over 29494.00 frames. ], tot_loss[loss=0.3582, ctc_loss=0.3311, cr_loss=0.4396, attn_decoder_loss=0.3514, over 5794389.84 frames. ], batch size: 90, lr: 4.22e-02, grad_scale: 32.0 2024-09-16 14:22:51,003 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.063e+02 1.366e+02 1.525e+02 1.800e+02 4.946e+02, threshold=3.051e+02, percent-clipped=3.0 2024-09-16 14:22:51,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=9.64 vs. limit=13.2 2024-09-16 14:23:00,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=16440.0, ans=0.0 2024-09-16 14:23:00,627 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.53 vs. limit=19.83 2024-09-16 14:23:06,775 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.50 vs. limit=19.83 2024-09-16 14:23:23,122 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.87 vs. limit=19.86 2024-09-16 14:23:24,554 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.88 vs. limit=19.89 2024-09-16 14:23:26,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=16520.0, ans=0.125 2024-09-16 14:23:37,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=16520.0, ans=0.0072782608695652175 2024-09-16 14:23:45,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.90 vs. limit=9.14 2024-09-16 14:23:49,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=16560.0, ans=0.125 2024-09-16 14:23:54,724 INFO [train.py:1198] (1/2) Epoch 1, batch 4150, loss[loss=0.3506, ctc_loss=0.3328, cr_loss=0.4268, attn_decoder_loss=0.3431, over 29506.00 frames. ], tot_loss[loss=0.3569, ctc_loss=0.3292, cr_loss=0.4392, attn_decoder_loss=0.3502, over 5799535.60 frames. 
], batch size: 77, lr: 4.21e-02, grad_scale: 32.0 2024-09-16 14:24:04,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=16600.0, ans=0.134 2024-09-16 14:24:21,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=16640.0, ans=0.125 2024-09-16 14:24:45,804 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.61 vs. limit=20.04 2024-09-16 14:24:46,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=16720.0, ans=0.125 2024-09-16 14:24:51,753 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.50 vs. limit=20.04 2024-09-16 14:24:56,132 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.22 vs. limit=20.07 2024-09-16 14:25:09,104 INFO [train.py:1198] (1/2) Epoch 1, batch 4200, loss[loss=0.3863, ctc_loss=0.374, cr_loss=0.4513, attn_decoder_loss=0.3776, over 29514.00 frames. ], tot_loss[loss=0.3566, ctc_loss=0.3286, cr_loss=0.4392, attn_decoder_loss=0.3499, over 5801619.86 frames. ], batch size: 90, lr: 4.20e-02, grad_scale: 32.0 2024-09-16 14:25:19,663 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.040e+02 1.356e+02 1.563e+02 1.936e+02 3.144e+02, threshold=3.127e+02, percent-clipped=1.0 2024-09-16 14:25:22,076 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=33.57 vs. limit=13.8 2024-09-16 14:25:29,652 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=6.75 vs. limit=13.815000000000001 2024-09-16 14:26:13,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=16960.0, ans=0.125 2024-09-16 14:26:23,037 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.49 vs. limit=13.86 2024-09-16 14:26:25,320 INFO [train.py:1198] (1/2) Epoch 1, batch 4250, loss[loss=0.3238, ctc_loss=0.2851, cr_loss=0.4299, attn_decoder_loss=0.3185, over 29500.00 frames. ], tot_loss[loss=0.3561, ctc_loss=0.3273, cr_loss=0.4403, attn_decoder_loss=0.3495, over 5806977.32 frames. ], batch size: 74, lr: 4.20e-02, grad_scale: 32.0 2024-09-16 14:26:27,699 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.91 vs. limit=13.875 2024-09-16 14:26:35,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=17000.0, ans=0.0071739130434782614 2024-09-16 14:26:49,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=17040.0, ans=0.0 2024-09-16 14:26:51,470 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.59 vs. 
limit=9.26 2024-09-16 14:27:07,759 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=23.76 vs. limit=20.310000000000002 2024-09-16 14:27:18,025 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=14.69 vs. limit=13.92 2024-09-16 14:27:29,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=17160.0, ans=0.125 2024-09-16 14:27:34,065 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.90 vs. limit=13.934999999999999 2024-09-16 14:27:39,369 INFO [train.py:1198] (1/2) Epoch 1, batch 4300, loss[loss=0.361, ctc_loss=0.3217, cr_loss=0.3916, attn_decoder_loss=0.3567, over 29529.00 frames. ], tot_loss[loss=0.3567, ctc_loss=0.3278, cr_loss=0.441, attn_decoder_loss=0.3501, over 5794749.32 frames. ], batch size: 87, lr: 4.19e-02, grad_scale: 32.0 2024-09-16 14:27:49,860 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.364e+02 1.537e+02 1.919e+02 5.209e+02, threshold=3.074e+02, percent-clipped=5.0 2024-09-16 14:27:53,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=17240.0, ans=0.125 2024-09-16 14:28:15,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=17280.0, ans=0.125 2024-09-16 14:28:16,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=17280.0, ans=0.025 2024-09-16 14:28:18,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=17280.0, ans=0.0 2024-09-16 14:28:36,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=17320.0, ans=0.007104347826086957 2024-09-16 14:28:43,901 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.80 vs. limit=9.34 2024-09-16 14:28:53,573 INFO [train.py:1198] (1/2) Epoch 1, batch 4350, loss[loss=0.3681, ctc_loss=0.3322, cr_loss=0.4877, attn_decoder_loss=0.3612, over 29472.00 frames. ], tot_loss[loss=0.3604, ctc_loss=0.3311, cr_loss=0.4455, attn_decoder_loss=0.3537, over 5796064.83 frames. ], batch size: 97, lr: 4.19e-02, grad_scale: 32.0 2024-09-16 14:29:18,357 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=14.04 2024-09-16 14:29:24,195 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.55 vs. limit=10.992 2024-09-16 14:29:24,309 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.30 vs. 
limit=14.055 2024-09-16 14:29:29,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=17480.0, ans=0.125 2024-09-16 14:30:06,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=17560.0, ans=0.007052173913043479 2024-09-16 14:30:09,474 INFO [train.py:1198] (1/2) Epoch 1, batch 4400, loss[loss=0.3785, ctc_loss=0.3576, cr_loss=0.459, attn_decoder_loss=0.3706, over 27634.00 frames. ], tot_loss[loss=0.3634, ctc_loss=0.3345, cr_loss=0.448, attn_decoder_loss=0.3566, over 5767336.86 frames. ], batch size: 125, lr: 4.18e-02, grad_scale: 32.0 2024-09-16 14:30:17,695 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=19.54 vs. limit=14.1 2024-09-16 14:30:19,702 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.315e+02 1.467e+02 1.766e+02 6.671e+02, threshold=2.933e+02, percent-clipped=1.0 2024-09-16 14:30:21,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=17600.0, ans=0.125 2024-09-16 14:30:22,306 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.69 vs. limit=20.7 2024-09-16 14:30:43,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=17680.0, ans=0.1232 2024-09-16 14:30:54,093 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.03 vs. limit=20.79 2024-09-16 14:31:02,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=17720.0, ans=0.0 2024-09-16 14:31:24,219 INFO [train.py:1198] (1/2) Epoch 1, batch 4450, loss[loss=0.3996, ctc_loss=0.4028, cr_loss=0.4591, attn_decoder_loss=0.3891, over 20129.00 frames. ], tot_loss[loss=0.3671, ctc_loss=0.3411, cr_loss=0.4496, attn_decoder_loss=0.36, over 5571434.34 frames. ], batch size: 211, lr: 4.17e-02, grad_scale: 32.0 2024-09-16 14:31:51,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=17840.0, ans=0.27560000000000007 2024-09-16 14:31:54,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=17880.0, ans=0.125 2024-09-16 14:31:57,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=17880.0, ans=0.0 2024-09-16 14:32:21,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=17920.0, ans=0.9291999999999999 2024-09-16 14:32:40,074 INFO [train.py:1198] (1/2) Epoch 1, batch 4500, loss[loss=0.4013, ctc_loss=0.4238, cr_loss=0.4781, attn_decoder_loss=0.3882, over 19448.00 frames. ], tot_loss[loss=0.3719, ctc_loss=0.351, cr_loss=0.4478, attn_decoder_loss=0.3642, over 5230953.38 frames. 
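The WARNING lines from optim.py report grad-norm quartiles (min/25%/50%/75%/max), the clipping threshold in force, and the percentage of recent batches that were clipped. One way to realise this kind of adaptive clipping is to derive the threshold from the median of a sliding window of gradient norms; the sketch below does exactly that. The window size and class name are assumptions, and the optimizer used here implements a more elaborate variant.

from collections import deque
import torch

class AdaptiveGradClipper:
    # Clips gradients against clipping_scale times the median of the most
    # recent gradient norms, mirroring the "Clipping_scale=2.0 ...
    # threshold=..." warnings above.
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def clip_(self, parameters) -> torch.Tensor:
        params = [p for p in parameters if p.grad is not None]
        total_norm = torch.norm(
            torch.stack([p.grad.detach().norm() for p in params]))
        self.norms.append(total_norm.item())
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if total_norm > threshold:
            for p in params:
                p.grad.detach().mul_(threshold / total_norm)
        return total_norm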
], batch size: 209, lr: 4.17e-02, grad_scale: 32.0 2024-09-16 14:32:50,391 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.043e+02 1.290e+02 1.458e+02 1.671e+02 6.229e+02, threshold=2.915e+02, percent-clipped=1.0 2024-09-16 14:33:08,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=18080.0, ans=0.125 2024-09-16 14:33:14,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=18080.0, ans=0.006939130434782609 2024-09-16 14:34:13,795 INFO [train.py:1198] (1/2) Epoch 2, batch 0, loss[loss=0.4708, ctc_loss=0.2886, cr_loss=0.4265, attn_decoder_loss=0.4815, over 29594.00 frames. ], tot_loss[loss=0.4708, ctc_loss=0.2886, cr_loss=0.4265, attn_decoder_loss=0.4815, over 29594.00 frames. ], batch size: 73, lr: 4.08e-02, grad_scale: 32.0 2024-09-16 14:34:13,795 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 14:34:17,158 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.5251, 2.5152, 2.5979, 2.5003, 2.3752, 2.2946, 1.8668, 2.5723], device='cuda:1') 2024-09-16 14:34:32,033 INFO [train.py:1230] (1/2) Epoch 2, validation: loss=0.3071, ctc_loss=0.1367, cr_loss=4.721e-15, attn_decoder_loss=0.326, over 944034.00 frames. 2024-09-16 14:34:32,034 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-16 14:34:52,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=18140.0, ans=0.125 2024-09-16 14:35:04,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=18180.0, ans=0.0 2024-09-16 14:35:14,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=18180.0, ans=0.025 2024-09-16 14:35:23,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=18220.0, ans=0.0 2024-09-16 14:35:36,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=18260.0, ans=0.125 2024-09-16 14:35:37,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=18260.0, ans=0.125 2024-09-16 14:35:48,040 INFO [train.py:1198] (1/2) Epoch 2, batch 50, loss[loss=0.3224, ctc_loss=0.2918, cr_loss=0.3775, attn_decoder_loss=0.3174, over 29433.00 frames. ], tot_loss[loss=0.3767, ctc_loss=0.3397, cr_loss=0.4457, attn_decoder_loss=0.3709, over 1270305.95 frames. ], batch size: 70, lr: 4.08e-02, grad_scale: 16.0 2024-09-16 14:35:59,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=18300.0, ans=0.125 2024-09-16 14:36:10,316 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.81 vs. 
limit=9.585 2024-09-16 14:36:11,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=18340.0, ans=0.125 2024-09-16 14:36:12,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=18340.0, ans=0.11660000000000001 2024-09-16 14:36:42,199 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.396e+02 1.768e+02 2.293e+02 2.873e+03, threshold=3.536e+02, percent-clipped=13.0 2024-09-16 14:36:48,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=18420.0, ans=0.006865217391304348 2024-09-16 14:37:06,514 INFO [train.py:1198] (1/2) Epoch 2, batch 100, loss[loss=0.3445, ctc_loss=0.314, cr_loss=0.4464, attn_decoder_loss=0.3379, over 29528.00 frames. ], tot_loss[loss=0.3678, ctc_loss=0.3345, cr_loss=0.4466, attn_decoder_loss=0.3616, over 2255162.46 frames. ], batch size: 76, lr: 4.07e-02, grad_scale: 16.0 2024-09-16 14:37:08,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=18500.0, ans=0.006847826086956521 2024-09-16 14:37:11,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=18500.0, ans=0.125 2024-09-16 14:37:17,459 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 14:37:34,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=18540.0, ans=0.125 2024-09-16 14:37:37,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=18580.0, ans=0.0 2024-09-16 14:37:38,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=18580.0, ans=0.125 2024-09-16 14:38:20,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=18660.0, ans=0.125 2024-09-16 14:38:24,322 INFO [train.py:1198] (1/2) Epoch 2, batch 150, loss[loss=0.3233, ctc_loss=0.292, cr_loss=0.3967, attn_decoder_loss=0.3179, over 29458.00 frames. ], tot_loss[loss=0.3605, ctc_loss=0.3274, cr_loss=0.4437, attn_decoder_loss=0.3543, over 3050116.44 frames. ], batch size: 70, lr: 4.06e-02, grad_scale: 16.0 2024-09-16 14:38:38,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=18740.0, ans=0.125 2024-09-16 14:39:07,537 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=26.14 vs. 
limit=21.585 2024-09-16 14:39:14,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=18820.0, ans=0.0 2024-09-16 14:39:15,766 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.006e+02 1.312e+02 1.456e+02 1.615e+02 4.569e+02, threshold=2.911e+02, percent-clipped=2.0 2024-09-16 14:39:17,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=18820.0, ans=0.125 2024-09-16 14:39:31,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=18860.0, ans=0.125 2024-09-16 14:39:37,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=18860.0, ans=0.1114 2024-09-16 14:39:40,016 INFO [train.py:1198] (1/2) Epoch 2, batch 200, loss[loss=0.3792, ctc_loss=0.3595, cr_loss=0.4661, attn_decoder_loss=0.3711, over 27317.00 frames. ], tot_loss[loss=0.3561, ctc_loss=0.3229, cr_loss=0.4423, attn_decoder_loss=0.35, over 3661663.80 frames. ], batch size: 124, lr: 4.06e-02, grad_scale: 16.0 2024-09-16 14:39:49,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=18900.0, ans=0.025 2024-09-16 14:39:49,889 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.38 vs. limit=21.675 2024-09-16 14:39:56,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=18940.0, ans=0.07 2024-09-16 14:40:09,761 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.38 vs. limit=14.49 2024-09-16 14:40:20,084 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.83 vs. limit=5.8469999999999995 2024-09-16 14:40:30,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=19020.0, ans=0.05979999999999999 2024-09-16 14:40:35,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=19020.0, ans=0.0 2024-09-16 14:40:43,445 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.45 vs. limit=21.795 2024-09-16 14:40:52,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=19060.0, ans=0.125 2024-09-16 14:40:58,056 INFO [train.py:1198] (1/2) Epoch 2, batch 250, loss[loss=0.3685, ctc_loss=0.3332, cr_loss=0.4475, attn_decoder_loss=0.3625, over 29234.00 frames. ], tot_loss[loss=0.3541, ctc_loss=0.3204, cr_loss=0.442, attn_decoder_loss=0.3481, over 4142732.57 frames. 
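The ScheduledFloat entries trace hyperparameters (dropout probabilities, skip rates, balancer probs) whose value is a function of batch_count; most decay to a floor such as ans=0.125 or ans=0.0 as training progresses. A piecewise-linear schedule over (batch_count, value) breakpoints reproduces this behaviour. The sketch below is a hedged re-implementation, not the ScheduledFloat class from scaling.py itself, and the breakpoints in the usage example are hypothetical.

class PiecewiseLinearSchedule:
    # Linear interpolation between sorted (batch_count, value) breakpoints,
    # clamped at the endpoints.
    def __init__(self, *points):
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# Hypothetical breakpoints: starts at 0.3, flat at 0.125 from batch 8000 on.
sched = PiecewiseLinearSchedule((0.0, 0.3), (8000.0, 0.125))
assert sched(19140.0) == 0.125   # matches the flat late-training values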
], batch size: 100, lr: 4.05e-02, grad_scale: 16.0 2024-09-16 14:41:11,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=19140.0, ans=0.1086 2024-09-16 14:41:19,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=19140.0, ans=0.025 2024-09-16 14:41:28,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=19180.0, ans=0.125 2024-09-16 14:41:29,158 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.42 vs. limit=14.692499999999999 2024-09-16 14:41:41,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=19180.0, ans=0.125 2024-09-16 14:41:50,140 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.071e+02 1.356e+02 1.504e+02 1.757e+02 3.092e+02, threshold=3.008e+02, percent-clipped=1.0 2024-09-16 14:42:03,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=19260.0, ans=0.10740000000000002 2024-09-16 14:42:12,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=19260.0, ans=0.07 2024-09-16 14:42:16,764 INFO [train.py:1198] (1/2) Epoch 2, batch 300, loss[loss=0.3841, ctc_loss=0.3598, cr_loss=0.4378, attn_decoder_loss=0.377, over 29560.00 frames. ], tot_loss[loss=0.3519, ctc_loss=0.3174, cr_loss=0.4414, attn_decoder_loss=0.346, over 4511086.67 frames. ], batch size: 92, lr: 4.05e-02, grad_scale: 16.0 2024-09-16 14:42:23,324 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 14:42:24,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=19300.0, ans=0.125 2024-09-16 14:42:24,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=19300.0, ans=0.025 2024-09-16 14:42:32,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19340.0, ans=0.1066 2024-09-16 14:42:32,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=19340.0, ans=0.125 2024-09-16 14:42:50,700 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 14:43:21,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=19460.0, ans=0.125 2024-09-16 14:43:27,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=19460.0, ans=0.0 2024-09-16 14:43:33,254 INFO [train.py:1198] (1/2) Epoch 2, batch 350, loss[loss=0.3327, ctc_loss=0.3009, cr_loss=0.4367, attn_decoder_loss=0.3265, over 29283.00 frames. ], tot_loss[loss=0.3513, ctc_loss=0.3163, cr_loss=0.4409, attn_decoder_loss=0.3454, over 4796093.25 frames. 
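The attn_weights_entropy tensor logged at the Epoch 2 validation step above is a per-head summary of how concentrated the self-attention distributions are (low entropy means peaky attention). Assuming the weights are laid out as (batch, heads, query, key) and already sum to one over keys, which is an assumption about the tensor layout, the statistic can be computed as below.

import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (batch, num_heads, query_len, key_len), rows summing to 1.
    # Returns one mean entropy per head, like the logged 8-element tensor.
    eps = 1.0e-20
    ent = -(attn * (attn + eps).log()).sum(dim=-1)   # (batch, heads, query)
    return ent.mean(dim=(0, 2))                      # (heads,)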
], batch size: 71, lr: 4.04e-02, grad_scale: 16.0 2024-09-16 14:43:38,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=19500.0, ans=0.125 2024-09-16 14:43:42,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=19500.0, ans=0.125 2024-09-16 14:43:56,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=19540.0, ans=0.025 2024-09-16 14:44:26,663 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.410e+02 1.578e+02 1.828e+02 5.190e+02, threshold=3.157e+02, percent-clipped=4.0 2024-09-16 14:44:33,876 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.89 vs. limit=22.215 2024-09-16 14:44:39,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=19660.0, ans=0.10340000000000002 2024-09-16 14:44:51,068 INFO [train.py:1198] (1/2) Epoch 2, batch 400, loss[loss=0.355, ctc_loss=0.3204, cr_loss=0.4304, attn_decoder_loss=0.3493, over 29712.00 frames. ], tot_loss[loss=0.3502, ctc_loss=0.3144, cr_loss=0.4406, attn_decoder_loss=0.3444, over 5024529.21 frames. ], batch size: 82, lr: 4.03e-02, grad_scale: 32.0 2024-09-16 14:44:51,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=19700.0, ans=0.21050000000000002 2024-09-16 14:44:58,322 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.66 vs. limit=5.955 2024-09-16 14:45:03,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=19700.0, ans=0.125 2024-09-16 14:45:03,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=19700.0, ans=0.006586956521739131 2024-09-16 14:45:17,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=19740.0, ans=0.006578260869565217 2024-09-16 14:45:25,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=19780.0, ans=0.49670000000000003 2024-09-16 14:45:34,629 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.12 vs. limit=5.9670000000000005 2024-09-16 14:45:37,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19820.0, ans=0.1018 2024-09-16 14:46:10,055 INFO [train.py:1198] (1/2) Epoch 2, batch 450, loss[loss=0.3631, ctc_loss=0.3226, cr_loss=0.4488, attn_decoder_loss=0.3576, over 29688.00 frames. ], tot_loss[loss=0.3496, ctc_loss=0.3139, cr_loss=0.4412, attn_decoder_loss=0.3438, over 5186045.64 frames. 
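Each Whitening entry compares a per-module metric against a scheduled limit (for example metric=21.89 vs. limit=22.215 above); a penalty is applied only when the metric exceeds the limit, pushing the module's feature covariance back towards a multiple of the identity. One metric with the right properties is sketched below: it is always >= 1, with equality exactly when the covariance is white. This is an illustrative reconstruction, not the exact formula in scaling.py.

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations.
    # Returns C * trace(S @ S) / trace(S)**2 for the covariance S.
    # By Cauchy-Schwarz this is >= 1, and it equals 1 iff all eigenvalues
    # of S are equal, i.e. the features are fully whitened.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]
    num_channels = x.shape[1]
    return num_channels * (cov * cov).sum() / cov.trace() ** 2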
], batch size: 83, lr: 4.03e-02, grad_scale: 32.0 2024-09-16 14:46:18,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=19900.0, ans=0.101 2024-09-16 14:46:22,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=19900.0, ans=0.07 2024-09-16 14:46:31,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=19940.0, ans=0.125 2024-09-16 14:46:38,402 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.80 vs. limit=14.9775 2024-09-16 14:46:40,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=19980.0, ans=0.125 2024-09-16 14:47:02,431 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.051e+02 1.299e+02 1.486e+02 1.745e+02 5.446e+02, threshold=2.972e+02, percent-clipped=3.0 2024-09-16 14:47:21,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=20060.0, ans=15.0 2024-09-16 14:47:26,734 INFO [train.py:1198] (1/2) Epoch 2, batch 500, loss[loss=0.3778, ctc_loss=0.3389, cr_loss=0.4826, attn_decoder_loss=0.3714, over 29465.00 frames. ], tot_loss[loss=0.3479, ctc_loss=0.3119, cr_loss=0.4398, attn_decoder_loss=0.3421, over 5328827.63 frames. ], batch size: 94, lr: 4.02e-02, grad_scale: 32.0 2024-09-16 14:48:06,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=20180.0, ans=0.0 2024-09-16 14:48:15,504 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.67 vs. limit=22.5 2024-09-16 14:48:30,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=20260.0, ans=0.2 2024-09-16 14:48:34,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=20260.0, ans=0.0 2024-09-16 14:48:39,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=20260.0, ans=0.2 2024-09-16 14:48:43,731 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 14:48:44,961 INFO [train.py:1198] (1/2) Epoch 2, batch 550, loss[loss=0.3651, ctc_loss=0.3238, cr_loss=0.4856, attn_decoder_loss=0.3589, over 28893.00 frames. ], tot_loss[loss=0.348, ctc_loss=0.3122, cr_loss=0.4403, attn_decoder_loss=0.3422, over 5423615.25 frames. 
], batch size: 104, lr: 4.02e-02, grad_scale: 16.0 2024-09-16 14:49:11,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=20340.0, ans=0.125 2024-09-16 14:49:17,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=20380.0, ans=10.0 2024-09-16 14:49:24,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=20380.0, ans=0.0 2024-09-16 14:49:26,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=20380.0, ans=0.1 2024-09-16 14:49:32,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=20420.0, ans=0.0 2024-09-16 14:49:38,286 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.068e+02 1.358e+02 1.600e+02 1.893e+02 5.686e+02, threshold=3.199e+02, percent-clipped=4.0 2024-09-16 14:49:38,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=20420.0, ans=0.006430434782608695 2024-09-16 14:49:56,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=20460.0, ans=0.125 2024-09-16 14:50:03,473 INFO [train.py:1198] (1/2) Epoch 2, batch 600, loss[loss=0.3809, ctc_loss=0.3436, cr_loss=0.4808, attn_decoder_loss=0.3743, over 29223.00 frames. ], tot_loss[loss=0.3482, ctc_loss=0.3121, cr_loss=0.4404, attn_decoder_loss=0.3424, over 5510942.27 frames. ], batch size: 100, lr: 4.01e-02, grad_scale: 16.0 2024-09-16 14:50:11,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=20500.0, ans=0.0 2024-09-16 14:50:15,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.97 vs. limit=5.0 2024-09-16 14:50:23,928 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=14.39 vs. limit=15.0 2024-09-16 14:50:26,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=20540.0, ans=0.0 2024-09-16 14:50:38,864 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2024-09-16 14:50:44,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=20580.0, ans=0.125 2024-09-16 14:50:56,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=20620.0, ans=0.0 2024-09-16 14:50:56,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.51 vs. 
limit=22.5 2024-09-16 14:51:11,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=20660.0, ans=0.2 2024-09-16 14:51:13,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=20660.0, ans=0.125 2024-09-16 14:51:19,267 INFO [train.py:1198] (1/2) Epoch 2, batch 650, loss[loss=0.3464, ctc_loss=0.3001, cr_loss=0.4706, attn_decoder_loss=0.3411, over 29756.00 frames. ], tot_loss[loss=0.347, ctc_loss=0.3104, cr_loss=0.4404, attn_decoder_loss=0.3413, over 5587926.83 frames. ], batch size: 81, lr: 4.00e-02, grad_scale: 16.0 2024-09-16 14:51:23,319 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.22 vs. limit=22.5 2024-09-16 14:51:29,247 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.37 vs. limit=22.5 2024-09-16 14:51:51,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=20780.0, ans=0.1 2024-09-16 14:52:14,475 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.31 vs. limit=22.5 2024-09-16 14:52:15,199 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.068e+02 1.306e+02 1.501e+02 1.738e+02 3.373e+02, threshold=3.002e+02, percent-clipped=2.0 2024-09-16 14:52:26,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=20860.0, ans=0.125 2024-09-16 14:52:29,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=20860.0, ans=0.125 2024-09-16 14:52:35,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0 2024-09-16 14:52:37,544 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.10 vs. limit=15.0 2024-09-16 14:52:38,091 INFO [train.py:1198] (1/2) Epoch 2, batch 700, loss[loss=0.3261, ctc_loss=0.2765, cr_loss=0.4373, attn_decoder_loss=0.3219, over 29537.00 frames. ], tot_loss[loss=0.3473, ctc_loss=0.3108, cr_loss=0.4417, attn_decoder_loss=0.3416, over 5638246.76 frames. ], batch size: 76, lr: 4.00e-02, grad_scale: 16.0 2024-09-16 14:52:54,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=20940.0, ans=0.2 2024-09-16 14:52:59,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=20940.0, ans=0.1 2024-09-16 14:53:03,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=20940.0, ans=0.1 2024-09-16 14:53:07,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=20980.0, ans=0.006308695652173913 2024-09-16 14:53:19,793 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.23 vs. 
limit=15.0 2024-09-16 14:53:19,824 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.47 vs. limit=15.0 2024-09-16 14:53:36,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=21020.0, ans=0.2 2024-09-16 14:53:56,473 INFO [train.py:1198] (1/2) Epoch 2, batch 750, loss[loss=0.3542, ctc_loss=0.3113, cr_loss=0.476, attn_decoder_loss=0.3484, over 29710.00 frames. ], tot_loss[loss=0.3463, ctc_loss=0.3097, cr_loss=0.4415, attn_decoder_loss=0.3406, over 5676318.46 frames. ], batch size: 82, lr: 3.99e-02, grad_scale: 16.0 2024-09-16 14:54:07,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=21100.0, ans=0.125 2024-09-16 14:54:31,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=21180.0, ans=0.2 2024-09-16 14:54:48,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=21220.0, ans=0.006256521739130435 2024-09-16 14:54:49,443 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.006e+02 1.356e+02 1.549e+02 1.774e+02 3.247e+02, threshold=3.098e+02, percent-clipped=2.0 2024-09-16 14:54:52,297 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.87 vs. limit=22.5 2024-09-16 14:55:10,070 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.02 vs. limit=15.0 2024-09-16 14:55:12,221 INFO [train.py:1198] (1/2) Epoch 2, batch 800, loss[loss=0.318, ctc_loss=0.2819, cr_loss=0.3968, attn_decoder_loss=0.3132, over 29613.00 frames. ], tot_loss[loss=0.3459, ctc_loss=0.309, cr_loss=0.4403, attn_decoder_loss=0.3402, over 5706596.29 frames. ], batch size: 73, lr: 3.98e-02, grad_scale: 32.0 2024-09-16 14:55:34,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=21340.0, ans=0.025 2024-09-16 14:55:37,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=21340.0, ans=0.2 2024-09-16 14:55:40,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=22.5 2024-09-16 14:56:05,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=21420.0, ans=0.0 2024-09-16 14:56:05,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=21420.0, ans=0.1 2024-09-16 14:56:29,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=21500.0, ans=0.125 2024-09-16 14:56:30,085 INFO [train.py:1198] (1/2) Epoch 2, batch 850, loss[loss=0.3659, ctc_loss=0.3261, cr_loss=0.4912, attn_decoder_loss=0.3594, over 29692.00 frames. ], tot_loss[loss=0.3448, ctc_loss=0.3075, cr_loss=0.4408, attn_decoder_loss=0.3391, over 5735337.33 frames. 
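The grad_scale field in the batch summaries (alternating between 32.0 and 16.0 above) is the dynamic loss-scaling factor used for fp16 mixed-precision training: it is halved when overflows are detected and grown back after a run of successful steps. The standard PyTorch pattern that produces this behaviour is shown below; model, optimizer, and batch are placeholders.

import torch
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()  # maintains the dynamic grad_scale seen in the logs

def train_step(model, optimizer, batch):
    optimizer.zero_grad()
    with autocast(dtype=torch.float16):
        loss = model(batch)          # forward pass runs in fp16
    scaler.scale(loss).backward()    # backward on the scaled loss
    scaler.step(optimizer)           # unscales; skips the step on inf/nan
    scaler.update()                  # grows or shrinks the scale factor
    return loss.detach()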
], batch size: 89, lr: 3.98e-02, grad_scale: 16.0 2024-09-16 14:56:42,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=21500.0, ans=0.2 2024-09-16 14:56:42,767 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=15.0 2024-09-16 14:56:49,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=21540.0, ans=0.2 2024-09-16 14:56:57,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=21540.0, ans=0.125 2024-09-16 14:56:58,135 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.85 vs. limit=15.0 2024-09-16 14:57:02,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=21580.0, ans=0.125 2024-09-16 14:57:02,990 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.63 vs. limit=22.5 2024-09-16 14:57:05,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=21580.0, ans=0.2 2024-09-16 14:57:05,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=21580.0, ans=0.125 2024-09-16 14:57:06,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=21580.0, ans=0.025 2024-09-16 14:57:24,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=21620.0, ans=0.125 2024-09-16 14:57:25,361 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.316e+02 1.489e+02 1.639e+02 3.105e+02, threshold=2.978e+02, percent-clipped=1.0 2024-09-16 14:57:27,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=21620.0, ans=0.1 2024-09-16 14:57:46,616 INFO [train.py:1198] (1/2) Epoch 2, batch 900, loss[loss=0.3018, ctc_loss=0.263, cr_loss=0.3901, attn_decoder_loss=0.2974, over 29599.00 frames. ], tot_loss[loss=0.3447, ctc_loss=0.307, cr_loss=0.4404, attn_decoder_loss=0.3391, over 5740665.61 frames. ], batch size: 73, lr: 3.97e-02, grad_scale: 16.0 2024-09-16 14:58:06,617 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.15 vs. limit=15.0 2024-09-16 14:58:24,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=21780.0, ans=0.1 2024-09-16 14:59:04,593 INFO [train.py:1198] (1/2) Epoch 2, batch 950, loss[loss=0.3093, ctc_loss=0.2615, cr_loss=0.3908, attn_decoder_loss=0.3059, over 29521.00 frames. ], tot_loss[loss=0.3447, ctc_loss=0.3069, cr_loss=0.4402, attn_decoder_loss=0.3392, over 5741737.12 frames. 
], batch size: 74, lr: 3.97e-02, grad_scale: 16.0 2024-09-16 14:59:46,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=21980.0, ans=0.0 2024-09-16 14:59:52,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=22020.0, ans=0.006082608695652174 2024-09-16 15:00:01,565 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.375e+02 1.582e+02 1.931e+02 4.850e+02, threshold=3.164e+02, percent-clipped=3.0 2024-09-16 15:00:06,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=22060.0, ans=0.125 2024-09-16 15:00:06,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=22060.0, ans=0.125 2024-09-16 15:00:22,486 INFO [train.py:1198] (1/2) Epoch 2, batch 1000, loss[loss=0.3403, ctc_loss=0.2943, cr_loss=0.4548, attn_decoder_loss=0.3353, over 29508.00 frames. ], tot_loss[loss=0.3455, ctc_loss=0.3081, cr_loss=0.4411, attn_decoder_loss=0.3399, over 5736222.82 frames. ], batch size: 77, lr: 3.96e-02, grad_scale: 16.0 2024-09-16 15:00:24,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=22100.0, ans=0.125 2024-09-16 15:00:24,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=22100.0, ans=0.125 2024-09-16 15:00:33,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=22100.0, ans=0.006065217391304348 2024-09-16 15:00:33,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=22100.0, ans=0.2 2024-09-16 15:00:37,232 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=6.72 vs. limit=15.0 2024-09-16 15:00:47,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=22140.0, ans=0.0 2024-09-16 15:01:02,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=22180.0, ans=0.125 2024-09-16 15:01:08,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=22220.0, ans=0.125 2024-09-16 15:01:18,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=22220.0, ans=0.0060391304347826085 2024-09-16 15:01:25,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=22260.0, ans=0.006030434782608696 2024-09-16 15:01:27,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=22260.0, ans=0.025 2024-09-16 15:01:29,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=22260.0, ans=0.04949747468305833 2024-09-16 15:01:38,483 INFO [train.py:1198] (1/2) Epoch 2, batch 1050, loss[loss=0.3496, ctc_loss=0.3063, cr_loss=0.4512, attn_decoder_loss=0.3444, over 29699.00 frames. 
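cr_loss is the consistency-regularization term of CR-CTC: the same utterance is encoded twice under different time-masking, and the two frame-level CTC posterior distributions are pulled towards each other. The sketch below uses a symmetric KL divergence with a stop-gradient on the target side of each direction; the exact divergence and masking details of this run should be treated as assumptions.

import torch
import torch.nn.functional as F

def consistency_loss(logits_a: torch.Tensor,
                     logits_b: torch.Tensor) -> torch.Tensor:
    # logits_{a,b}: (batch, frames, vocab) CTC outputs for two
    # differently-masked views of the same audio.
    log_p_a = F.log_softmax(logits_a, dim=-1)
    log_p_b = F.log_softmax(logits_b, dim=-1)
    kl_ab = F.kl_div(log_p_a, log_p_b.detach(),
                     reduction="batchmean", log_target=True)
    kl_ba = F.kl_div(log_p_b, log_p_a.detach(),
                     reduction="batchmean", log_target=True)
    return 0.5 * (kl_ab + kl_ba)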
], tot_loss[loss=0.3442, ctc_loss=0.3063, cr_loss=0.4399, attn_decoder_loss=0.3387, over 5744929.69 frames. ], batch size: 85, lr: 3.95e-02, grad_scale: 16.0 2024-09-16 15:01:53,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=22300.0, ans=0.5 2024-09-16 15:02:03,023 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.66 vs. limit=15.0 2024-09-16 15:02:05,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=22340.0, ans=0.1 2024-09-16 15:02:07,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.38 vs. limit=15.0 2024-09-16 15:02:14,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=22380.0, ans=0.1 2024-09-16 15:02:20,172 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.29 vs. limit=12.0 2024-09-16 15:02:27,756 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=23.76 vs. limit=22.5 2024-09-16 15:02:36,118 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.038e+02 1.353e+02 1.564e+02 1.813e+02 2.890e+02, threshold=3.129e+02, percent-clipped=0.0 2024-09-16 15:02:36,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=22420.0, ans=0.125 2024-09-16 15:02:57,507 INFO [train.py:1198] (1/2) Epoch 2, batch 1100, loss[loss=0.3307, ctc_loss=0.2855, cr_loss=0.4317, attn_decoder_loss=0.3262, over 29436.00 frames. ], tot_loss[loss=0.3439, ctc_loss=0.3059, cr_loss=0.4403, attn_decoder_loss=0.3383, over 5756191.57 frames. ], batch size: 78, lr: 3.95e-02, grad_scale: 16.0 2024-09-16 15:03:05,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=22500.0, ans=0.0 2024-09-16 15:03:17,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=22540.0, ans=0.025 2024-09-16 15:03:19,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=22540.0, ans=0.1 2024-09-16 15:03:23,999 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.48 vs. limit=15.0 2024-09-16 15:03:36,540 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.45 vs. limit=6.0 2024-09-16 15:03:45,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=22620.0, ans=0.125 2024-09-16 15:04:15,683 INFO [train.py:1198] (1/2) Epoch 2, batch 1150, loss[loss=0.3362, ctc_loss=0.2883, cr_loss=0.4183, attn_decoder_loss=0.3322, over 29461.00 frames. ], tot_loss[loss=0.3435, ctc_loss=0.3057, cr_loss=0.44, attn_decoder_loss=0.338, over 5755340.58 frames. 
], batch size: 78, lr: 3.94e-02, grad_scale: 16.0 2024-09-16 15:04:16,956 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.49 vs. limit=15.0 2024-09-16 15:04:19,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=22700.0, ans=0.1 2024-09-16 15:04:25,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=22700.0, ans=0.0 2024-09-16 15:04:55,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=22780.0, ans=0.2 2024-09-16 15:05:03,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=22820.0, ans=0.025 2024-09-16 15:05:10,470 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.047e+02 1.307e+02 1.503e+02 1.816e+02 4.036e+02, threshold=3.005e+02, percent-clipped=3.0 2024-09-16 15:05:10,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=22820.0, ans=0.125 2024-09-16 15:05:14,512 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=24.54 vs. limit=22.5 2024-09-16 15:05:18,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=22860.0, ans=0.07 2024-09-16 15:05:31,660 INFO [train.py:1198] (1/2) Epoch 2, batch 1200, loss[loss=0.3543, ctc_loss=0.3166, cr_loss=0.4489, attn_decoder_loss=0.3486, over 29669.00 frames. ], tot_loss[loss=0.3445, ctc_loss=0.3063, cr_loss=0.4408, attn_decoder_loss=0.339, over 5747218.38 frames. ], batch size: 85, lr: 3.93e-02, grad_scale: 32.0 2024-09-16 15:05:45,059 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-16 15:05:45,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=22900.0, ans=0.09899494936611666 2024-09-16 15:05:46,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=22900.0, ans=0.2 2024-09-16 15:05:54,953 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.11 vs. limit=15.0 2024-09-16 15:05:58,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=22940.0, ans=0.1 2024-09-16 15:06:06,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=22980.0, ans=0.0 2024-09-16 15:06:17,798 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=22.78 vs. limit=22.5 2024-09-16 15:06:17,874 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.99 vs. 
limit=15.0 2024-09-16 15:06:18,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=23020.0, ans=0.125 2024-09-16 15:06:18,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=23020.0, ans=10.0 2024-09-16 15:06:20,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=23020.0, ans=6.0 2024-09-16 15:06:24,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=23020.0, ans=0.05 2024-09-16 15:06:44,715 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.349e-02 2024-09-16 15:06:44,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=23060.0, ans=0.125 2024-09-16 15:06:49,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=23100.0, ans=0.125 2024-09-16 15:06:50,267 INFO [train.py:1198] (1/2) Epoch 2, batch 1250, loss[loss=0.3597, ctc_loss=0.3196, cr_loss=0.4764, attn_decoder_loss=0.3535, over 29528.00 frames. ], tot_loss[loss=0.3451, ctc_loss=0.3067, cr_loss=0.4426, attn_decoder_loss=0.3396, over 5775085.82 frames. ], batch size: 92, lr: 3.93e-02, grad_scale: 16.0 2024-09-16 15:06:58,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.max_abs, batch_count=23100.0, ans=10.0 2024-09-16 15:07:08,139 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.61 vs. limit=15.0 2024-09-16 15:07:14,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=23140.0, ans=0.1 2024-09-16 15:07:35,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=23180.0, ans=0.2 2024-09-16 15:07:37,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=23220.0, ans=0.125 2024-09-16 15:07:47,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=23220.0, ans=0.1 2024-09-16 15:07:49,027 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.075e+02 1.374e+02 1.508e+02 1.823e+02 4.800e+02, threshold=3.017e+02, percent-clipped=3.0 2024-09-16 15:07:56,272 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.41 vs. limit=12.0 2024-09-16 15:08:08,751 INFO [train.py:1198] (1/2) Epoch 2, batch 1300, loss[loss=0.3553, ctc_loss=0.3186, cr_loss=0.466, attn_decoder_loss=0.349, over 28454.00 frames. ], tot_loss[loss=0.3438, ctc_loss=0.3051, cr_loss=0.4423, attn_decoder_loss=0.3383, over 5780654.09 frames. ], batch size: 112, lr: 3.92e-02, grad_scale: 16.0 2024-09-16 15:08:22,296 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.25 vs. 
limit=15.0 2024-09-16 15:08:46,283 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.83 vs. limit=10.0 2024-09-16 15:08:47,172 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 15:09:10,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=23460.0, ans=0.025 2024-09-16 15:09:14,826 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.452e-02 2024-09-16 15:09:20,841 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 15:09:25,048 INFO [train.py:1198] (1/2) Epoch 2, batch 1350, loss[loss=0.3524, ctc_loss=0.3135, cr_loss=0.4647, attn_decoder_loss=0.3464, over 29715.00 frames. ], tot_loss[loss=0.3422, ctc_loss=0.3026, cr_loss=0.4415, attn_decoder_loss=0.3367, over 5794960.07 frames. ], batch size: 81, lr: 3.91e-02, grad_scale: 16.0 2024-09-16 15:09:28,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=23500.0, ans=0.125 2024-09-16 15:09:36,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=23500.0, ans=0.125 2024-09-16 15:09:40,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=23540.0, ans=0.125 2024-09-16 15:09:59,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=23580.0, ans=0.0 2024-09-16 15:10:23,027 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.062e+02 1.283e+02 1.428e+02 1.705e+02 2.892e+02, threshold=2.856e+02, percent-clipped=0.0 2024-09-16 15:10:42,646 INFO [train.py:1198] (1/2) Epoch 2, batch 1400, loss[loss=0.2969, ctc_loss=0.2581, cr_loss=0.3772, attn_decoder_loss=0.2928, over 29599.00 frames. ], tot_loss[loss=0.3418, ctc_loss=0.302, cr_loss=0.4418, attn_decoder_loss=0.3364, over 5806361.29 frames. ], batch size: 69, lr: 3.91e-02, grad_scale: 16.0 2024-09-16 15:10:55,692 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.85 vs. limit=6.0 2024-09-16 15:10:56,921 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.28 vs. 
limit=15.0 2024-09-16 15:10:58,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=23740.0, ans=0.0 2024-09-16 15:11:00,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=23740.0, ans=0.0 2024-09-16 15:11:13,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=23780.0, ans=0.125 2024-09-16 15:11:46,797 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 15:11:49,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=23860.0, ans=0.125 2024-09-16 15:12:00,334 INFO [train.py:1198] (1/2) Epoch 2, batch 1450, loss[loss=0.3701, ctc_loss=0.329, cr_loss=0.4833, attn_decoder_loss=0.364, over 29409.00 frames. ], tot_loss[loss=0.3423, ctc_loss=0.3026, cr_loss=0.4417, attn_decoder_loss=0.3369, over 5804801.74 frames. ], batch size: 94, lr: 3.90e-02, grad_scale: 16.0 2024-09-16 15:12:11,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=23900.0, ans=0.0 2024-09-16 15:12:20,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=23940.0, ans=10.0 2024-09-16 15:12:27,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=23940.0, ans=0.0 2024-09-16 15:12:36,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=23980.0, ans=0.125 2024-09-16 15:12:40,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=23980.0, ans=0.125 2024-09-16 15:12:56,334 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.348e+02 1.492e+02 1.698e+02 3.722e+02, threshold=2.983e+02, percent-clipped=2.0 2024-09-16 15:12:57,415 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.93 vs. limit=15.0 2024-09-16 15:13:03,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.64 vs. limit=15.0 2024-09-16 15:13:05,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=24060.0, ans=0.125 2024-09-16 15:13:16,028 INFO [train.py:1198] (1/2) Epoch 2, batch 1500, loss[loss=0.3466, ctc_loss=0.293, cr_loss=0.4673, attn_decoder_loss=0.3421, over 29612.00 frames. ], tot_loss[loss=0.343, ctc_loss=0.3029, cr_loss=0.4427, attn_decoder_loss=0.3376, over 5806444.77 frames. 
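The learning rate in the summaries decays smoothly with both batch count and epoch (4.22e-02 at Epoch 1 batch 4100 down to 3.90e-02 by Epoch 2 batch 1500), which is the shape of icefall's Eden schedule. The sketch below follows the commonly published form of that rule; base_lr and the two half-life constants are configuration-specific and are left as required arguments rather than guessed.

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float, lr_epochs: float) -> float:
    # Eden-style decay, smooth in both batch count and (fractional) epoch.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor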
], batch size: 86, lr: 3.90e-02, grad_scale: 16.0 2024-09-16 15:13:16,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=24100.0, ans=0.005630434782608696 2024-09-16 15:13:25,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=24100.0, ans=0.1 2024-09-16 15:13:27,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=24100.0, ans=0.125 2024-09-16 15:13:54,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=24180.0, ans=0.0 2024-09-16 15:13:57,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.74 vs. limit=10.0 2024-09-16 15:14:07,196 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.10 vs. limit=15.0 2024-09-16 15:14:08,526 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.40 vs. limit=10.0 2024-09-16 15:14:34,946 INFO [train.py:1198] (1/2) Epoch 2, batch 1550, loss[loss=0.3692, ctc_loss=0.3339, cr_loss=0.4723, attn_decoder_loss=0.3626, over 29464.00 frames. ], tot_loss[loss=0.3426, ctc_loss=0.3026, cr_loss=0.4424, attn_decoder_loss=0.3372, over 5782517.26 frames. ], batch size: 90, lr: 3.89e-02, grad_scale: 8.0 2024-09-16 15:15:10,418 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.95 vs. limit=22.5 2024-09-16 15:15:14,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=24380.0, ans=0.125 2024-09-16 15:15:21,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.28 vs. limit=15.0 2024-09-16 15:15:25,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=24420.0, ans=0.125 2024-09-16 15:15:25,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=24420.0, ans=0.0 2024-09-16 15:15:26,290 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.25 vs. limit=15.0 2024-09-16 15:15:30,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=24420.0, ans=0.0 2024-09-16 15:15:30,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=24420.0, ans=0.0055608695652173915 2024-09-16 15:15:34,549 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.379e+02 1.577e+02 1.948e+02 4.764e+02, threshold=3.154e+02, percent-clipped=9.0 2024-09-16 15:15:44,310 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.33 vs. limit=6.0 2024-09-16 15:15:44,578 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.38 vs. 
limit=15.0 2024-09-16 15:15:46,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=24460.0, ans=0.0 2024-09-16 15:15:52,712 INFO [train.py:1198] (1/2) Epoch 2, batch 1600, loss[loss=0.357, ctc_loss=0.3137, cr_loss=0.4844, attn_decoder_loss=0.3511, over 29674.00 frames. ], tot_loss[loss=0.3426, ctc_loss=0.303, cr_loss=0.4422, attn_decoder_loss=0.3372, over 5764056.06 frames. ], batch size: 85, lr: 3.88e-02, grad_scale: 16.0 2024-09-16 15:16:06,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=24540.0, ans=0.025 2024-09-16 15:16:14,745 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.27 vs. limit=10.0 2024-09-16 15:16:46,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=24620.0, ans=0.125 2024-09-16 15:16:46,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=24620.0, ans=0.0 2024-09-16 15:17:08,435 INFO [train.py:1198] (1/2) Epoch 2, batch 1650, loss[loss=0.3632, ctc_loss=0.3245, cr_loss=0.4802, attn_decoder_loss=0.3568, over 29698.00 frames. ], tot_loss[loss=0.3419, ctc_loss=0.3021, cr_loss=0.4413, attn_decoder_loss=0.3366, over 5758119.53 frames. ], batch size: 89, lr: 3.88e-02, grad_scale: 16.0 2024-09-16 15:17:16,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=24700.0, ans=0.125 2024-09-16 15:17:44,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=24780.0, ans=0.0 2024-09-16 15:17:47,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=24780.0, ans=0.125 2024-09-16 15:17:57,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.08 vs. limit=12.0 2024-09-16 15:18:08,691 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.020e+02 1.312e+02 1.453e+02 1.722e+02 6.388e+02, threshold=2.905e+02, percent-clipped=6.0 2024-09-16 15:18:13,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=24860.0, ans=0.125 2024-09-16 15:18:21,641 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=18.32 vs. limit=15.0 2024-09-16 15:18:25,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=24900.0, ans=0.1 2024-09-16 15:18:26,757 INFO [train.py:1198] (1/2) Epoch 2, batch 1700, loss[loss=0.3016, ctc_loss=0.2655, cr_loss=0.4395, attn_decoder_loss=0.2958, over 29593.00 frames. ], tot_loss[loss=0.3412, ctc_loss=0.3007, cr_loss=0.442, attn_decoder_loss=0.3359, over 5780217.95 frames. ], batch size: 69, lr: 3.87e-02, grad_scale: 16.0 2024-09-16 15:18:33,624 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=24.53 vs. 
limit=22.5 2024-09-16 15:18:40,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=24900.0, ans=6.0 2024-09-16 15:18:41,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=24940.0, ans=0.025 2024-09-16 15:18:47,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=24940.0, ans=0.0 2024-09-16 15:18:51,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=24940.0, ans=0.125 2024-09-16 15:19:11,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=24980.0, ans=0.2 2024-09-16 15:19:11,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=24980.0, ans=0.1 2024-09-16 15:19:22,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=25020.0, ans=0.125 2024-09-16 15:19:35,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=25060.0, ans=10.0 2024-09-16 15:19:40,956 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.63 vs. limit=22.5 2024-09-16 15:19:44,699 INFO [train.py:1198] (1/2) Epoch 2, batch 1750, loss[loss=0.2897, ctc_loss=0.2351, cr_loss=0.3747, attn_decoder_loss=0.2875, over 29338.00 frames. ], tot_loss[loss=0.3397, ctc_loss=0.2984, cr_loss=0.441, attn_decoder_loss=0.3345, over 5787836.50 frames. ], batch size: 67, lr: 3.86e-02, grad_scale: 16.0 2024-09-16 15:19:49,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=25100.0, ans=0.05 2024-09-16 15:19:55,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=25100.0, ans=0.125 2024-09-16 15:20:05,604 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2024-09-16 15:20:23,436 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.70 vs. limit=10.0 2024-09-16 15:20:31,200 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.26 vs. limit=12.0 2024-09-16 15:20:38,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=25220.0, ans=0.125 2024-09-16 15:20:42,198 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.354e+02 1.539e+02 1.820e+02 3.547e+02, threshold=3.078e+02, percent-clipped=3.0 2024-09-16 15:20:44,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=25260.0, ans=0.125 2024-09-16 15:21:00,330 INFO [train.py:1198] (1/2) Epoch 2, batch 1800, loss[loss=0.3526, ctc_loss=0.3084, cr_loss=0.4289, attn_decoder_loss=0.348, over 29704.00 frames. 
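The recurring `WARNING [optim.py:487] Clipping_scale=2.0, grad-norm quartiles ... threshold=..., percent-clipped=...` lines summarize adaptive gradient clipping: the optimizer tracks recent gradient norms, reports their five-number summary, and clips any update whose norm exceeds a threshold derived from that history. A sketch under those assumptions; the window size and the median-times-`clipping_scale` threshold rule are guesses at the mechanism, not a transcript of optim.py:

```python
from collections import deque

import torch

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)   # recent global grad norms
        self.num_steps = 0
        self.num_clipped = 0

    def clip_(self, params) -> float:
        params = [p for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        hist = torch.tensor(list(self.norms))
        q = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # 2x the running median
        self.num_steps += 1
        if norm > threshold:
            self.num_clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)
        pct = 100.0 * self.num_clipped / self.num_steps
        print(f"grad-norm quartiles {q.tolist()}, "
              f"threshold={threshold:.3e}, percent-clipped={pct:.1f}")
        return norm
```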
], tot_loss[loss=0.3398, ctc_loss=0.2986, cr_loss=0.4413, attn_decoder_loss=0.3346, over 5790527.06 frames. ], batch size: 83, lr: 3.86e-02, grad_scale: 16.0 2024-09-16 15:21:08,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=25300.0, ans=0.0 2024-09-16 15:21:31,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=25380.0, ans=0.125 2024-09-16 15:21:32,545 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.77 vs. limit=15.0 2024-09-16 15:22:08,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=25460.0, ans=0.125 2024-09-16 15:22:14,642 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.30 vs. limit=22.5 2024-09-16 15:22:18,553 INFO [train.py:1198] (1/2) Epoch 2, batch 1850, loss[loss=0.3531, ctc_loss=0.3011, cr_loss=0.45, attn_decoder_loss=0.3488, over 29625.00 frames. ], tot_loss[loss=0.3398, ctc_loss=0.2987, cr_loss=0.4424, attn_decoder_loss=0.3346, over 5797080.38 frames. ], batch size: 86, lr: 3.85e-02, grad_scale: 16.0 2024-09-16 15:22:29,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=25500.0, ans=0.0 2024-09-16 15:22:30,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=25500.0, ans=0.125 2024-09-16 15:22:42,175 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=5.44 vs. limit=12.0 2024-09-16 15:22:46,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=25540.0, ans=0.125 2024-09-16 15:22:46,939 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.74 vs. limit=15.0 2024-09-16 15:22:54,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=25580.0, ans=0.125 2024-09-16 15:23:10,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=25620.0, ans=0.1 2024-09-16 15:23:19,028 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.341e+02 1.488e+02 1.704e+02 7.229e+02, threshold=2.976e+02, percent-clipped=2.0 2024-09-16 15:23:25,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=25660.0, ans=0.125 2024-09-16 15:23:37,145 INFO [train.py:1198] (1/2) Epoch 2, batch 1900, loss[loss=0.3702, ctc_loss=0.3376, cr_loss=0.4625, attn_decoder_loss=0.3635, over 29685.00 frames. ], tot_loss[loss=0.3405, ctc_loss=0.2991, cr_loss=0.444, attn_decoder_loss=0.3352, over 5805466.96 frames. 
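Each `loss[...]` / `tot_loss[...]` entry decomposes the objective into a CTC term, an attention-decoder term, and a consistency-regularization term, logged unweighted alongside the weighted total (which is why `ctc_loss` can exceed `loss`). A sketch of the composition; the weights below are chosen because they reproduce the logged totals, and whether they match the run's configuration is an assumption:

```python
import torch

def combine_losses(ctc_loss, attn_decoder_loss, cr_loss,
                   ctc_scale, attn_decoder_scale, cr_scale):
    # The logged `loss` is taken to be a weighted sum of the parts,
    # while each part is logged unweighted.
    return (ctc_scale * ctc_loss
            + attn_decoder_scale * attn_decoder_loss
            + cr_scale * cr_loss)

# tot_loss figures from batch 1350 earlier in the log; with these
# (assumed) weights the sum reproduces the logged total of 0.3422:
total = combine_losses(
    torch.tensor(0.3026), torch.tensor(0.3367), torch.tensor(0.4415),
    ctc_scale=0.1, attn_decoder_scale=0.9, cr_scale=0.02,
)
print(total)  # ~0.3421, matching the logged tot_loss
```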
], batch size: 89, lr: 3.85e-02, grad_scale: 16.0 2024-09-16 15:23:47,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=25700.0, ans=0.125 2024-09-16 15:23:52,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=25740.0, ans=0.2 2024-09-16 15:23:57,811 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.12 vs. limit=12.0 2024-09-16 15:24:04,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=25740.0, ans=0.1 2024-09-16 15:24:38,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=25860.0, ans=0.1 2024-09-16 15:24:44,967 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.06 vs. limit=22.5 2024-09-16 15:24:53,303 INFO [train.py:1198] (1/2) Epoch 2, batch 1950, loss[loss=0.3279, ctc_loss=0.2854, cr_loss=0.4425, attn_decoder_loss=0.3228, over 29450.00 frames. ], tot_loss[loss=0.3413, ctc_loss=0.2992, cr_loss=0.4453, attn_decoder_loss=0.3361, over 5819778.55 frames. ], batch size: 78, lr: 3.84e-02, grad_scale: 16.0 2024-09-16 15:25:07,532 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.547e-02 2024-09-16 15:25:08,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=25940.0, ans=0.04949747468305833 2024-09-16 15:25:15,560 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=8.79 vs. limit=15.0 2024-09-16 15:25:44,757 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.01 vs. limit=15.0 2024-09-16 15:25:50,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=26020.0, ans=0.125 2024-09-16 15:25:52,895 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.307e+02 1.485e+02 1.949e+02 3.051e+02, threshold=2.970e+02, percent-clipped=1.0 2024-09-16 15:26:11,400 INFO [train.py:1198] (1/2) Epoch 2, batch 2000, loss[loss=0.3098, ctc_loss=0.2733, cr_loss=0.421, attn_decoder_loss=0.3046, over 29362.00 frames. ], tot_loss[loss=0.3419, ctc_loss=0.3004, cr_loss=0.4455, attn_decoder_loss=0.3367, over 5797014.82 frames. ], batch size: 67, lr: 3.83e-02, grad_scale: 32.0 2024-09-16 15:26:17,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=26100.0, ans=0.125 2024-09-16 15:26:18,622 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=28.77 vs. limit=22.5 2024-09-16 15:26:18,733 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.68 vs. 
limit=15.0 2024-09-16 15:26:30,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=26140.0, ans=0.025 2024-09-16 15:26:33,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=26140.0, ans=0.005186956521739131 2024-09-16 15:26:45,181 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.80 vs. limit=15.0 2024-09-16 15:27:13,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=26260.0, ans=0.07 2024-09-16 15:27:19,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=26260.0, ans=0.005160869565217391 2024-09-16 15:27:21,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=26260.0, ans=0.125 2024-09-16 15:27:25,195 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.70 vs. limit=6.0 2024-09-16 15:27:30,011 INFO [train.py:1198] (1/2) Epoch 2, batch 2050, loss[loss=0.2972, ctc_loss=0.2482, cr_loss=0.391, attn_decoder_loss=0.294, over 29425.00 frames. ], tot_loss[loss=0.3408, ctc_loss=0.2995, cr_loss=0.4451, attn_decoder_loss=0.3355, over 5787504.34 frames. ], batch size: 70, lr: 3.83e-02, grad_scale: 16.0 2024-09-16 15:27:34,130 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.86 vs. limit=15.0 2024-09-16 15:27:50,345 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.91 vs. limit=10.0 2024-09-16 15:28:29,485 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.317e+02 1.483e+02 1.822e+02 5.194e+02, threshold=2.965e+02, percent-clipped=3.0 2024-09-16 15:28:43,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=26460.0, ans=0.125 2024-09-16 15:28:46,403 INFO [train.py:1198] (1/2) Epoch 2, batch 2100, loss[loss=0.3141, ctc_loss=0.2642, cr_loss=0.4244, attn_decoder_loss=0.3102, over 29763.00 frames. ], tot_loss[loss=0.339, ctc_loss=0.2972, cr_loss=0.4436, attn_decoder_loss=0.3338, over 5800539.29 frames. ], batch size: 81, lr: 3.82e-02, grad_scale: 16.0 2024-09-16 15:28:57,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=26500.0, ans=0.1 2024-09-16 15:29:01,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=26540.0, ans=0.125 2024-09-16 15:30:05,024 INFO [train.py:1198] (1/2) Epoch 2, batch 2150, loss[loss=0.3387, ctc_loss=0.2958, cr_loss=0.4725, attn_decoder_loss=0.333, over 29453.00 frames. ], tot_loss[loss=0.3381, ctc_loss=0.296, cr_loss=0.4435, attn_decoder_loss=0.3329, over 5815708.75 frames. ], batch size: 78, lr: 3.81e-02, grad_scale: 16.0 2024-09-16 15:30:15,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.39 vs. 
limit=22.5 2024-09-16 15:30:26,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=26740.0, ans=0.125 2024-09-16 15:30:28,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=26740.0, ans=0.0 2024-09-16 15:30:32,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=26740.0, ans=0.125 2024-09-16 15:30:36,400 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.66 vs. limit=22.5 2024-09-16 15:30:39,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=26780.0, ans=0.035 2024-09-16 15:30:52,272 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.67 vs. limit=15.0 2024-09-16 15:31:06,842 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.386e+02 1.602e+02 1.803e+02 8.431e+02, threshold=3.204e+02, percent-clipped=4.0 2024-09-16 15:31:23,610 INFO [train.py:1198] (1/2) Epoch 2, batch 2200, loss[loss=0.3474, ctc_loss=0.2953, cr_loss=0.465, attn_decoder_loss=0.3429, over 29646.00 frames. ], tot_loss[loss=0.3381, ctc_loss=0.2959, cr_loss=0.4434, attn_decoder_loss=0.3329, over 5811692.25 frames. ], batch size: 86, lr: 3.81e-02, grad_scale: 16.0 2024-09-16 15:31:28,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=26900.0, ans=0.005021739130434783 2024-09-16 15:31:51,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=26940.0, ans=0.1 2024-09-16 15:32:22,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=27020.0, ans=0.0 2024-09-16 15:32:40,169 INFO [train.py:1198] (1/2) Epoch 2, batch 2250, loss[loss=0.3401, ctc_loss=0.2913, cr_loss=0.4682, attn_decoder_loss=0.3351, over 29713.00 frames. ], tot_loss[loss=0.3379, ctc_loss=0.2956, cr_loss=0.4435, attn_decoder_loss=0.3327, over 5810629.26 frames. 
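The `Whitening: name=..., num_groups=..., num_channels=..., metric=... vs. limit=...` entries compare a measured whiteness statistic of a module's activations against a scheduled limit; exceeding the limit (e.g. `metric=22.66 vs. limit=22.5` just above) is presumably what arms the whitening penalty. One plausible scale-invariant metric, which is exactly 1.0 when each group's covariance is a multiple of the identity and grows as the eigenvalue spectrum spreads, is sketched below; this is an assumption about the formula, not a copy of scaling.py:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    # x: (num_frames, num_channels); channels are split into groups.
    # (C/G) * trace(cov @ cov) / trace(cov)**2 equals 1.0 exactly when
    # cov = s * I, and grows as the eigenvalue spectrum spreads out.
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)      # center within each group
    cov = x.transpose(1, 2) @ x / n          # (groups, C/G, C/G)
    num = (c // num_groups) * (cov * cov).sum(dim=(1, 2))
    den = torch.diagonal(cov, dim1=1, dim2=2).sum(dim=1) ** 2
    return (num / den).mean()

x = torch.randn(10000, 64)                   # (nearly) white features
print(whitening_metric(x, num_groups=1))     # ~1.0, well under any limit
```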
], batch size: 82, lr: 3.80e-02, grad_scale: 16.0 2024-09-16 15:32:43,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=27100.0, ans=0.1 2024-09-16 15:32:46,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=27100.0, ans=0.0 2024-09-16 15:32:50,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=27100.0, ans=0.125 2024-09-16 15:32:50,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=27100.0, ans=0.0 2024-09-16 15:33:04,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=27140.0, ans=0.1 2024-09-16 15:33:13,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=27180.0, ans=0.1 2024-09-16 15:33:41,483 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.301e+02 1.512e+02 1.808e+02 4.415e+02, threshold=3.025e+02, percent-clipped=2.0 2024-09-16 15:33:58,448 INFO [train.py:1198] (1/2) Epoch 2, batch 2300, loss[loss=0.3012, ctc_loss=0.2605, cr_loss=0.3988, attn_decoder_loss=0.2969, over 29296.00 frames. ], tot_loss[loss=0.3369, ctc_loss=0.2945, cr_loss=0.4416, attn_decoder_loss=0.3318, over 5797054.26 frames. ], batch size: 71, lr: 3.79e-02, grad_scale: 16.0 2024-09-16 15:34:00,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=27300.0, ans=0.0049347826086956524 2024-09-16 15:34:09,787 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.37 vs. limit=15.0 2024-09-16 15:34:23,529 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.86 vs. limit=22.5 2024-09-16 15:34:38,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=27380.0, ans=0.2 2024-09-16 15:34:58,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=27420.0, ans=0.125 2024-09-16 15:35:16,566 INFO [train.py:1198] (1/2) Epoch 2, batch 2350, loss[loss=0.3642, ctc_loss=0.3303, cr_loss=0.4714, attn_decoder_loss=0.3575, over 29710.00 frames. ], tot_loss[loss=0.3367, ctc_loss=0.2943, cr_loss=0.4421, attn_decoder_loss=0.3316, over 5803006.73 frames. ], batch size: 83, lr: 3.79e-02, grad_scale: 16.0 2024-09-16 15:35:25,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=27500.0, ans=0.125 2024-09-16 15:35:25,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=27500.0, ans=0.0 2024-09-16 15:35:33,998 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0 2024-09-16 15:35:36,935 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=24.90 vs. 
limit=22.5 2024-09-16 15:36:08,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=27620.0, ans=0.2 2024-09-16 15:36:15,845 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.428e+02 1.608e+02 2.014e+02 4.831e+02, threshold=3.217e+02, percent-clipped=8.0 2024-09-16 15:36:25,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.95 vs. limit=22.5 2024-09-16 15:36:32,927 INFO [train.py:1198] (1/2) Epoch 2, batch 2400, loss[loss=0.3117, ctc_loss=0.2642, cr_loss=0.4043, attn_decoder_loss=0.308, over 29538.00 frames. ], tot_loss[loss=0.3368, ctc_loss=0.294, cr_loss=0.4427, attn_decoder_loss=0.3317, over 5807568.61 frames. ], batch size: 76, lr: 3.78e-02, grad_scale: 32.0 2024-09-16 15:36:33,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=27700.0, ans=0.2 2024-09-16 15:36:35,182 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=15.65 vs. limit=15.0 2024-09-16 15:36:42,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=27700.0, ans=0.125 2024-09-16 15:36:43,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=27700.0, ans=0.04949747468305833 2024-09-16 15:36:49,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=27740.0, ans=0.1 2024-09-16 15:37:30,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=27820.0, ans=0.2 2024-09-16 15:37:30,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=27820.0, ans=0.0 2024-09-16 15:37:48,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=27860.0, ans=0.025 2024-09-16 15:37:51,269 INFO [train.py:1198] (1/2) Epoch 2, batch 2450, loss[loss=0.3555, ctc_loss=0.3149, cr_loss=0.4929, attn_decoder_loss=0.3491, over 29719.00 frames. ], tot_loss[loss=0.3381, ctc_loss=0.2955, cr_loss=0.4438, attn_decoder_loss=0.333, over 5784724.57 frames. ], batch size: 82, lr: 3.78e-02, grad_scale: 16.0 2024-09-16 15:37:58,159 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.78 vs. limit=15.0 2024-09-16 15:38:14,801 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.40 vs. limit=15.0 2024-09-16 15:38:30,204 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.047e-02 2024-09-16 15:38:37,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=28020.0, ans=0.125 2024-09-16 15:38:41,545 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.29 vs. 
limit=15.0 2024-09-16 15:38:45,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=28020.0, ans=0.2 2024-09-16 15:38:54,381 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.358e+02 1.541e+02 1.889e+02 3.653e+02, threshold=3.082e+02, percent-clipped=2.0 2024-09-16 15:38:56,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=28060.0, ans=0.025 2024-09-16 15:39:02,938 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.03 vs. limit=6.0 2024-09-16 15:39:09,701 INFO [train.py:1198] (1/2) Epoch 2, batch 2500, loss[loss=0.3598, ctc_loss=0.3171, cr_loss=0.5025, attn_decoder_loss=0.3534, over 29623.00 frames. ], tot_loss[loss=0.3382, ctc_loss=0.2955, cr_loss=0.4441, attn_decoder_loss=0.3331, over 5795177.84 frames. ], batch size: 86, lr: 3.77e-02, grad_scale: 16.0 2024-09-16 15:39:12,412 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.90 vs. limit=10.0 2024-09-16 15:39:18,283 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.37 vs. limit=15.0 2024-09-16 15:39:46,811 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.65 vs. limit=6.0 2024-09-16 15:40:20,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.87 vs. limit=15.0 2024-09-16 15:40:24,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=28300.0, ans=0.2 2024-09-16 15:40:25,986 INFO [train.py:1198] (1/2) Epoch 2, batch 2550, loss[loss=0.3017, ctc_loss=0.2653, cr_loss=0.4112, attn_decoder_loss=0.2966, over 29315.00 frames. ], tot_loss[loss=0.3379, ctc_loss=0.2949, cr_loss=0.4434, attn_decoder_loss=0.3328, over 5798670.23 frames. ], batch size: 67, lr: 3.76e-02, grad_scale: 16.0 2024-09-16 15:40:26,331 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-16 15:40:27,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=28300.0, ans=0.1 2024-09-16 15:40:38,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=28300.0, ans=0.1 2024-09-16 15:40:54,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=28380.0, ans=0.125 2024-09-16 15:41:28,631 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.048e+02 1.393e+02 1.535e+02 1.794e+02 3.607e+02, threshold=3.070e+02, percent-clipped=1.0 2024-09-16 15:41:30,950 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.31 vs. 
limit=15.0 2024-09-16 15:41:31,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=28460.0, ans=0.125 2024-09-16 15:41:44,114 INFO [train.py:1198] (1/2) Epoch 2, batch 2600, loss[loss=0.3426, ctc_loss=0.3, cr_loss=0.46, attn_decoder_loss=0.3371, over 29453.00 frames. ], tot_loss[loss=0.3381, ctc_loss=0.2953, cr_loss=0.4443, attn_decoder_loss=0.333, over 5795514.73 frames. ], batch size: 78, lr: 3.76e-02, grad_scale: 16.0 2024-09-16 15:42:57,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=28660.0, ans=0.125 2024-09-16 15:43:01,958 INFO [train.py:1198] (1/2) Epoch 2, batch 2650, loss[loss=0.3531, ctc_loss=0.3063, cr_loss=0.5076, attn_decoder_loss=0.347, over 29282.00 frames. ], tot_loss[loss=0.3384, ctc_loss=0.2953, cr_loss=0.4455, attn_decoder_loss=0.3333, over 5801402.11 frames. ], batch size: 100, lr: 3.75e-02, grad_scale: 16.0 2024-09-16 15:43:11,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=28700.0, ans=0.125 2024-09-16 15:43:16,223 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=8.072e-03 2024-09-16 15:43:22,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=28740.0, ans=0.125 2024-09-16 15:43:25,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=28740.0, ans=0.1 2024-09-16 15:43:45,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=28780.0, ans=0.004613043478260869 2024-09-16 15:43:46,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=28820.0, ans=0.025 2024-09-16 15:44:02,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=28860.0, ans=0.125 2024-09-16 15:44:03,362 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.330e+02 1.517e+02 1.799e+02 4.153e+02, threshold=3.035e+02, percent-clipped=1.0 2024-09-16 15:44:05,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=28860.0, ans=0.1 2024-09-16 15:44:08,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=28860.0, ans=0.125 2024-09-16 15:44:18,675 INFO [train.py:1198] (1/2) Epoch 2, batch 2700, loss[loss=0.3335, ctc_loss=0.2786, cr_loss=0.4506, attn_decoder_loss=0.3296, over 29545.00 frames. ], tot_loss[loss=0.3385, ctc_loss=0.2952, cr_loss=0.4467, attn_decoder_loss=0.3334, over 5797991.89 frames. 
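The `balancer*.prob`, `min_positive`, `max_abs`, and `min_abs` knobs throughout these entries constrain per-channel activation statistics, e.g. the fraction of positive values and the mean absolute value. A sketch of the idea as a soft auxiliary penalty; the real module reportedly works in gradient space instead, so treat this as an illustration of the constraint only, with bounds borrowed from values logged above:

```python
import torch

def balancer_penalty(x: torch.Tensor,
                     min_positive: float = 0.025,   # bounds reused from
                     max_abs: float = 10.0) -> torch.Tensor:  # the log above
    # x: (..., num_channels). Penalize channels whose (soft) fraction of
    # positive values falls below min_positive or whose mean |value|
    # exceeds max_abs; a sigmoid stands in for the hard (x > 0) count.
    flat = x.reshape(-1, x.shape[-1])
    frac_pos = torch.sigmoid(flat / (flat.std(dim=0) + 1e-6)).mean(dim=0)
    mean_abs = flat.abs().mean(dim=0)
    penalty = ((min_positive - frac_pos).clamp(min=0.0)
               + (mean_abs - max_abs).clamp(min=0.0))
    return penalty.sum()

hidden = torch.randn(100, 256)
aux = 0.01 * balancer_penalty(hidden)  # would be added to the training loss
```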
], batch size: 87, lr: 3.74e-02, grad_scale: 16.0 2024-09-16 15:44:25,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=28900.0, ans=0.0 2024-09-16 15:44:50,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=28980.0, ans=0.125 2024-09-16 15:44:58,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=28980.0, ans=0.2 2024-09-16 15:45:02,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=29020.0, ans=0.05 2024-09-16 15:45:17,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=29020.0, ans=0.1 2024-09-16 15:45:20,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=29060.0, ans=0.004552173913043479 2024-09-16 15:45:23,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=29060.0, ans=0.09899494936611666 2024-09-16 15:45:36,958 INFO [train.py:1198] (1/2) Epoch 2, batch 2750, loss[loss=0.3313, ctc_loss=0.2911, cr_loss=0.4571, attn_decoder_loss=0.3256, over 29509.00 frames. ], tot_loss[loss=0.3369, ctc_loss=0.2938, cr_loss=0.4447, attn_decoder_loss=0.3318, over 5796233.02 frames. ], batch size: 75, lr: 3.74e-02, grad_scale: 8.0 2024-09-16 15:45:47,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=29100.0, ans=0.2 2024-09-16 15:46:16,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=29180.0, ans=0.025 2024-09-16 15:46:35,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=29220.0, ans=0.0 2024-09-16 15:46:41,280 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.322e+02 1.540e+02 1.938e+02 5.454e+02, threshold=3.080e+02, percent-clipped=6.0 2024-09-16 15:46:43,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=29260.0, ans=0.125 2024-09-16 15:46:47,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=29260.0, ans=0.125 2024-09-16 15:46:55,043 INFO [train.py:1198] (1/2) Epoch 2, batch 2800, loss[loss=0.3833, ctc_loss=0.3765, cr_loss=0.4249, attn_decoder_loss=0.3747, over 20405.00 frames. ], tot_loss[loss=0.3373, ctc_loss=0.2943, cr_loss=0.4446, attn_decoder_loss=0.3322, over 5776594.06 frames. 
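`grad_scale: 8.0 / 16.0 / 32.0` in the progress lines is the loss scale of float16 mixed-precision training: it is halved whenever a step overflows and grown again after a streak of successful steps, hence the powers of two drifting through the log. A minimal sketch using PyTorch's stock scaler; `model`, `optimizer`, and `batch` are placeholders:

```python
import torch

scaler = torch.cuda.amp.GradScaler()  # starts large, adapts on its own

def train_step(model, optimizer, batch):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(batch)                   # forward runs in fp16
    scaler.scale(loss).backward()             # backward on the scaled loss
    scaler.step(optimizer)                    # silently skipped on overflow
    scaler.update()                           # halve on overflow, grow after
                                              # a streak of clean steps
    return loss.detach(), scaler.get_scale()  # the logged grad_scale
```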
], batch size: 211, lr: 3.73e-02, grad_scale: 16.0 2024-09-16 15:47:16,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=29340.0, ans=0.1 2024-09-16 15:47:23,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=29380.0, ans=0.125 2024-09-16 15:47:29,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=29380.0, ans=0.0 2024-09-16 15:47:45,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=29420.0, ans=0.125 2024-09-16 15:47:46,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=29420.0, ans=0.5 2024-09-16 15:47:50,288 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.22 vs. limit=22.5 2024-09-16 15:47:55,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=29460.0, ans=0.2 2024-09-16 15:47:58,595 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-16 15:48:07,990 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.88 vs. limit=15.0 2024-09-16 15:48:10,151 INFO [train.py:1198] (1/2) Epoch 2, batch 2850, loss[loss=0.3252, ctc_loss=0.2781, cr_loss=0.4278, attn_decoder_loss=0.3209, over 29519.00 frames. ], tot_loss[loss=0.3377, ctc_loss=0.2948, cr_loss=0.4449, attn_decoder_loss=0.3325, over 5761677.72 frames. ], batch size: 77, lr: 3.73e-02, grad_scale: 16.0 2024-09-16 15:48:19,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=29500.0, ans=0.125 2024-09-16 15:48:28,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=29540.0, ans=0.125 2024-09-16 15:48:31,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=29540.0, ans=0.0 2024-09-16 15:49:15,201 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.408e+02 1.587e+02 1.885e+02 4.187e+02, threshold=3.175e+02, percent-clipped=5.0 2024-09-16 15:49:24,901 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.77 vs. limit=15.0 2024-09-16 15:49:28,799 INFO [train.py:1198] (1/2) Epoch 2, batch 2900, loss[loss=0.3145, ctc_loss=0.2486, cr_loss=0.4359, attn_decoder_loss=0.3121, over 29431.00 frames. ], tot_loss[loss=0.3382, ctc_loss=0.2944, cr_loss=0.4462, attn_decoder_loss=0.3332, over 5786227.13 frames. 
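`cr_loss` is the consistency-regularization term: the same utterance is, as I read the recipe, forwarded twice under different time-masking, and the two posterior sequences are pulled toward each other. A sketch of one common formulation, a symmetric KL with each branch's target detached; the exact CR-CTC definition may differ:

```python
import torch.nn.functional as F
from torch import Tensor

def consistency_loss(logits_a: Tensor, logits_b: Tensor) -> Tensor:
    # logits_*: (batch, time, vocab) from two differently-masked passes
    # over the same audio. Each branch is pushed toward the *detached*
    # posterior of the other, so neither can win by dragging both down.
    log_p_a = F.log_softmax(logits_a, dim=-1)
    log_p_b = F.log_softmax(logits_b, dim=-1)
    kl_ab = F.kl_div(log_p_a, log_p_b.detach(),
                     reduction="batchmean", log_target=True)
    kl_ba = F.kl_div(log_p_b, log_p_a.detach(),
                     reduction="batchmean", log_target=True)
    return 0.5 * (kl_ab + kl_ba)
```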
], batch size: 79, lr: 3.72e-02, grad_scale: 16.0 2024-09-16 15:49:30,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=29700.0, ans=0.125 2024-09-16 15:49:39,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=29700.0, ans=0.0 2024-09-16 15:49:49,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=29740.0, ans=0.2 2024-09-16 15:49:55,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=29740.0, ans=0.1 2024-09-16 15:49:58,995 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.28 vs. limit=6.0 2024-09-16 15:50:01,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=29780.0, ans=0.125 2024-09-16 15:50:02,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=29780.0, ans=0.004395652173913044 2024-09-16 15:50:10,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=29780.0, ans=0.125 2024-09-16 15:50:26,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=29820.0, ans=0.125 2024-09-16 15:50:36,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=29860.0, ans=0.125 2024-09-16 15:50:46,435 INFO [train.py:1198] (1/2) Epoch 2, batch 2950, loss[loss=0.3269, ctc_loss=0.2779, cr_loss=0.456, attn_decoder_loss=0.3222, over 29527.00 frames. ], tot_loss[loss=0.3365, ctc_loss=0.2928, cr_loss=0.444, attn_decoder_loss=0.3315, over 5782292.48 frames. ], batch size: 75, lr: 3.71e-02, grad_scale: 16.0 2024-09-16 15:51:04,020 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.62 vs. limit=15.0 2024-09-16 15:51:10,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=29940.0, ans=0.125 2024-09-16 15:51:18,888 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.07 vs. limit=15.0 2024-09-16 15:51:27,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=29980.0, ans=10.0 2024-09-16 15:51:48,469 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.374e+02 1.533e+02 1.890e+02 8.560e+02, threshold=3.066e+02, percent-clipped=4.0 2024-09-16 15:51:48,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=30060.0, ans=0.125 2024-09-16 15:52:02,437 INFO [train.py:1198] (1/2) Epoch 2, batch 3000, loss[loss=0.3356, ctc_loss=0.2867, cr_loss=0.4265, attn_decoder_loss=0.3315, over 29765.00 frames. ], tot_loss[loss=0.336, ctc_loss=0.2924, cr_loss=0.4442, attn_decoder_loss=0.331, over 5783292.79 frames. 
], batch size: 81, lr: 3.71e-02, grad_scale: 16.0 2024-09-16 15:52:02,438 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 15:52:20,638 INFO [train.py:1230] (1/2) Epoch 2, validation: loss=0.2432, ctc_loss=0.1092, cr_loss=4.796e-15, attn_decoder_loss=0.2581, over 944034.00 frames. 2024-09-16 15:52:20,638 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-16 15:52:22,644 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 15:52:47,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.28 vs. limit=15.0 2024-09-16 15:53:27,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=30260.0, ans=0.1 2024-09-16 15:53:34,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=30260.0, ans=0.025 2024-09-16 15:53:42,120 INFO [train.py:1198] (1/2) Epoch 2, batch 3050, loss[loss=0.3155, ctc_loss=0.2735, cr_loss=0.4109, attn_decoder_loss=0.3111, over 29555.00 frames. ], tot_loss[loss=0.3373, ctc_loss=0.2938, cr_loss=0.4457, attn_decoder_loss=0.3322, over 5777076.91 frames. ], batch size: 76, lr: 3.70e-02, grad_scale: 16.0 2024-09-16 15:53:43,131 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.71 vs. limit=15.0 2024-09-16 15:53:44,596 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.42 vs. limit=15.0 2024-09-16 15:54:03,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=30340.0, ans=0.0 2024-09-16 15:54:09,615 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-16 15:54:25,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.43 vs. limit=15.0 2024-09-16 15:54:41,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=30460.0, ans=0.2 2024-09-16 15:54:44,410 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.365e+02 1.556e+02 1.852e+02 9.980e+02, threshold=3.113e+02, percent-clipped=5.0 2024-09-16 15:54:50,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=30460.0, ans=0.125 2024-09-16 15:54:57,987 INFO [train.py:1198] (1/2) Epoch 2, batch 3100, loss[loss=0.3651, ctc_loss=0.3356, cr_loss=0.4895, attn_decoder_loss=0.3575, over 29250.00 frames. ], tot_loss[loss=0.3365, ctc_loss=0.2928, cr_loss=0.4451, attn_decoder_loss=0.3315, over 5776473.54 frames. ], batch size: 100, lr: 3.69e-02, grad_scale: 16.0 2024-09-16 15:55:01,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.21 vs. 
limit=15.0 2024-09-16 15:55:16,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=30540.0, ans=0.125 2024-09-16 15:55:34,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=30580.0, ans=0.125 2024-09-16 15:55:56,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=30620.0, ans=0.125 2024-09-16 15:56:00,779 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 15:56:14,180 INFO [train.py:1198] (1/2) Epoch 2, batch 3150, loss[loss=0.3792, ctc_loss=0.3463, cr_loss=0.5109, attn_decoder_loss=0.3715, over 28914.00 frames. ], tot_loss[loss=0.3367, ctc_loss=0.2928, cr_loss=0.4461, attn_decoder_loss=0.3317, over 5782851.00 frames. ], batch size: 104, lr: 3.69e-02, grad_scale: 16.0 2024-09-16 15:56:17,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=30700.0, ans=0.0 2024-09-16 15:56:42,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=30740.0, ans=0.125 2024-09-16 15:56:56,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=30780.0, ans=0.125 2024-09-16 15:57:03,020 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.36 vs. limit=22.5 2024-09-16 15:57:20,638 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.062e+02 1.339e+02 1.514e+02 1.730e+02 4.890e+02, threshold=3.027e+02, percent-clipped=3.0 2024-09-16 15:57:34,169 INFO [train.py:1198] (1/2) Epoch 2, batch 3200, loss[loss=0.3152, ctc_loss=0.2584, cr_loss=0.4455, attn_decoder_loss=0.3116, over 29410.00 frames. ], tot_loss[loss=0.3352, ctc_loss=0.2907, cr_loss=0.445, attn_decoder_loss=0.3303, over 5793108.65 frames. ], batch size: 79, lr: 3.68e-02, grad_scale: 32.0 2024-09-16 15:57:38,030 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.73 vs. limit=15.0 2024-09-16 15:57:59,536 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=14.56 vs. limit=15.0 2024-09-16 15:58:08,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=30980.0, ans=0.2 2024-09-16 15:58:29,593 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.40 vs. limit=15.0 2024-09-16 15:58:44,983 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.86 vs. limit=15.0 2024-09-16 15:58:49,423 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.14 vs. limit=12.0 2024-09-16 15:58:50,039 INFO [train.py:1198] (1/2) Epoch 2, batch 3250, loss[loss=0.3449, ctc_loss=0.3031, cr_loss=0.4328, attn_decoder_loss=0.3399, over 29713.00 frames. 
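The `Maximum memory allocated so far is 52672MB` line printed with the validation loss above comes straight from the CUDA caching allocator's statistics. A sketch of producing the same line; the device index is a placeholder:

```python
import torch

def log_peak_memory(device: int = 0) -> None:
    # Peak bytes ever allocated on the device since startup (or since
    # the last torch.cuda.reset_peak_memory_stats(device) call).
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {peak_mb}MB")
```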
], tot_loss[loss=0.3355, ctc_loss=0.2908, cr_loss=0.4452, attn_decoder_loss=0.3305, over 5800257.94 frames. ], batch size: 84, lr: 3.68e-02, grad_scale: 16.0 2024-09-16 15:58:50,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=31100.0, ans=0.125 2024-09-16 15:58:57,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=31100.0, ans=0.125 2024-09-16 15:58:58,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=31100.0, ans=0.125 2024-09-16 15:59:01,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=31100.0, ans=0.025 2024-09-16 15:59:21,144 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=22.5 2024-09-16 15:59:21,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=31180.0, ans=0.1 2024-09-16 15:59:23,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=31180.0, ans=0.125 2024-09-16 15:59:24,989 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 15:59:47,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=31220.0, ans=0.125 2024-09-16 15:59:47,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=31220.0, ans=0.125 2024-09-16 15:59:53,724 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.050e+02 1.355e+02 1.599e+02 1.863e+02 1.090e+03, threshold=3.197e+02, percent-clipped=6.0 2024-09-16 16:00:06,053 INFO [train.py:1198] (1/2) Epoch 2, batch 3300, loss[loss=0.3559, ctc_loss=0.3103, cr_loss=0.4832, attn_decoder_loss=0.3503, over 28218.00 frames. ], tot_loss[loss=0.3339, ctc_loss=0.2895, cr_loss=0.4435, attn_decoder_loss=0.3289, over 5797736.23 frames. ], batch size: 111, lr: 3.67e-02, grad_scale: 16.0 2024-09-16 16:00:23,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=31340.0, ans=0.125 2024-09-16 16:00:49,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=31380.0, ans=0.0 2024-09-16 16:01:04,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=31420.0, ans=0.025 2024-09-16 16:01:09,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=31460.0, ans=0.0 2024-09-16 16:01:18,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=31460.0, ans=0.125 2024-09-16 16:01:25,564 INFO [train.py:1198] (1/2) Epoch 2, batch 3350, loss[loss=0.3586, ctc_loss=0.3196, cr_loss=0.4848, attn_decoder_loss=0.3521, over 28968.00 frames. ], tot_loss[loss=0.3352, ctc_loss=0.2915, cr_loss=0.4445, attn_decoder_loss=0.3302, over 5774381.21 frames. 
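The learning rate in the progress lines decays smoothly within the epoch (3.91e-02 around batch 1350 down to 3.66e-02 by batch 3350) rather than stepping at epoch boundaries, consistent with a schedule that decays jointly in batch count and epoch, such as icefall's Eden scheduler. A sketch of that shape; both the formula and the constants are assumptions, and any warmup or duration correction is omitted:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 5000.0,       # illustrative constants,
            lr_epochs: float = 4.0) -> float:  # not this run's config
    # Assumed Eden-style decay, smooth in both batch count and epoch.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

for batch in (1000, 10000, 30000):
    print(f"{eden_lr(0.05, batch, epoch=2.0):.2e}")  # decays smoothly
```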
], batch size: 104, lr: 3.66e-02, grad_scale: 16.0 2024-09-16 16:01:27,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=31500.0, ans=0.2 2024-09-16 16:01:37,071 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=23.16 vs. limit=22.5 2024-09-16 16:01:51,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=31540.0, ans=0.025 2024-09-16 16:01:54,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=31580.0, ans=0.07 2024-09-16 16:02:18,837 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 16:02:27,142 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.79 vs. limit=12.0 2024-09-16 16:02:28,106 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 16:02:29,253 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.443e+02 1.604e+02 1.937e+02 5.792e+02, threshold=3.209e+02, percent-clipped=1.0 2024-09-16 16:02:41,560 INFO [train.py:1198] (1/2) Epoch 2, batch 3400, loss[loss=0.3018, ctc_loss=0.2656, cr_loss=0.4082, attn_decoder_loss=0.2968, over 29386.00 frames. ], tot_loss[loss=0.3348, ctc_loss=0.2913, cr_loss=0.4451, attn_decoder_loss=0.3298, over 5766982.06 frames. ], batch size: 67, lr: 3.66e-02, grad_scale: 16.0 2024-09-16 16:03:16,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=31780.0, ans=0.125 2024-09-16 16:03:44,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=31860.0, ans=0.1 2024-09-16 16:03:57,317 INFO [train.py:1198] (1/2) Epoch 2, batch 3450, loss[loss=0.359, ctc_loss=0.3109, cr_loss=0.4776, attn_decoder_loss=0.3537, over 28434.00 frames. ], tot_loss[loss=0.3347, ctc_loss=0.2907, cr_loss=0.4452, attn_decoder_loss=0.3297, over 5776026.85 frames. 
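`bypass.skip_rate`, `bypass.scale_min`, and `bypass_mid.scale_min` in these entries belong to layer-bypass connections: the layer output is blended with its input through a learnable per-channel scale whose floor (`scale_min`) is itself scheduled, and during training the layer's contribution is occasionally dropped with probability `skip_rate`. A sketch under those assumptions, reusing the logged values 0.07 and 0.2 as fixed constants:

```python
import torch
import torch.nn as nn

class BypassSketch(nn.Module):
    # out = x + scale * (y - x), with scale clamped to [scale_min, 1].
    # scale_min / skip_rate would be scheduled floats in the real model;
    # skipping here is per forward call rather than per frame, for brevity.
    def __init__(self, num_channels: int, scale_min: float = 0.2,
                 skip_rate: float = 0.07):
        super().__init__()
        self.scale = nn.Parameter(torch.full((num_channels,), 0.5))
        self.scale_min = scale_min
        self.skip_rate = skip_rate

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # x: layer input, y: layer output, both (..., num_channels).
        if self.training and torch.rand(()) < self.skip_rate:
            return x  # drop the layer's contribution entirely
        scale = self.scale.clamp(min=self.scale_min, max=1.0)
        return x + scale * (y - x)
```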
], batch size: 112, lr: 3.65e-02, grad_scale: 16.0 2024-09-16 16:03:59,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=31900.0, ans=0.2 2024-09-16 16:04:00,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=31900.0, ans=0.1 2024-09-16 16:04:54,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=32020.0, ans=0.125 2024-09-16 16:05:07,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=32060.0, ans=0.125 2024-09-16 16:05:09,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=32060.0, ans=0.1 2024-09-16 16:05:12,126 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.023e+02 1.335e+02 1.514e+02 1.734e+02 4.417e+02, threshold=3.028e+02, percent-clipped=1.0 2024-09-16 16:05:12,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=32060.0, ans=0.025 2024-09-16 16:05:20,652 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.24 vs. limit=22.5 2024-09-16 16:05:24,249 INFO [train.py:1198] (1/2) Epoch 2, batch 3500, loss[loss=0.3093, ctc_loss=0.267, cr_loss=0.4373, attn_decoder_loss=0.3043, over 29735.00 frames. ], tot_loss[loss=0.3336, ctc_loss=0.2895, cr_loss=0.4444, attn_decoder_loss=0.3287, over 5777631.59 frames. ], batch size: 72, lr: 3.65e-02, grad_scale: 16.0 2024-09-16 16:05:29,171 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 16:05:38,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=32140.0, ans=0.0 2024-09-16 16:05:48,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=32140.0, ans=0.125 2024-09-16 16:05:48,988 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.06 vs. limit=15.0 2024-09-16 16:05:51,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=32140.0, ans=0.025 2024-09-16 16:06:01,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=32180.0, ans=0.0038739130434782606 2024-09-16 16:06:30,615 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.39 vs. limit=15.0 2024-09-16 16:06:36,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=32260.0, ans=0.0 2024-09-16 16:06:37,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=32300.0, ans=0.003847826086956522 2024-09-16 16:06:38,936 INFO [train.py:1198] (1/2) Epoch 2, batch 3550, loss[loss=0.3425, ctc_loss=0.288, cr_loss=0.4475, attn_decoder_loss=0.3386, over 29725.00 frames. 
], tot_loss[loss=0.3332, ctc_loss=0.2888, cr_loss=0.4447, attn_decoder_loss=0.3282, over 5783918.12 frames. ], batch size: 89, lr: 3.64e-02, grad_scale: 16.0 2024-09-16 16:06:39,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=32300.0, ans=0.125 2024-09-16 16:06:59,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=32340.0, ans=0.0 2024-09-16 16:07:02,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.59 vs. limit=10.0 2024-09-16 16:07:13,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=32380.0, ans=0.0 2024-09-16 16:07:15,207 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.94 vs. limit=6.0 2024-09-16 16:07:35,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=32420.0, ans=0.125 2024-09-16 16:07:41,330 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.383e+02 1.528e+02 1.788e+02 3.393e+02, threshold=3.056e+02, percent-clipped=1.0 2024-09-16 16:07:43,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=32460.0, ans=0.125 2024-09-16 16:07:53,265 INFO [train.py:1198] (1/2) Epoch 2, batch 3600, loss[loss=0.314, ctc_loss=0.2544, cr_loss=0.4339, attn_decoder_loss=0.311, over 29494.00 frames. ], tot_loss[loss=0.3328, ctc_loss=0.2879, cr_loss=0.4445, attn_decoder_loss=0.3279, over 5793232.06 frames. ], batch size: 77, lr: 3.63e-02, grad_scale: 32.0 2024-09-16 16:07:55,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=32500.0, ans=0.0 2024-09-16 16:08:35,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.68 vs. limit=12.0 2024-09-16 16:08:36,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=32620.0, ans=0.125 2024-09-16 16:08:38,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=32620.0, ans=22.5 2024-09-16 16:09:07,512 INFO [train.py:1198] (1/2) Epoch 2, batch 3650, loss[loss=0.3269, ctc_loss=0.2778, cr_loss=0.4263, attn_decoder_loss=0.3229, over 29489.00 frames. ], tot_loss[loss=0.3318, ctc_loss=0.2864, cr_loss=0.4436, attn_decoder_loss=0.3269, over 5794561.94 frames. 
], batch size: 90, lr: 3.63e-02, grad_scale: 16.0 2024-09-16 16:09:13,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=32700.0, ans=0.003760869565217391 2024-09-16 16:09:21,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=32740.0, ans=0.0 2024-09-16 16:09:22,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=32740.0, ans=0.2 2024-09-16 16:09:24,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=32740.0, ans=0.1 2024-09-16 16:09:30,629 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2024-09-16 16:09:34,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=32740.0, ans=0.0 2024-09-16 16:09:46,375 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.95 vs. limit=12.0 2024-09-16 16:10:12,076 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.348e+02 1.495e+02 1.801e+02 3.465e+02, threshold=2.990e+02, percent-clipped=2.0 2024-09-16 16:10:21,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=32860.0, ans=0.0037260869565217385 2024-09-16 16:10:22,891 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.44 vs. limit=22.5 2024-09-16 16:10:24,624 INFO [train.py:1198] (1/2) Epoch 2, batch 3700, loss[loss=0.3244, ctc_loss=0.2798, cr_loss=0.4376, attn_decoder_loss=0.3196, over 29715.00 frames. ], tot_loss[loss=0.3318, ctc_loss=0.2863, cr_loss=0.4439, attn_decoder_loss=0.3269, over 5804064.36 frames. ], batch size: 84, lr: 3.62e-02, grad_scale: 16.0 2024-09-16 16:10:32,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=32900.0, ans=0.125 2024-09-16 16:10:32,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=32900.0, ans=0.125 2024-09-16 16:10:37,685 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.96 vs. limit=22.5 2024-09-16 16:10:46,709 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.56 vs. limit=15.0 2024-09-16 16:10:52,930 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.07 vs. 
limit=15.0 2024-09-16 16:10:55,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=32980.0, ans=0.125 2024-09-16 16:11:06,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=32980.0, ans=0.125 2024-09-16 16:11:06,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=32980.0, ans=0.0 2024-09-16 16:11:11,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=33020.0, ans=0.0 2024-09-16 16:11:27,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=33060.0, ans=0.0036826086956521734 2024-09-16 16:11:38,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=33060.0, ans=0.125 2024-09-16 16:11:41,226 INFO [train.py:1198] (1/2) Epoch 2, batch 3750, loss[loss=0.2932, ctc_loss=0.2459, cr_loss=0.4126, attn_decoder_loss=0.2892, over 29349.00 frames. ], tot_loss[loss=0.332, ctc_loss=0.2866, cr_loss=0.4443, attn_decoder_loss=0.3271, over 5808460.73 frames. ], batch size: 67, lr: 3.62e-02, grad_scale: 16.0 2024-09-16 16:11:53,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=33100.0, ans=0.1 2024-09-16 16:12:17,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=33180.0, ans=0.025 2024-09-16 16:12:17,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=33180.0, ans=0.125 2024-09-16 16:12:22,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=33180.0, ans=0.1 2024-09-16 16:12:23,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=33180.0, ans=0.125 2024-09-16 16:12:24,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=33220.0, ans=0.0 2024-09-16 16:12:38,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=33220.0, ans=0.0 2024-09-16 16:12:38,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=33220.0, ans=0.1 2024-09-16 16:12:40,457 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=15.65 vs. limit=15.0 2024-09-16 16:12:45,220 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.37 vs. 
limit=22.5 2024-09-16 16:12:45,645 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.433e+02 1.658e+02 2.025e+02 1.075e+03, threshold=3.317e+02, percent-clipped=10.0 2024-09-16 16:12:47,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=33260.0, ans=0.0036391304347826083 2024-09-16 16:12:55,953 INFO [train.py:1198] (1/2) Epoch 2, batch 3800, loss[loss=0.3508, ctc_loss=0.3026, cr_loss=0.4724, attn_decoder_loss=0.3457, over 29643.00 frames. ], tot_loss[loss=0.3314, ctc_loss=0.2861, cr_loss=0.4434, attn_decoder_loss=0.3265, over 5798032.24 frames. ], batch size: 86, lr: 3.61e-02, grad_scale: 16.0 2024-09-16 16:13:00,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=33300.0, ans=0.125 2024-09-16 16:13:08,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=33300.0, ans=0.125 2024-09-16 16:13:26,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=33380.0, ans=0.125 2024-09-16 16:13:45,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=33420.0, ans=0.1 2024-09-16 16:13:46,200 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.66 vs. limit=22.5 2024-09-16 16:13:50,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=33420.0, ans=0.0 2024-09-16 16:13:51,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=33420.0, ans=0.0036043478260869566 2024-09-16 16:13:52,048 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.22 vs. limit=22.5 2024-09-16 16:13:56,718 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.71 vs. limit=15.0 2024-09-16 16:14:04,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=33460.0, ans=0.125 2024-09-16 16:14:10,550 INFO [train.py:1198] (1/2) Epoch 2, batch 3850, loss[loss=0.3526, ctc_loss=0.3082, cr_loss=0.4747, attn_decoder_loss=0.347, over 29294.00 frames. ], tot_loss[loss=0.3307, ctc_loss=0.2849, cr_loss=0.4432, attn_decoder_loss=0.3259, over 5811901.25 frames. ], batch size: 100, lr: 3.60e-02, grad_scale: 16.0 2024-09-16 16:14:23,170 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2024-09-16 16:14:38,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=33580.0, ans=0.09899494936611666 2024-09-16 16:14:40,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=33580.0, ans=0.125 2024-09-16 16:14:42,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.87 vs. 
limit=15.0 2024-09-16 16:14:46,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=33580.0, ans=0.125 2024-09-16 16:14:48,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=33580.0, ans=0.1 2024-09-16 16:15:01,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=33620.0, ans=0.1 2024-09-16 16:15:14,725 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.336e+02 1.556e+02 1.830e+02 4.264e+02, threshold=3.112e+02, percent-clipped=3.0 2024-09-16 16:15:22,660 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=7.523e-02 2024-09-16 16:15:25,320 INFO [train.py:1198] (1/2) Epoch 2, batch 3900, loss[loss=0.3386, ctc_loss=0.2965, cr_loss=0.4547, attn_decoder_loss=0.3332, over 29639.00 frames. ], tot_loss[loss=0.3318, ctc_loss=0.2857, cr_loss=0.4449, attn_decoder_loss=0.327, over 5815510.86 frames. ], batch size: 86, lr: 3.60e-02, grad_scale: 16.0 2024-09-16 16:15:30,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.30 vs. limit=15.0 2024-09-16 16:15:39,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=33740.0, ans=0.0035347826086956514 2024-09-16 16:16:14,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=33820.0, ans=0.125 2024-09-16 16:16:27,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=33860.0, ans=22.5 2024-09-16 16:16:27,234 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.48 vs. limit=22.5 2024-09-16 16:16:42,727 INFO [train.py:1198] (1/2) Epoch 2, batch 3950, loss[loss=0.3411, ctc_loss=0.2886, cr_loss=0.4189, attn_decoder_loss=0.3376, over 29547.00 frames. ], tot_loss[loss=0.3303, ctc_loss=0.2834, cr_loss=0.4436, attn_decoder_loss=0.3257, over 5835103.72 frames. ], batch size: 97, lr: 3.59e-02, grad_scale: 16.0 2024-09-16 16:17:14,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=33980.0, ans=0.2 2024-09-16 16:17:18,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=33980.0, ans=0.125 2024-09-16 16:17:46,405 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.360e+02 1.544e+02 1.951e+02 4.705e+02, threshold=3.088e+02, percent-clipped=4.0 2024-09-16 16:17:53,561 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.69 vs. limit=12.0 2024-09-16 16:17:57,003 INFO [train.py:1198] (1/2) Epoch 2, batch 4000, loss[loss=0.3169, ctc_loss=0.2697, cr_loss=0.4124, attn_decoder_loss=0.313, over 29512.00 frames. ], tot_loss[loss=0.3307, ctc_loss=0.2839, cr_loss=0.444, attn_decoder_loss=0.326, over 5812788.35 frames. 
], batch size: 74, lr: 3.59e-02, grad_scale: 32.0 2024-09-16 16:18:06,499 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.24 vs. limit=10.0 2024-09-16 16:18:11,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=34140.0, ans=0.1 2024-09-16 16:18:11,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=34140.0, ans=0.125 2024-09-16 16:18:31,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=34180.0, ans=0.2 2024-09-16 16:18:41,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=34220.0, ans=0.125 2024-09-16 16:18:44,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=34220.0, ans=0.003430434782608696 2024-09-16 16:18:47,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=34220.0, ans=0.07 2024-09-16 16:18:57,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=34260.0, ans=0.025 2024-09-16 16:18:58,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=34260.0, ans=0.025 2024-09-16 16:18:59,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=34260.0, ans=0.125 2024-09-16 16:19:11,322 INFO [train.py:1198] (1/2) Epoch 2, batch 4050, loss[loss=0.3847, ctc_loss=0.3656, cr_loss=0.423, attn_decoder_loss=0.3774, over 19870.00 frames. ], tot_loss[loss=0.3312, ctc_loss=0.2846, cr_loss=0.444, attn_decoder_loss=0.3265, over 5795834.04 frames. ], batch size: 210, lr: 3.58e-02, grad_scale: 16.0 2024-09-16 16:19:26,776 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.29 vs. limit=22.5 2024-09-16 16:19:27,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=34340.0, ans=0.2 2024-09-16 16:19:43,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=34380.0, ans=0.0 2024-09-16 16:19:57,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=34420.0, ans=0.1 2024-09-16 16:20:05,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=34420.0, ans=0.1 2024-09-16 16:20:13,142 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=22.5 2024-09-16 16:20:13,159 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.18 vs. 
limit=15.0 2024-09-16 16:20:16,504 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.473e+02 1.673e+02 1.934e+02 5.199e+02, threshold=3.345e+02, percent-clipped=3.0 2024-09-16 16:20:22,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=34460.0, ans=0.0 2024-09-16 16:20:22,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=34460.0, ans=0.05 2024-09-16 16:20:26,773 INFO [train.py:1198] (1/2) Epoch 2, batch 4100, loss[loss=0.3489, ctc_loss=0.3027, cr_loss=0.4398, attn_decoder_loss=0.3443, over 29506.00 frames. ], tot_loss[loss=0.3318, ctc_loss=0.2856, cr_loss=0.4451, attn_decoder_loss=0.3271, over 5791596.98 frames. ], batch size: 90, lr: 3.57e-02, grad_scale: 16.0 2024-09-16 16:20:30,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=34500.0, ans=0.2 2024-09-16 16:20:49,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=34540.0, ans=0.0 2024-09-16 16:21:25,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=34620.0, ans=0.125 2024-09-16 16:21:42,419 INFO [train.py:1198] (1/2) Epoch 2, batch 4150, loss[loss=0.3116, ctc_loss=0.2603, cr_loss=0.4381, attn_decoder_loss=0.3076, over 29513.00 frames. ], tot_loss[loss=0.3308, ctc_loss=0.2842, cr_loss=0.4442, attn_decoder_loss=0.3262, over 5797641.26 frames. ], batch size: 77, lr: 3.57e-02, grad_scale: 8.0 2024-09-16 16:21:50,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=34700.0, ans=0.125 2024-09-16 16:21:51,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=34700.0, ans=0.125 2024-09-16 16:22:16,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=34780.0, ans=0.0033086956521739133 2024-09-16 16:22:17,537 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.89 vs. limit=15.0 2024-09-16 16:22:25,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=34820.0, ans=0.125 2024-09-16 16:22:32,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=34820.0, ans=0.125 2024-09-16 16:22:43,763 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.16 vs. limit=15.0 2024-09-16 16:22:48,754 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.357e+02 1.525e+02 1.720e+02 3.077e+02, threshold=3.049e+02, percent-clipped=0.0 2024-09-16 16:22:49,661 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=6.20 vs. limit=12.0 2024-09-16 16:22:56,138 INFO [train.py:1198] (1/2) Epoch 2, batch 4200, loss[loss=0.3577, ctc_loss=0.3151, cr_loss=0.4818, attn_decoder_loss=0.3518, over 29499.00 frames. ], tot_loss[loss=0.331, ctc_loss=0.2843, cr_loss=0.4455, attn_decoder_loss=0.3263, over 5798440.52 frames. 
], batch size: 90, lr: 3.56e-02, grad_scale: 8.0 2024-09-16 16:22:59,330 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 16:23:12,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=34940.0, ans=0.125 2024-09-16 16:23:26,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=34980.0, ans=0.2 2024-09-16 16:23:27,980 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.53 vs. limit=15.0 2024-09-16 16:23:48,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=35020.0, ans=0.125 2024-09-16 16:23:59,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=35060.0, ans=0.125 2024-09-16 16:24:05,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=35060.0, ans=0.125 2024-09-16 16:24:09,853 INFO [train.py:1198] (1/2) Epoch 2, batch 4250, loss[loss=0.2999, ctc_loss=0.2409, cr_loss=0.4033, attn_decoder_loss=0.2975, over 29507.00 frames. ], tot_loss[loss=0.331, ctc_loss=0.2841, cr_loss=0.4452, attn_decoder_loss=0.3263, over 5804806.25 frames. ], batch size: 74, lr: 3.56e-02, grad_scale: 8.0 2024-09-16 16:24:11,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=35100.0, ans=0.003239130434782609 2024-09-16 16:24:13,281 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.37 vs. limit=10.0 2024-09-16 16:24:44,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=35180.0, ans=0.125 2024-09-16 16:24:45,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=35180.0, ans=0.2 2024-09-16 16:24:47,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=35180.0, ans=0.125 2024-09-16 16:25:18,428 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.126e+02 1.431e+02 1.619e+02 1.873e+02 2.888e+02, threshold=3.237e+02, percent-clipped=0.0 2024-09-16 16:25:25,687 INFO [train.py:1198] (1/2) Epoch 2, batch 4300, loss[loss=0.3503, ctc_loss=0.3008, cr_loss=0.4466, attn_decoder_loss=0.3459, over 29540.00 frames. ], tot_loss[loss=0.3312, ctc_loss=0.2841, cr_loss=0.4451, attn_decoder_loss=0.3266, over 5793735.29 frames. ], batch size: 87, lr: 3.55e-02, grad_scale: 8.0 2024-09-16 16:25:44,954 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.48 vs. limit=15.0 2024-09-16 16:26:06,970 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.58 vs. 
limit=15.0 2024-09-16 16:26:10,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=35420.0, ans=0.125 2024-09-16 16:26:33,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=35460.0, ans=0.125 2024-09-16 16:26:40,293 INFO [train.py:1198] (1/2) Epoch 2, batch 4350, loss[loss=0.3503, ctc_loss=0.313, cr_loss=0.4738, attn_decoder_loss=0.3439, over 29501.00 frames. ], tot_loss[loss=0.335, ctc_loss=0.2875, cr_loss=0.4492, attn_decoder_loss=0.3303, over 5796610.69 frames. ], batch size: 97, lr: 3.54e-02, grad_scale: 8.0 2024-09-16 16:27:15,168 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.74 vs. limit=22.5 2024-09-16 16:27:39,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=35660.0, ans=0.025 2024-09-16 16:27:40,023 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.29 vs. limit=15.0 2024-09-16 16:27:45,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=35660.0, ans=0.0 2024-09-16 16:27:48,087 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.185e+02 1.425e+02 1.627e+02 1.817e+02 2.716e+02, threshold=3.254e+02, percent-clipped=0.0 2024-09-16 16:27:48,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=35660.0, ans=0.003117391304347826 2024-09-16 16:27:53,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=23.54 vs. limit=22.5 2024-09-16 16:27:55,405 INFO [train.py:1198] (1/2) Epoch 2, batch 4400, loss[loss=0.3593, ctc_loss=0.3163, cr_loss=0.4836, attn_decoder_loss=0.3533, over 27470.00 frames. ], tot_loss[loss=0.3376, ctc_loss=0.2905, cr_loss=0.4515, attn_decoder_loss=0.3328, over 5765353.49 frames. ], batch size: 125, lr: 3.54e-02, grad_scale: 16.0 2024-09-16 16:28:00,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=35700.0, ans=0.125 2024-09-16 16:28:03,152 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 16:28:08,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=35700.0, ans=15.0 2024-09-16 16:28:18,434 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.23 vs. limit=15.0 2024-09-16 16:28:28,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=35780.0, ans=0.125 2024-09-16 16:28:48,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=35820.0, ans=0.1 2024-09-16 16:29:09,910 INFO [train.py:1198] (1/2) Epoch 2, batch 4450, loss[loss=0.3594, ctc_loss=0.3379, cr_loss=0.4476, attn_decoder_loss=0.3519, over 19933.00 frames. ], tot_loss[loss=0.342, ctc_loss=0.2983, cr_loss=0.4539, attn_decoder_loss=0.3367, over 5569479.18 frames. 
], batch size: 210, lr: 3.53e-02, grad_scale: 16.0 2024-09-16 16:29:13,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=35900.0, ans=0.125 2024-09-16 16:29:13,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=35900.0, ans=0.125 2024-09-16 16:29:16,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=35900.0, ans=0.125 2024-09-16 16:29:19,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=35900.0, ans=0.0 2024-09-16 16:29:31,197 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.01 vs. limit=22.5 2024-09-16 16:29:42,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=35980.0, ans=0.04949747468305833 2024-09-16 16:29:57,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=36020.0, ans=0.2 2024-09-16 16:30:02,714 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.51 vs. limit=15.0 2024-09-16 16:30:08,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=36020.0, ans=0.125 2024-09-16 16:30:08,854 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.57 vs. limit=12.0 2024-09-16 16:30:18,622 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.027e+02 1.305e+02 1.463e+02 1.734e+02 4.707e+02, threshold=2.926e+02, percent-clipped=2.0 2024-09-16 16:30:25,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=36100.0, ans=0.0030217391304347826 2024-09-16 16:30:26,377 INFO [train.py:1198] (1/2) Epoch 2, batch 4500, loss[loss=0.3582, ctc_loss=0.3385, cr_loss=0.4478, attn_decoder_loss=0.3505, over 20147.00 frames. ], tot_loss[loss=0.3463, ctc_loss=0.3076, cr_loss=0.4533, attn_decoder_loss=0.3405, over 5231069.68 frames. ], batch size: 210, lr: 3.53e-02, grad_scale: 16.0 2024-09-16 16:30:28,501 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.19 vs. limit=15.0 2024-09-16 16:30:28,799 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=15.72 vs. limit=15.0 2024-09-16 16:30:31,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=36100.0, ans=0.1 2024-09-16 16:30:34,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=36100.0, ans=0.1 2024-09-16 16:30:37,546 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.02 vs. 
limit=15.0 2024-09-16 16:30:52,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=36140.0, ans=0.1 2024-09-16 16:30:53,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=36140.0, ans=0.125 2024-09-16 16:31:33,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=36200.0, ans=0.125 2024-09-16 16:31:58,162 INFO [train.py:1198] (1/2) Epoch 3, batch 0, loss[loss=0.3932, ctc_loss=0.2509, cr_loss=0.4093, attn_decoder_loss=0.4, over 29596.00 frames. ], tot_loss[loss=0.3932, ctc_loss=0.2509, cr_loss=0.4093, attn_decoder_loss=0.4, over 29596.00 frames. ], batch size: 73, lr: 3.35e-02, grad_scale: 8.0 2024-09-16 16:31:58,163 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 16:32:16,468 INFO [train.py:1230] (1/2) Epoch 3, validation: loss=0.2699, ctc_loss=0.1122, cr_loss=5.059e-15, attn_decoder_loss=0.2874, over 944034.00 frames. 2024-09-16 16:32:16,468 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-16 16:32:17,921 WARNING [optim.py:503] (1/2) Scaling gradients by 0.08523014932870865, model_norm_threshold=292.6158752441406 2024-09-16 16:32:18,132 WARNING [optim.py:575] (1/2) Parameter dominating tot_sumsq module.attention_decoder.decoder.layers.1.norm_self_attn.weight with proportion 0.29, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.447e+06, grad_sumsq=2.900e+09, orig_rms_sq=1.188e-03 2024-09-16 16:32:20,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=36200.0, ans=0.2 2024-09-16 16:32:25,521 WARNING [optim.py:503] (1/2) Scaling gradients by 0.08528286218643188, model_norm_threshold=292.6158752441406 2024-09-16 16:32:25,724 WARNING [optim.py:575] (1/2) Parameter dominating tot_sumsq module.attention_decoder.decoder.layers.0.norm_self_attn.weight with proportion 0.56, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.615e+06, grad_sumsq=1.664e+09, orig_rms_sq=3.977e-03 2024-09-16 16:32:27,307 WARNING [optim.py:503] (1/2) Scaling gradients by 0.07857576757669449, model_norm_threshold=292.6158752441406 2024-09-16 16:32:27,512 WARNING [optim.py:575] (1/2) Parameter dominating tot_sumsq module.attention_decoder.decoder.layers.0.norm_self_attn.weight with proportion 0.54, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.424e+06, grad_sumsq=1.867e+09, orig_rms_sq=3.977e-03 2024-09-16 16:32:52,027 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.52 vs. limit=15.0 2024-09-16 16:32:57,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=36280.0, ans=0.2 2024-09-16 16:32:57,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.50 vs. limit=15.0 2024-09-16 16:33:11,479 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.45 vs. 
limit=15.0 2024-09-16 16:33:21,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=36360.0, ans=0.125 2024-09-16 16:33:23,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=36360.0, ans=0.035 2024-09-16 16:33:35,002 INFO [train.py:1198] (1/2) Epoch 3, batch 50, loss[loss=0.3026, ctc_loss=0.2566, cr_loss=0.4291, attn_decoder_loss=0.2982, over 29442.00 frames. ], tot_loss[loss=0.346, ctc_loss=0.2958, cr_loss=0.4521, attn_decoder_loss=0.3415, over 1268466.52 frames. ], batch size: 70, lr: 3.34e-02, grad_scale: 8.0 2024-09-16 16:34:07,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=36480.0, ans=0.2 2024-09-16 16:34:08,223 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.045e+02 1.388e+02 1.745e+02 2.275e+02 3.724e+03, threshold=3.490e+02, percent-clipped=16.0 2024-09-16 16:34:10,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=36480.0, ans=0.2 2024-09-16 16:34:19,523 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.98 vs. limit=15.0 2024-09-16 16:34:50,236 INFO [train.py:1198] (1/2) Epoch 3, batch 100, loss[loss=0.3142, ctc_loss=0.2694, cr_loss=0.4274, attn_decoder_loss=0.3097, over 29505.00 frames. ], tot_loss[loss=0.3417, ctc_loss=0.2936, cr_loss=0.4528, attn_decoder_loss=0.3369, over 2252823.06 frames. ], batch size: 76, lr: 3.34e-02, grad_scale: 8.0 2024-09-16 16:35:47,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=36720.0, ans=0.125 2024-09-16 16:35:55,795 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.30 vs. limit=15.0 2024-09-16 16:36:04,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=36760.0, ans=0.125 2024-09-16 16:36:07,251 INFO [train.py:1198] (1/2) Epoch 3, batch 150, loss[loss=0.306, ctc_loss=0.2685, cr_loss=0.431, attn_decoder_loss=0.3006, over 29435.00 frames. ], tot_loss[loss=0.3351, ctc_loss=0.2872, cr_loss=0.449, attn_decoder_loss=0.3304, over 3048253.59 frames. ], batch size: 70, lr: 3.33e-02, grad_scale: 8.0 2024-09-16 16:36:13,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=36800.0, ans=0.1 2024-09-16 16:36:17,497 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.20 vs. 
limit=15.0 2024-09-16 16:36:22,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=36840.0, ans=0.125 2024-09-16 16:36:28,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=36840.0, ans=0.125 2024-09-16 16:36:35,362 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 16:36:38,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=36880.0, ans=0.2 2024-09-16 16:36:42,364 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.061e+02 1.372e+02 1.536e+02 1.787e+02 3.735e+02, threshold=3.071e+02, percent-clipped=1.0 2024-09-16 16:36:48,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=36880.0, ans=0.0 2024-09-16 16:37:05,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=36920.0, ans=0.025 2024-09-16 16:37:06,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=36920.0, ans=0.125 2024-09-16 16:37:07,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=36960.0, ans=0.025 2024-09-16 16:37:12,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=36960.0, ans=0.2 2024-09-16 16:37:24,139 INFO [train.py:1198] (1/2) Epoch 3, batch 200, loss[loss=0.3552, ctc_loss=0.308, cr_loss=0.4616, attn_decoder_loss=0.3502, over 27573.00 frames. ], tot_loss[loss=0.3307, ctc_loss=0.2824, cr_loss=0.4452, attn_decoder_loss=0.3262, over 3659883.73 frames. ], batch size: 124, lr: 3.33e-02, grad_scale: 8.0 2024-09-16 16:37:25,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=37000.0, ans=0.0 2024-09-16 16:37:55,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=8.22 vs. limit=15.0 2024-09-16 16:38:17,229 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 16:38:17,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=37120.0, ans=0.0028000000000000004 2024-09-16 16:38:39,572 INFO [train.py:1198] (1/2) Epoch 3, batch 250, loss[loss=0.3593, ctc_loss=0.3148, cr_loss=0.4683, attn_decoder_loss=0.3538, over 29197.00 frames. ], tot_loss[loss=0.3292, ctc_loss=0.2803, cr_loss=0.4449, attn_decoder_loss=0.3248, over 4142166.10 frames. ], batch size: 100, lr: 3.32e-02, grad_scale: 8.0 2024-09-16 16:38:52,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=37200.0, ans=0.125 2024-09-16 16:39:05,531 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.01 vs. 
limit=15.0 2024-09-16 16:39:15,191 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.348e+02 1.507e+02 1.717e+02 3.533e+02, threshold=3.014e+02, percent-clipped=1.0 2024-09-16 16:39:26,456 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0 2024-09-16 16:39:27,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=37320.0, ans=0.0 2024-09-16 16:39:32,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=37320.0, ans=0.04949747468305833 2024-09-16 16:39:40,254 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.91 vs. limit=6.0 2024-09-16 16:39:48,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=37360.0, ans=0.125 2024-09-16 16:39:57,443 INFO [train.py:1198] (1/2) Epoch 3, batch 300, loss[loss=0.3387, ctc_loss=0.2865, cr_loss=0.4496, attn_decoder_loss=0.3346, over 29488.00 frames. ], tot_loss[loss=0.3285, ctc_loss=0.2795, cr_loss=0.4442, attn_decoder_loss=0.3241, over 4509202.51 frames. ], batch size: 92, lr: 3.32e-02, grad_scale: 8.0 2024-09-16 16:40:21,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=37440.0, ans=0.1 2024-09-16 16:40:31,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=37480.0, ans=0.0 2024-09-16 16:41:16,155 INFO [train.py:1198] (1/2) Epoch 3, batch 350, loss[loss=0.2968, ctc_loss=0.2547, cr_loss=0.4052, attn_decoder_loss=0.2925, over 29312.00 frames. ], tot_loss[loss=0.3279, ctc_loss=0.2786, cr_loss=0.4446, attn_decoder_loss=0.3235, over 4795309.19 frames. ], batch size: 71, lr: 3.31e-02, grad_scale: 8.0 2024-09-16 16:41:42,205 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 16:41:42,591 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.26 vs. limit=15.0 2024-09-16 16:41:42,941 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.86 vs. limit=15.0 2024-09-16 16:41:49,290 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.316e+02 1.494e+02 1.817e+02 5.633e+02, threshold=2.988e+02, percent-clipped=5.0 2024-09-16 16:41:49,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=37680.0, ans=0.2 2024-09-16 16:41:59,506 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.90 vs. 
limit=15.0 2024-09-16 16:42:01,961 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 16:42:06,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=37720.0, ans=0.125 2024-09-16 16:42:31,221 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.43 vs. limit=12.0 2024-09-16 16:42:32,039 INFO [train.py:1198] (1/2) Epoch 3, batch 400, loss[loss=0.3226, ctc_loss=0.2712, cr_loss=0.4241, attn_decoder_loss=0.3188, over 29683.00 frames. ], tot_loss[loss=0.3266, ctc_loss=0.2768, cr_loss=0.444, attn_decoder_loss=0.3222, over 5025279.16 frames. ], batch size: 82, lr: 3.31e-02, grad_scale: 16.0 2024-09-16 16:42:42,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=37800.0, ans=0.125 2024-09-16 16:42:53,296 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.33 vs. limit=15.0 2024-09-16 16:43:07,563 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.16 vs. limit=22.5 2024-09-16 16:43:08,378 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 16:43:15,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=37880.0, ans=0.04949747468305833 2024-09-16 16:43:19,641 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=15.0 2024-09-16 16:43:24,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=37920.0, ans=0.1 2024-09-16 16:43:34,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=37960.0, ans=0.2 2024-09-16 16:43:35,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=37960.0, ans=0.125 2024-09-16 16:43:44,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=37960.0, ans=0.125 2024-09-16 16:43:45,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=37960.0, ans=0.125 2024-09-16 16:43:50,132 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.76 vs. limit=15.0 2024-09-16 16:43:50,672 INFO [train.py:1198] (1/2) Epoch 3, batch 450, loss[loss=0.3328, ctc_loss=0.2741, cr_loss=0.4321, attn_decoder_loss=0.3298, over 29695.00 frames. ], tot_loss[loss=0.3266, ctc_loss=0.2766, cr_loss=0.4442, attn_decoder_loss=0.3223, over 5188647.04 frames. ], batch size: 83, lr: 3.30e-02, grad_scale: 8.0 2024-09-16 16:43:57,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=38000.0, ans=0.2 2024-09-16 16:44:02,176 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.70 vs. 
limit=15.0 2024-09-16 16:44:12,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=38040.0, ans=0.0026 2024-09-16 16:44:25,551 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.316e+02 1.463e+02 1.797e+02 4.950e+02, threshold=2.926e+02, percent-clipped=3.0 2024-09-16 16:44:27,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=38080.0, ans=0.1 2024-09-16 16:44:28,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=38080.0, ans=0.05 2024-09-16 16:44:35,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=38080.0, ans=0.125 2024-09-16 16:44:45,879 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.77 vs. limit=15.0 2024-09-16 16:44:52,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=38160.0, ans=0.07 2024-09-16 16:44:58,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=38160.0, ans=0.125 2024-09-16 16:45:01,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=38160.0, ans=0.0 2024-09-16 16:45:06,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=38160.0, ans=0.2 2024-09-16 16:45:09,106 INFO [train.py:1198] (1/2) Epoch 3, batch 500, loss[loss=0.3336, ctc_loss=0.2773, cr_loss=0.4691, attn_decoder_loss=0.3294, over 29459.00 frames. ], tot_loss[loss=0.325, ctc_loss=0.2745, cr_loss=0.4427, attn_decoder_loss=0.3208, over 5330429.97 frames. ], batch size: 94, lr: 3.30e-02, grad_scale: 8.0 2024-09-16 16:45:10,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=38200.0, ans=0.2 2024-09-16 16:45:15,964 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.46 vs. limit=12.0 2024-09-16 16:45:20,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=38200.0, ans=0.125 2024-09-16 16:45:37,138 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.70 vs. limit=15.0 2024-09-16 16:45:51,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=38280.0, ans=0.125 2024-09-16 16:46:07,707 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0 2024-09-16 16:46:25,200 INFO [train.py:1198] (1/2) Epoch 3, batch 550, loss[loss=0.3505, ctc_loss=0.307, cr_loss=0.4624, attn_decoder_loss=0.345, over 28722.00 frames. ], tot_loss[loss=0.325, ctc_loss=0.2745, cr_loss=0.4433, attn_decoder_loss=0.3208, over 5423782.10 frames. 
], batch size: 104, lr: 3.29e-02, grad_scale: 8.0 2024-09-16 16:46:34,451 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.99 vs. limit=15.0 2024-09-16 16:46:39,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=38400.0, ans=0.07 2024-09-16 16:47:02,145 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.886e+01 1.383e+02 1.615e+02 1.876e+02 3.927e+02, threshold=3.230e+02, percent-clipped=4.0 2024-09-16 16:47:33,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=38560.0, ans=0.025 2024-09-16 16:47:34,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=38560.0, ans=0.125 2024-09-16 16:47:43,869 INFO [train.py:1198] (1/2) Epoch 3, batch 600, loss[loss=0.352, ctc_loss=0.3004, cr_loss=0.4896, attn_decoder_loss=0.3469, over 29275.00 frames. ], tot_loss[loss=0.3253, ctc_loss=0.2746, cr_loss=0.4444, attn_decoder_loss=0.321, over 5511214.39 frames. ], batch size: 100, lr: 3.28e-02, grad_scale: 8.0 2024-09-16 16:47:56,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=38600.0, ans=0.0 2024-09-16 16:47:59,966 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.69 vs. limit=15.0 2024-09-16 16:48:00,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=38640.0, ans=0.0 2024-09-16 16:48:24,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=38680.0, ans=0.125 2024-09-16 16:48:38,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=38720.0, ans=0.0 2024-09-16 16:49:02,379 INFO [train.py:1198] (1/2) Epoch 3, batch 650, loss[loss=0.3254, ctc_loss=0.2729, cr_loss=0.4637, attn_decoder_loss=0.321, over 29753.00 frames. ], tot_loss[loss=0.3233, ctc_loss=0.2722, cr_loss=0.4428, attn_decoder_loss=0.3191, over 5587936.34 frames. ], batch size: 81, lr: 3.28e-02, grad_scale: 8.0 2024-09-16 16:49:11,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=38800.0, ans=0.2 2024-09-16 16:49:27,593 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.08 vs. limit=15.0 2024-09-16 16:49:37,337 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.937e+01 1.311e+02 1.474e+02 1.676e+02 3.343e+02, threshold=2.947e+02, percent-clipped=2.0 2024-09-16 16:49:47,238 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.12 vs. 
limit=6.0 2024-09-16 16:49:57,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=38920.0, ans=0.1 2024-09-16 16:50:07,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=38960.0, ans=0.125 2024-09-16 16:50:18,166 INFO [train.py:1198] (1/2) Epoch 3, batch 700, loss[loss=0.3084, ctc_loss=0.2504, cr_loss=0.4225, attn_decoder_loss=0.3054, over 29534.00 frames. ], tot_loss[loss=0.324, ctc_loss=0.2728, cr_loss=0.444, attn_decoder_loss=0.3198, over 5637638.84 frames. ], batch size: 76, lr: 3.27e-02, grad_scale: 8.0 2024-09-16 16:50:31,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=39000.0, ans=0.125 2024-09-16 16:50:46,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=39040.0, ans=0.07 2024-09-16 16:51:35,492 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.31 vs. limit=15.0 2024-09-16 16:51:36,713 INFO [train.py:1198] (1/2) Epoch 3, batch 750, loss[loss=0.3334, ctc_loss=0.2752, cr_loss=0.4415, attn_decoder_loss=0.33, over 29691.00 frames. ], tot_loss[loss=0.3233, ctc_loss=0.2719, cr_loss=0.4433, attn_decoder_loss=0.3191, over 5676305.26 frames. ], batch size: 82, lr: 3.27e-02, grad_scale: 8.0 2024-09-16 16:51:48,093 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.04 vs. limit=15.0 2024-09-16 16:52:02,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=39240.0, ans=0.2 2024-09-16 16:52:11,484 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.529e+01 1.527e+02 1.781e+02 2.064e+02 4.131e+02, threshold=3.563e+02, percent-clipped=5.0 2024-09-16 16:52:13,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=39280.0, ans=0.002330434782608696 2024-09-16 16:52:32,447 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-16 16:52:40,537 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.93 vs. limit=22.5 2024-09-16 16:52:51,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=39360.0, ans=22.5 2024-09-16 16:52:54,785 INFO [train.py:1198] (1/2) Epoch 3, batch 800, loss[loss=0.2994, ctc_loss=0.243, cr_loss=0.4174, attn_decoder_loss=0.2964, over 29613.00 frames. ], tot_loss[loss=0.3235, ctc_loss=0.2724, cr_loss=0.4439, attn_decoder_loss=0.3194, over 5706456.91 frames. 
], batch size: 73, lr: 3.26e-02, grad_scale: 16.0 2024-09-16 16:52:58,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=39400.0, ans=0.0 2024-09-16 16:53:39,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=39520.0, ans=0.2 2024-09-16 16:53:43,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=39520.0, ans=0.125 2024-09-16 16:54:10,344 INFO [train.py:1198] (1/2) Epoch 3, batch 850, loss[loss=0.3255, ctc_loss=0.2662, cr_loss=0.4421, attn_decoder_loss=0.3223, over 29681.00 frames. ], tot_loss[loss=0.3223, ctc_loss=0.271, cr_loss=0.4429, attn_decoder_loss=0.3182, over 5736188.93 frames. ], batch size: 89, lr: 3.26e-02, grad_scale: 8.0 2024-09-16 16:54:30,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=39640.0, ans=0.0022521739130434773 2024-09-16 16:54:48,810 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.307e+02 1.480e+02 1.661e+02 7.090e+02, threshold=2.960e+02, percent-clipped=1.0 2024-09-16 16:55:01,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=39720.0, ans=0.0 2024-09-16 16:55:27,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=39800.0, ans=0.125 2024-09-16 16:55:28,603 INFO [train.py:1198] (1/2) Epoch 3, batch 900, loss[loss=0.2831, ctc_loss=0.2352, cr_loss=0.3701, attn_decoder_loss=0.2802, over 29597.00 frames. ], tot_loss[loss=0.3227, ctc_loss=0.2714, cr_loss=0.4433, attn_decoder_loss=0.3186, over 5740228.44 frames. ], batch size: 73, lr: 3.25e-02, grad_scale: 8.0 2024-09-16 16:55:40,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=39800.0, ans=0.0 2024-09-16 16:55:54,061 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.83 vs. limit=22.5 2024-09-16 16:55:54,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=39840.0, ans=0.1 2024-09-16 16:56:46,822 INFO [train.py:1198] (1/2) Epoch 3, batch 950, loss[loss=0.3058, ctc_loss=0.2503, cr_loss=0.4542, attn_decoder_loss=0.3019, over 29506.00 frames. ], tot_loss[loss=0.3234, ctc_loss=0.2723, cr_loss=0.4442, attn_decoder_loss=0.3192, over 5742212.04 frames. ], batch size: 74, lr: 3.25e-02, grad_scale: 8.0 2024-09-16 16:56:52,201 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.79 vs. limit=15.0 2024-09-16 16:57:00,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=40040.0, ans=0.125 2024-09-16 16:57:01,237 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.64 vs. 
limit=12.0 2024-09-16 16:57:09,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=40040.0, ans=0.125 2024-09-16 16:57:14,878 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.34 vs. limit=10.0 2024-09-16 16:57:16,252 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.11 vs. limit=15.0 2024-09-16 16:57:18,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=40080.0, ans=0.0 2024-09-16 16:57:20,690 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=22.55 vs. limit=22.5 2024-09-16 16:57:22,766 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.433e+02 1.639e+02 1.993e+02 1.138e+03, threshold=3.278e+02, percent-clipped=4.0 2024-09-16 16:57:31,068 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.13 vs. limit=15.0 2024-09-16 16:57:37,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=40120.0, ans=0.2 2024-09-16 16:57:58,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=40160.0, ans=0.125 2024-09-16 16:58:01,736 INFO [train.py:1198] (1/2) Epoch 3, batch 1000, loss[loss=0.3073, ctc_loss=0.2508, cr_loss=0.4406, attn_decoder_loss=0.3038, over 29493.00 frames. ], tot_loss[loss=0.3245, ctc_loss=0.2735, cr_loss=0.4449, attn_decoder_loss=0.3203, over 5735973.45 frames. ], batch size: 77, lr: 3.24e-02, grad_scale: 8.0 2024-09-16 16:58:27,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=40240.0, ans=0.0 2024-09-16 16:58:36,706 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=23.10 vs. limit=22.5 2024-09-16 16:58:40,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=40280.0, ans=0.125 2024-09-16 16:58:46,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=40280.0, ans=0.0 2024-09-16 16:58:51,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=40320.0, ans=0.125 2024-09-16 16:59:14,537 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.66 vs. limit=22.5 2024-09-16 16:59:19,696 INFO [train.py:1198] (1/2) Epoch 3, batch 1050, loss[loss=0.3322, ctc_loss=0.2679, cr_loss=0.4535, attn_decoder_loss=0.3293, over 29696.00 frames. ], tot_loss[loss=0.3234, ctc_loss=0.2721, cr_loss=0.4439, attn_decoder_loss=0.3193, over 5743433.39 frames. 
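The Whitening entries compare a per-module statistic against a fixed limit (metric=22.55 vs. limit=22.5 a few entries above). A plausible reading, sketched below: the metric measures how far the channel covariance of a module's output is from a multiple of the identity, equal to 1.0 for a perfectly white signal, and a penalty is applied only when the metric exceeds its limit. The function is an assumption for illustration, not the project's implementation:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    # x: (num_frames, num_channels), channels split into num_groups groups
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    cov = torch.einsum("ngc,ngd->gcd", x, x) / num_frames  # per-group covariance
    eigs = torch.linalg.eigvalsh(cov)
    # mean squared eigenvalue over squared mean eigenvalue: exactly 1.0 when
    # all eigenvalues are equal (an isotropic, i.e. fully whitened, covariance)
    return (eigs ** 2).mean() / eigs.mean() ** 2

x = torch.randn(2000, 192)                  # stand-in activations
metric = whitening_metric(x, num_groups=1)  # close to 1.0 for random data
limit = 15.0
penalty = torch.relu(metric - limit)        # contributes only past the limit
print(f"metric={metric.item():.2f} vs. limit={limit}")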
], batch size: 85, lr: 3.24e-02, grad_scale: 8.0 2024-09-16 16:59:56,489 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.303e+02 1.463e+02 1.706e+02 2.902e+02, threshold=2.927e+02, percent-clipped=0.0 2024-09-16 16:59:58,369 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:00:09,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=40520.0, ans=0.002060869565217392 2024-09-16 17:00:29,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=40560.0, ans=0.125 2024-09-16 17:00:37,928 INFO [train.py:1198] (1/2) Epoch 3, batch 1100, loss[loss=0.3175, ctc_loss=0.2753, cr_loss=0.423, attn_decoder_loss=0.3128, over 29446.00 frames. ], tot_loss[loss=0.3223, ctc_loss=0.2707, cr_loss=0.4431, attn_decoder_loss=0.3182, over 5755751.91 frames. ], batch size: 78, lr: 3.23e-02, grad_scale: 8.0 2024-09-16 17:00:56,689 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.57 vs. limit=12.0 2024-09-16 17:01:04,527 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.95 vs. limit=15.0 2024-09-16 17:01:09,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=40680.0, ans=0.125 2024-09-16 17:01:24,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=40720.0, ans=0.125 2024-09-16 17:01:33,087 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.79 vs. limit=6.0 2024-09-16 17:01:34,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=40720.0, ans=0.125 2024-09-16 17:01:37,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=40760.0, ans=0.125 2024-09-16 17:01:40,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=40760.0, ans=0.2 2024-09-16 17:01:53,971 INFO [train.py:1198] (1/2) Epoch 3, batch 1150, loss[loss=0.3366, ctc_loss=0.2844, cr_loss=0.4654, attn_decoder_loss=0.332, over 29452.00 frames. ], tot_loss[loss=0.3226, ctc_loss=0.2707, cr_loss=0.4436, attn_decoder_loss=0.3185, over 5754005.29 frames. ], batch size: 78, lr: 3.23e-02, grad_scale: 8.0 2024-09-16 17:02:22,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=40840.0, ans=0.125 2024-09-16 17:02:24,381 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.71 vs. 
limit=22.5 2024-09-16 17:02:32,377 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.405e+02 1.623e+02 1.892e+02 4.412e+02, threshold=3.246e+02, percent-clipped=6.0 2024-09-16 17:02:59,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=40960.0, ans=0.0019652173913043483 2024-09-16 17:03:01,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=40960.0, ans=0.125 2024-09-16 17:03:11,755 INFO [train.py:1198] (1/2) Epoch 3, batch 1200, loss[loss=0.3265, ctc_loss=0.2649, cr_loss=0.4588, attn_decoder_loss=0.3231, over 29690.00 frames. ], tot_loss[loss=0.3236, ctc_loss=0.2716, cr_loss=0.444, attn_decoder_loss=0.3196, over 5746549.05 frames. ], batch size: 85, lr: 3.22e-02, grad_scale: 16.0 2024-09-16 17:03:15,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=41000.0, ans=0.0 2024-09-16 17:03:31,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=41040.0, ans=0.2 2024-09-16 17:03:34,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=41040.0, ans=0.2 2024-09-16 17:03:53,651 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.95 vs. limit=15.0 2024-09-16 17:04:19,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=41160.0, ans=0.025 2024-09-16 17:04:22,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=41160.0, ans=0.025 2024-09-16 17:04:27,571 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.80 vs. limit=22.5 2024-09-16 17:04:29,439 INFO [train.py:1198] (1/2) Epoch 3, batch 1250, loss[loss=0.3363, ctc_loss=0.2783, cr_loss=0.4749, attn_decoder_loss=0.3322, over 29532.00 frames. ], tot_loss[loss=0.3242, ctc_loss=0.2717, cr_loss=0.445, attn_decoder_loss=0.3202, over 5774893.78 frames. ], batch size: 92, lr: 3.22e-02, grad_scale: 8.0 2024-09-16 17:04:44,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=41240.0, ans=0.125 2024-09-16 17:04:48,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=41240.0, ans=0.0 2024-09-16 17:04:58,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=41280.0, ans=0.0 2024-09-16 17:05:07,515 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.013e+02 1.378e+02 1.544e+02 1.840e+02 6.927e+02, threshold=3.087e+02, percent-clipped=1.0 2024-09-16 17:05:31,571 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.95 vs. limit=22.5 2024-09-16 17:05:39,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=41360.0, ans=0.125 2024-09-16 17:05:47,462 INFO [train.py:1198] (1/2) Epoch 3, batch 1300, loss[loss=0.3261, ctc_loss=0.2635, cr_loss=0.4544, attn_decoder_loss=0.3229, over 28290.00 frames. 
], tot_loss[loss=0.3225, ctc_loss=0.2698, cr_loss=0.4433, attn_decoder_loss=0.3185, over 5781387.80 frames. ], batch size: 111, lr: 3.21e-02, grad_scale: 8.0 2024-09-16 17:05:50,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=41400.0, ans=0.025 2024-09-16 17:05:52,853 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.11 vs. limit=15.0 2024-09-16 17:06:15,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=41440.0, ans=0.0 2024-09-16 17:06:26,556 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.26 vs. limit=22.5 2024-09-16 17:06:33,352 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:07:03,974 INFO [train.py:1198] (1/2) Epoch 3, batch 1350, loss[loss=0.3101, ctc_loss=0.2457, cr_loss=0.3999, attn_decoder_loss=0.3084, over 29765.00 frames. ], tot_loss[loss=0.3215, ctc_loss=0.2683, cr_loss=0.4429, attn_decoder_loss=0.3176, over 5798425.98 frames. ], batch size: 81, lr: 3.21e-02, grad_scale: 8.0 2024-09-16 17:07:08,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=41600.0, ans=0.0 2024-09-16 17:07:10,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=41600.0, ans=0.0018260869565217396 2024-09-16 17:07:16,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=41600.0, ans=0.2 2024-09-16 17:07:22,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=41640.0, ans=0.0 2024-09-16 17:07:41,125 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.317e+02 1.447e+02 1.601e+02 2.528e+02, threshold=2.895e+02, percent-clipped=1.0 2024-09-16 17:07:41,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=41680.0, ans=0.0 2024-09-16 17:07:47,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=41720.0, ans=0.0 2024-09-16 17:07:56,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=41720.0, ans=0.0018000000000000013 2024-09-16 17:08:09,427 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=14.15 vs. limit=15.0 2024-09-16 17:08:20,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=41800.0, ans=0.0 2024-09-16 17:08:21,361 INFO [train.py:1198] (1/2) Epoch 3, batch 1400, loss[loss=0.2762, ctc_loss=0.2227, cr_loss=0.3935, attn_decoder_loss=0.2734, over 29585.00 frames. ], tot_loss[loss=0.3207, ctc_loss=0.2673, cr_loss=0.442, attn_decoder_loss=0.3169, over 5809071.72 frames. ], batch size: 69, lr: 3.20e-02, grad_scale: 8.0 2024-09-16 17:09:04,819 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.21 vs. 
limit=22.5 2024-09-16 17:09:19,677 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.98 vs. limit=15.0 2024-09-16 17:09:26,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=41960.0, ans=0.2 2024-09-16 17:09:37,196 INFO [train.py:1198] (1/2) Epoch 3, batch 1450, loss[loss=0.3463, ctc_loss=0.2928, cr_loss=0.4909, attn_decoder_loss=0.3413, over 29472.00 frames. ], tot_loss[loss=0.3217, ctc_loss=0.2682, cr_loss=0.4433, attn_decoder_loss=0.3178, over 5807215.45 frames. ], batch size: 94, lr: 3.20e-02, grad_scale: 8.0 2024-09-16 17:10:03,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=42040.0, ans=0.125 2024-09-16 17:10:12,129 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.59 vs. limit=22.5 2024-09-16 17:10:17,159 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.984e+01 1.371e+02 1.551e+02 1.946e+02 4.633e+02, threshold=3.101e+02, percent-clipped=3.0 2024-09-16 17:10:39,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=42160.0, ans=0.001704347826086956 2024-09-16 17:10:44,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=42160.0, ans=0.125 2024-09-16 17:10:55,034 INFO [train.py:1198] (1/2) Epoch 3, batch 1500, loss[loss=0.3271, ctc_loss=0.2794, cr_loss=0.4373, attn_decoder_loss=0.3227, over 29630.00 frames. ], tot_loss[loss=0.3221, ctc_loss=0.2686, cr_loss=0.4439, attn_decoder_loss=0.3182, over 5808090.62 frames. ], batch size: 86, lr: 3.19e-02, grad_scale: 8.0 2024-09-16 17:11:14,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=42240.0, ans=0.125 2024-09-16 17:11:41,986 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.97 vs. limit=15.0 2024-09-16 17:11:45,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=42320.0, ans=0.0 2024-09-16 17:11:47,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=42320.0, ans=0.1 2024-09-16 17:12:13,824 INFO [train.py:1198] (1/2) Epoch 3, batch 1550, loss[loss=0.3561, ctc_loss=0.3082, cr_loss=0.4934, attn_decoder_loss=0.3504, over 29530.00 frames. ], tot_loss[loss=0.3229, ctc_loss=0.2704, cr_loss=0.4445, attn_decoder_loss=0.3189, over 5782877.81 frames. ], batch size: 90, lr: 3.19e-02, grad_scale: 8.0 2024-09-16 17:12:51,101 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.385e+02 1.541e+02 1.743e+02 3.737e+02, threshold=3.082e+02, percent-clipped=1.0 2024-09-16 17:12:55,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=42480.0, ans=0.0 2024-09-16 17:13:10,330 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=10.88 vs. 
limit=15.0 2024-09-16 17:13:18,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=42560.0, ans=0.125 2024-09-16 17:13:27,170 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.44 vs. limit=22.5 2024-09-16 17:13:28,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=42560.0, ans=0.125 2024-09-16 17:13:30,977 INFO [train.py:1198] (1/2) Epoch 3, batch 1600, loss[loss=0.33, ctc_loss=0.2737, cr_loss=0.462, attn_decoder_loss=0.326, over 29675.00 frames. ], tot_loss[loss=0.3225, ctc_loss=0.2704, cr_loss=0.4438, attn_decoder_loss=0.3185, over 5763804.90 frames. ], batch size: 85, lr: 3.18e-02, grad_scale: 16.0 2024-09-16 17:14:31,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=42760.0, ans=0.0 2024-09-16 17:14:46,580 INFO [train.py:1198] (1/2) Epoch 3, batch 1650, loss[loss=0.3334, ctc_loss=0.2734, cr_loss=0.4496, attn_decoder_loss=0.3301, over 29700.00 frames. ], tot_loss[loss=0.322, ctc_loss=0.2694, cr_loss=0.4436, attn_decoder_loss=0.318, over 5756691.06 frames. ], batch size: 89, lr: 3.18e-02, grad_scale: 8.0 2024-09-16 17:15:26,273 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.391e+02 1.585e+02 1.858e+02 6.012e+02, threshold=3.169e+02, percent-clipped=6.0 2024-09-16 17:15:29,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=42880.0, ans=0.1 2024-09-16 17:15:30,497 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.34 vs. limit=15.0 2024-09-16 17:15:41,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=42920.0, ans=0.0 2024-09-16 17:16:04,590 INFO [train.py:1198] (1/2) Epoch 3, batch 1700, loss[loss=0.2668, ctc_loss=0.2037, cr_loss=0.4131, attn_decoder_loss=0.2646, over 29587.00 frames. ], tot_loss[loss=0.3215, ctc_loss=0.2684, cr_loss=0.4434, attn_decoder_loss=0.3175, over 5778485.02 frames. ], batch size: 69, lr: 3.17e-02, grad_scale: 8.0 2024-09-16 17:16:07,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=43000.0, ans=0.035 2024-09-16 17:16:12,456 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:16:22,201 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.63 vs. 
limit=15.0 2024-09-16 17:16:29,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=43040.0, ans=0.0 2024-09-16 17:16:35,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=43080.0, ans=0.0 2024-09-16 17:16:56,000 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:17:00,581 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:17:22,491 INFO [train.py:1198] (1/2) Epoch 3, batch 1750, loss[loss=0.2902, ctc_loss=0.2438, cr_loss=0.4403, attn_decoder_loss=0.2855, over 29357.00 frames. ], tot_loss[loss=0.3208, ctc_loss=0.2677, cr_loss=0.4429, attn_decoder_loss=0.3168, over 5787083.81 frames. ], batch size: 67, lr: 3.17e-02, grad_scale: 8.0 2024-09-16 17:17:33,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=43200.0, ans=0.125 2024-09-16 17:17:57,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=43280.0, ans=0.125 2024-09-16 17:18:01,939 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.347e+02 1.520e+02 1.785e+02 2.603e+02, threshold=3.040e+02, percent-clipped=0.0 2024-09-16 17:18:06,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=43320.0, ans=0.1 2024-09-16 17:18:38,186 INFO [train.py:1198] (1/2) Epoch 3, batch 1800, loss[loss=0.3204, ctc_loss=0.2628, cr_loss=0.4513, attn_decoder_loss=0.3168, over 29685.00 frames. ], tot_loss[loss=0.3206, ctc_loss=0.2675, cr_loss=0.4428, attn_decoder_loss=0.3167, over 5789722.28 frames. ], batch size: 83, lr: 3.16e-02, grad_scale: 8.0 2024-09-16 17:18:41,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=43400.0, ans=0.0 2024-09-16 17:18:52,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=43440.0, ans=0.1 2024-09-16 17:19:08,060 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.64 vs. limit=15.0 2024-09-16 17:19:11,216 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.09 vs. limit=22.5 2024-09-16 17:19:34,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=43520.0, ans=0.0 2024-09-16 17:19:36,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=43520.0, ans=0.0014086956521739136 2024-09-16 17:19:54,247 INFO [train.py:1198] (1/2) Epoch 3, batch 1850, loss[loss=0.3388, ctc_loss=0.2819, cr_loss=0.4575, attn_decoder_loss=0.335, over 29619.00 frames. ], tot_loss[loss=0.3204, ctc_loss=0.2671, cr_loss=0.443, attn_decoder_loss=0.3165, over 5796699.06 frames. 
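In every Clipping_scale warning in this log, the reported threshold equals Clipping_scale times the median of the grad-norm quartiles (for the warning above: 3.040e+02 = 2.0 x 1.520e+02), and percent-clipped tracks how often recent gradient norms exceeded it. A sketch of that rule over a sliding window of recent norms; the class is illustrative, not the project's optim.py:

from collections import deque

import torch

class MedianGradClipper:
    """Clip the global grad norm to clipping_scale * median(recent norms)."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def clip_(self, parameters) -> float:
        grads = [p.grad for p in parameters if p.grad is not None]
        total_norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(total_norm)
        ranked = sorted(self.norms)
        # min / 25% / median / 75% / max, as printed in the warnings
        quartiles = [ranked[int(q * (len(ranked) - 1))]
                     for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
        threshold = self.clipping_scale * quartiles[2]
        if total_norm > threshold:
            for g in grads:
                g.mul_(threshold / total_norm)
        return total_norm

model = torch.nn.Linear(10, 10)
clipper = MedianGradClipper()
model(torch.randn(4, 10)).sum().backward()
clipper.clip_(model.parameters())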
], batch size: 86, lr: 3.16e-02, grad_scale: 8.0 2024-09-16 17:20:04,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=43600.0, ans=0.05 2024-09-16 17:20:07,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=43600.0, ans=0.125 2024-09-16 17:20:17,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=43640.0, ans=0.125 2024-09-16 17:20:35,358 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.038e+02 1.308e+02 1.428e+02 1.692e+02 5.194e+02, threshold=2.856e+02, percent-clipped=3.0 2024-09-16 17:21:04,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=43760.0, ans=0.1 2024-09-16 17:21:06,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=43760.0, ans=0.0 2024-09-16 17:21:13,411 INFO [train.py:1198] (1/2) Epoch 3, batch 1900, loss[loss=0.3458, ctc_loss=0.2971, cr_loss=0.4724, attn_decoder_loss=0.3407, over 29696.00 frames. ], tot_loss[loss=0.3213, ctc_loss=0.2679, cr_loss=0.4444, attn_decoder_loss=0.3174, over 5804527.46 frames. ], batch size: 89, lr: 3.15e-02, grad_scale: 8.0 2024-09-16 17:21:22,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=43800.0, ans=0.125 2024-09-16 17:21:28,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=43840.0, ans=0.0 2024-09-16 17:21:31,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=43840.0, ans=0.5 2024-09-16 17:21:36,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=43840.0, ans=0.125 2024-09-16 17:21:54,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=43880.0, ans=0.025 2024-09-16 17:22:02,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=43920.0, ans=0.025 2024-09-16 17:22:03,151 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.46 vs. limit=10.0 2024-09-16 17:22:29,843 INFO [train.py:1198] (1/2) Epoch 3, batch 1950, loss[loss=0.2989, ctc_loss=0.2382, cr_loss=0.4346, attn_decoder_loss=0.296, over 29459.00 frames. ], tot_loss[loss=0.322, ctc_loss=0.2676, cr_loss=0.446, attn_decoder_loss=0.3181, over 5819636.34 frames. ], batch size: 78, lr: 3.15e-02, grad_scale: 8.0 2024-09-16 17:22:33,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=44000.0, ans=0.125 2024-09-16 17:22:45,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=44040.0, ans=0.125 2024-09-16 17:22:49,056 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.49 vs. 
limit=15.0 2024-09-16 17:23:09,312 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.320e+02 1.491e+02 1.683e+02 2.702e+02, threshold=2.982e+02, percent-clipped=0.0 2024-09-16 17:23:17,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=44120.0, ans=0.0 2024-09-16 17:23:32,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=44160.0, ans=0.1 2024-09-16 17:23:35,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer_na.min_abs, batch_count=44160.0, ans=0.02 2024-09-16 17:23:45,534 INFO [train.py:1198] (1/2) Epoch 3, batch 2000, loss[loss=0.2871, ctc_loss=0.2403, cr_loss=0.4093, attn_decoder_loss=0.2832, over 29367.00 frames. ], tot_loss[loss=0.3228, ctc_loss=0.2688, cr_loss=0.4467, attn_decoder_loss=0.3188, over 5796090.13 frames. ], batch size: 67, lr: 3.14e-02, grad_scale: 16.0 2024-09-16 17:23:52,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=44200.0, ans=0.125 2024-09-16 17:24:23,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=44280.0, ans=0.1 2024-09-16 17:24:58,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=44360.0, ans=0.125 2024-09-16 17:25:05,788 INFO [train.py:1198] (1/2) Epoch 3, batch 2050, loss[loss=0.2963, ctc_loss=0.2492, cr_loss=0.4155, attn_decoder_loss=0.2923, over 29445.00 frames. ], tot_loss[loss=0.3217, ctc_loss=0.2679, cr_loss=0.4453, attn_decoder_loss=0.3177, over 5788609.27 frames. ], batch size: 70, lr: 3.14e-02, grad_scale: 8.0 2024-09-16 17:25:30,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=44440.0, ans=0.125 2024-09-16 17:25:42,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=44480.0, ans=0.0 2024-09-16 17:25:46,772 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.399e+02 1.535e+02 1.932e+02 1.271e+03, threshold=3.069e+02, percent-clipped=4.0 2024-09-16 17:25:54,070 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.17 vs. limit=15.0 2024-09-16 17:26:21,532 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.36 vs. limit=15.0 2024-09-16 17:26:21,853 INFO [train.py:1198] (1/2) Epoch 3, batch 2100, loss[loss=0.3276, ctc_loss=0.2685, cr_loss=0.4556, attn_decoder_loss=0.324, over 29765.00 frames. ], tot_loss[loss=0.3207, ctc_loss=0.2667, cr_loss=0.4451, attn_decoder_loss=0.3168, over 5800645.57 frames. 
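The grad_scale field in the batch summaries is the fp16 dynamic loss scale: it sits at 8.0, doubles to 16.0 roughly every 400 batches (batches 800, 1200, 1600 and 2000 above), and drops straight back, the usual grow-on-success, halve-on-overflow behavior. A sketch of that loop with torch.cuda.amp.GradScaler; the model, optimizer, and growth_interval are assumptions for illustration:

import torch

device = torch.device("cuda")
model = torch.nn.Linear(80, 500).to(device)                # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=3e-2)  # lr near the log's
scaler = torch.cuda.amp.GradScaler(
    init_scale=8.0,       # the grad_scale the summaries start from
    growth_factor=2.0,    # 8.0 -> 16.0
    backoff_factor=0.5,   # 16.0 -> 8.0 on overflow
    growth_interval=400,  # assumed from the ~400-batch doubling cadence
)

def train_step(features: torch.Tensor, targets: torch.Tensor) -> float:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = torch.nn.functional.cross_entropy(model(features), targets)
    scaler.scale(loss).backward()  # backward pass on the scaled loss
    scaler.step(optimizer)         # unscales; skips the step on inf/nan grads
    scaler.update()                # grows or backs off the scale
    return scaler.get_scale()      # the value logged as grad_scale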
], batch size: 81, lr: 3.13e-02, grad_scale: 8.0 2024-09-16 17:26:37,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=44640.0, ans=0.05 2024-09-16 17:26:41,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=44640.0, ans=0.125 2024-09-16 17:26:50,840 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:26:53,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=44680.0, ans=0.125 2024-09-16 17:26:56,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=44680.0, ans=0.0 2024-09-16 17:26:56,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=44680.0, ans=0.125 2024-09-16 17:26:58,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=44680.0, ans=0.07 2024-09-16 17:26:59,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=44680.0, ans=0.1 2024-09-16 17:27:04,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=44680.0, ans=0.125 2024-09-16 17:27:22,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=44760.0, ans=0.125 2024-09-16 17:27:38,162 INFO [train.py:1198] (1/2) Epoch 3, batch 2150, loss[loss=0.3055, ctc_loss=0.2544, cr_loss=0.3852, attn_decoder_loss=0.3026, over 29459.00 frames. ], tot_loss[loss=0.3191, ctc_loss=0.2648, cr_loss=0.4435, attn_decoder_loss=0.3153, over 5815540.95 frames. 
], batch size: 78, lr: 3.13e-02, grad_scale: 8.0 2024-09-16 17:27:39,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=44800.0, ans=12.0 2024-09-16 17:28:01,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=44840.0, ans=0.0 2024-09-16 17:28:10,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=44880.0, ans=0.125 2024-09-16 17:28:20,978 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.031e+02 1.285e+02 1.430e+02 1.712e+02 4.702e+02, threshold=2.859e+02, percent-clipped=3.0 2024-09-16 17:28:27,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=44920.0, ans=0.0011043478260869578 2024-09-16 17:28:33,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=44920.0, ans=15.0 2024-09-16 17:28:39,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=44960.0, ans=0.125 2024-09-16 17:28:55,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=44960.0, ans=0.05 2024-09-16 17:28:56,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=45000.0, ans=0.1 2024-09-16 17:28:57,951 INFO [train.py:1198] (1/2) Epoch 3, batch 2200, loss[loss=0.3243, ctc_loss=0.2679, cr_loss=0.4462, attn_decoder_loss=0.3206, over 29632.00 frames. ], tot_loss[loss=0.3192, ctc_loss=0.2648, cr_loss=0.4432, attn_decoder_loss=0.3153, over 5812741.71 frames. ], batch size: 86, lr: 3.12e-02, grad_scale: 8.0 2024-09-16 17:29:33,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.67 vs. limit=22.5 2024-09-16 17:29:34,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=45080.0, ans=0.125 2024-09-16 17:29:38,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=45080.0, ans=0.125 2024-09-16 17:30:13,754 INFO [train.py:1198] (1/2) Epoch 3, batch 2250, loss[loss=0.3292, ctc_loss=0.268, cr_loss=0.4485, attn_decoder_loss=0.326, over 29692.00 frames. ], tot_loss[loss=0.3188, ctc_loss=0.2641, cr_loss=0.4426, attn_decoder_loss=0.315, over 5810926.21 frames. 
], batch size: 82, lr: 3.12e-02, grad_scale: 8.0 2024-09-16 17:30:24,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=45200.0, ans=0.1 2024-09-16 17:30:54,403 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.411e+02 1.554e+02 1.919e+02 3.789e+02, threshold=3.108e+02, percent-clipped=3.0 2024-09-16 17:30:54,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=45280.0, ans=0.07 2024-09-16 17:30:59,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=45320.0, ans=0.125 2024-09-16 17:31:10,152 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.14 vs. limit=10.0 2024-09-16 17:31:21,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=45360.0, ans=0.2 2024-09-16 17:31:29,210 INFO [train.py:1198] (1/2) Epoch 3, batch 2300, loss[loss=0.288, ctc_loss=0.2288, cr_loss=0.418, attn_decoder_loss=0.2853, over 29344.00 frames. ], tot_loss[loss=0.3179, ctc_loss=0.2633, cr_loss=0.4417, attn_decoder_loss=0.3142, over 5799027.41 frames. ], batch size: 71, lr: 3.11e-02, grad_scale: 8.0 2024-09-16 17:31:45,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=45440.0, ans=0.0 2024-09-16 17:32:00,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=45480.0, ans=0.2 2024-09-16 17:32:09,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=45480.0, ans=0.0 2024-09-16 17:32:14,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=45480.0, ans=0.1 2024-09-16 17:32:20,395 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:32:49,796 INFO [train.py:1198] (1/2) Epoch 3, batch 2350, loss[loss=0.3367, ctc_loss=0.2791, cr_loss=0.4699, attn_decoder_loss=0.3326, over 29704.00 frames. ], tot_loss[loss=0.3181, ctc_loss=0.2633, cr_loss=0.4423, attn_decoder_loss=0.3144, over 5805070.31 frames. ], batch size: 83, lr: 3.11e-02, grad_scale: 8.0 2024-09-16 17:33:03,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=45640.0, ans=0.125 2024-09-16 17:33:22,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=45680.0, ans=0.125 2024-09-16 17:33:22,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=45680.0, ans=0.125 2024-09-16 17:33:30,847 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.049e+02 1.361e+02 1.552e+02 1.880e+02 4.928e+02, threshold=3.104e+02, percent-clipped=4.0 2024-09-16 17:33:32,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=45680.0, ans=0.125 2024-09-16 17:33:48,640 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.58 vs. 
limit=22.5 2024-09-16 17:34:01,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=45760.0, ans=0.0009217391304347823 2024-09-16 17:34:04,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=45800.0, ans=0.0 2024-09-16 17:34:06,304 INFO [train.py:1198] (1/2) Epoch 3, batch 2400, loss[loss=0.314, ctc_loss=0.2658, cr_loss=0.4354, attn_decoder_loss=0.3096, over 29534.00 frames. ], tot_loss[loss=0.3186, ctc_loss=0.2638, cr_loss=0.4433, attn_decoder_loss=0.3149, over 5807918.63 frames. ], batch size: 76, lr: 3.10e-02, grad_scale: 16.0 2024-09-16 17:34:09,611 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:34:11,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=45800.0, ans=0.0009130434782608689 2024-09-16 17:34:37,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=45880.0, ans=0.0 2024-09-16 17:34:38,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=45880.0, ans=0.025 2024-09-16 17:34:46,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=45880.0, ans=0.1 2024-09-16 17:34:47,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=45880.0, ans=0.125 2024-09-16 17:34:49,851 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.56 vs. limit=15.0 2024-09-16 17:34:58,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=45920.0, ans=0.125 2024-09-16 17:34:58,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=45920.0, ans=0.0008869565217391306 2024-09-16 17:35:01,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=45920.0, ans=0.0008869565217391306 2024-09-16 17:35:07,558 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:35:12,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=45960.0, ans=0.0008782608695652172 2024-09-16 17:35:22,219 INFO [train.py:1198] (1/2) Epoch 3, batch 2450, loss[loss=0.3241, ctc_loss=0.274, cr_loss=0.449, attn_decoder_loss=0.3197, over 29713.00 frames. ], tot_loss[loss=0.3201, ctc_loss=0.2656, cr_loss=0.4444, attn_decoder_loss=0.3163, over 5785692.78 frames. 
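Throughout these summaries the headline loss is consistent with a fixed weighted sum of its three components. Taking batch 2450 just above: 0.1 x 0.274 + 0.9 x 0.3197 + 0.02 x 0.449 = 0.3241, the reported loss. A sketch with those weights, which are inferred from the printed numbers rather than taken from the training code:

def combined_loss(ctc_loss: float, attn_decoder_loss: float, cr_loss: float,
                  ctc_scale: float = 0.1,
                  attn_decoder_scale: float = 0.9,
                  cr_scale: float = 0.02) -> float:
    """Weighted total consistent with the per-batch loss summaries."""
    return (ctc_scale * ctc_loss
            + attn_decoder_scale * attn_decoder_loss
            + cr_scale * cr_loss)

# batch 2450: loss=0.3241, ctc_loss=0.274, cr_loss=0.449, attn_decoder_loss=0.3197
assert abs(combined_loss(0.274, 0.3197, 0.449) - 0.3241) < 5e-4

The tot_loss brackets report the same quantity as a frame-weighted running average over the epoch so far, which is why their "over ... frames" counts grow monotonically from summary to summary.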
], batch size: 82, lr: 3.10e-02, grad_scale: 8.0 2024-09-16 17:35:47,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=46040.0, ans=0.95 2024-09-16 17:35:54,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=46080.0, ans=0.125 2024-09-16 17:35:57,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=46080.0, ans=0.125 2024-09-16 17:36:06,479 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.425e+02 1.645e+02 1.863e+02 7.632e+02, threshold=3.291e+02, percent-clipped=3.0 2024-09-16 17:36:24,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=46120.0, ans=0.04949747468305833 2024-09-16 17:36:41,948 INFO [train.py:1198] (1/2) Epoch 3, batch 2500, loss[loss=0.3061, ctc_loss=0.2334, cr_loss=0.4246, attn_decoder_loss=0.3048, over 29609.00 frames. ], tot_loss[loss=0.3193, ctc_loss=0.2644, cr_loss=0.4439, attn_decoder_loss=0.3155, over 5795647.65 frames. ], batch size: 86, lr: 3.09e-02, grad_scale: 8.0 2024-09-16 17:36:48,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=46200.0, ans=0.05 2024-09-16 17:36:58,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=46240.0, ans=0.07 2024-09-16 17:37:04,261 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0 2024-09-16 17:37:06,843 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:37:23,784 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.25 vs. limit=10.0 2024-09-16 17:37:43,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=46360.0, ans=0.0 2024-09-16 17:37:43,518 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.11 vs. limit=15.0 2024-09-16 17:37:46,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=46360.0, ans=0.1 2024-09-16 17:37:49,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=46360.0, ans=0.025 2024-09-16 17:37:50,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=46360.0, ans=0.125 2024-09-16 17:37:58,393 INFO [train.py:1198] (1/2) Epoch 3, batch 2550, loss[loss=0.2726, ctc_loss=0.2076, cr_loss=0.3915, attn_decoder_loss=0.2711, over 29391.00 frames. ], tot_loss[loss=0.3189, ctc_loss=0.2637, cr_loss=0.4442, attn_decoder_loss=0.3152, over 5797607.79 frames. 
], batch size: 67, lr: 3.09e-02, grad_scale: 8.0 2024-09-16 17:38:13,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=46440.0, ans=0.0007739130434782603 2024-09-16 17:38:40,772 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.310e+02 1.464e+02 1.728e+02 3.657e+02, threshold=2.928e+02, percent-clipped=2.0 2024-09-16 17:39:14,457 INFO [train.py:1198] (1/2) Epoch 3, batch 2600, loss[loss=0.2976, ctc_loss=0.2385, cr_loss=0.4222, attn_decoder_loss=0.2948, over 29450.00 frames. ], tot_loss[loss=0.3189, ctc_loss=0.2634, cr_loss=0.4442, attn_decoder_loss=0.3152, over 5794106.73 frames. ], batch size: 78, lr: 3.08e-02, grad_scale: 8.0 2024-09-16 17:39:22,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=46600.0, ans=0.0 2024-09-16 17:39:52,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=46680.0, ans=0.125 2024-09-16 17:39:54,655 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.61 vs. limit=6.0 2024-09-16 17:40:00,978 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.76 vs. limit=15.0 2024-09-16 17:40:06,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=46720.0, ans=0.2 2024-09-16 17:40:09,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=46720.0, ans=0.125 2024-09-16 17:40:18,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=46760.0, ans=0.0 2024-09-16 17:40:21,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=46760.0, ans=0.05 2024-09-16 17:40:23,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=46760.0, ans=0.125 2024-09-16 17:40:24,942 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:40:24,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=46760.0, ans=0.1 2024-09-16 17:40:33,428 INFO [train.py:1198] (1/2) Epoch 3, batch 2650, loss[loss=0.3354, ctc_loss=0.2879, cr_loss=0.4586, attn_decoder_loss=0.3305, over 29258.00 frames. ], tot_loss[loss=0.3191, ctc_loss=0.2635, cr_loss=0.4446, attn_decoder_loss=0.3154, over 5799977.10 frames. 
], batch size: 100, lr: 3.08e-02, grad_scale: 8.0 2024-09-16 17:40:33,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=46800.0, ans=0.125 2024-09-16 17:40:57,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=46840.0, ans=0.125 2024-09-16 17:40:58,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=46840.0, ans=10.0 2024-09-16 17:41:08,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=46880.0, ans=0.05 2024-09-16 17:41:15,665 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.367e+02 1.536e+02 1.778e+02 3.177e+02, threshold=3.072e+02, percent-clipped=2.0 2024-09-16 17:41:15,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=46880.0, ans=0.125 2024-09-16 17:41:25,127 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:41:29,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=46920.0, ans=0.125 2024-09-16 17:41:49,023 INFO [train.py:1198] (1/2) Epoch 3, batch 2700, loss[loss=0.343, ctc_loss=0.2885, cr_loss=0.5187, attn_decoder_loss=0.3375, over 29518.00 frames. ], tot_loss[loss=0.3198, ctc_loss=0.2644, cr_loss=0.4463, attn_decoder_loss=0.316, over 5794938.22 frames. ], batch size: 87, lr: 3.08e-02, grad_scale: 8.0 2024-09-16 17:42:04,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=47040.0, ans=0.0006434782608695649 2024-09-16 17:42:10,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=47040.0, ans=0.025 2024-09-16 17:42:17,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=47080.0, ans=0.125 2024-09-16 17:42:24,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=47080.0, ans=0.125 2024-09-16 17:42:44,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=47120.0, ans=0.125 2024-09-16 17:43:05,727 INFO [train.py:1198] (1/2) Epoch 3, batch 2750, loss[loss=0.3052, ctc_loss=0.2374, cr_loss=0.41, attn_decoder_loss=0.3036, over 29507.00 frames. ], tot_loss[loss=0.3186, ctc_loss=0.2633, cr_loss=0.4446, attn_decoder_loss=0.3148, over 5793763.80 frames. ], batch size: 75, lr: 3.07e-02, grad_scale: 8.0 2024-09-16 17:43:06,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=47200.0, ans=0.04949747468305833 2024-09-16 17:43:28,380 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.93 vs. 
limit=6.0 2024-09-16 17:43:45,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=47280.0, ans=0.07 2024-09-16 17:43:48,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=47280.0, ans=0.1 2024-09-16 17:43:50,014 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.049e+02 1.345e+02 1.531e+02 1.898e+02 4.354e+02, threshold=3.062e+02, percent-clipped=3.0 2024-09-16 17:43:53,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=47320.0, ans=0.04949747468305833 2024-09-16 17:44:23,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=19.56 vs. limit=15.0 2024-09-16 17:44:25,560 INFO [train.py:1198] (1/2) Epoch 3, batch 2800, loss[loss=0.3676, ctc_loss=0.3493, cr_loss=0.4212, attn_decoder_loss=0.3603, over 20641.00 frames. ], tot_loss[loss=0.319, ctc_loss=0.2637, cr_loss=0.4442, attn_decoder_loss=0.3153, over 5775159.89 frames. ], batch size: 210, lr: 3.07e-02, grad_scale: 16.0 2024-09-16 17:44:27,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=47400.0, ans=0.125 2024-09-16 17:44:42,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=47440.0, ans=0.2 2024-09-16 17:44:55,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=47480.0, ans=0.125 2024-09-16 17:44:57,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=47480.0, ans=0.125 2024-09-16 17:44:59,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=47480.0, ans=0.125 2024-09-16 17:45:01,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=47480.0, ans=0.0 2024-09-16 17:45:16,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=47520.0, ans=0.125 2024-09-16 17:45:23,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=47520.0, ans=0.1 2024-09-16 17:45:40,749 INFO [train.py:1198] (1/2) Epoch 3, batch 2850, loss[loss=0.3081, ctc_loss=0.2582, cr_loss=0.4759, attn_decoder_loss=0.3031, over 29499.00 frames. ], tot_loss[loss=0.3193, ctc_loss=0.264, cr_loss=0.4444, attn_decoder_loss=0.3156, over 5760975.16 frames. ], batch size: 77, lr: 3.06e-02, grad_scale: 8.0 2024-09-16 17:45:54,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=47640.0, ans=0.0 2024-09-16 17:46:01,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=47640.0, ans=0.025 2024-09-16 17:46:25,026 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.383e+02 1.687e+02 2.154e+02 5.154e+02, threshold=3.374e+02, percent-clipped=7.0 2024-09-16 17:46:45,552 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.29 vs. 
limit=15.0 2024-09-16 17:46:51,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=47760.0, ans=0.125 2024-09-16 17:46:55,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=47800.0, ans=0.125 2024-09-16 17:46:56,749 INFO [train.py:1198] (1/2) Epoch 3, batch 2900, loss[loss=0.3192, ctc_loss=0.2645, cr_loss=0.4614, attn_decoder_loss=0.315, over 29416.00 frames. ], tot_loss[loss=0.3201, ctc_loss=0.2643, cr_loss=0.4461, attn_decoder_loss=0.3164, over 5786606.87 frames. ], batch size: 79, lr: 3.06e-02, grad_scale: 8.0 2024-09-16 17:47:09,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=47800.0, ans=0.2 2024-09-16 17:47:20,976 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.86 vs. limit=15.0 2024-09-16 17:47:23,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=47840.0, ans=0.125 2024-09-16 17:47:47,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=47920.0, ans=0.2 2024-09-16 17:47:50,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=47920.0, ans=0.0004521739130434795 2024-09-16 17:47:55,292 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.69 vs. limit=15.0 2024-09-16 17:48:00,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=47960.0, ans=0.125 2024-09-16 17:48:24,232 INFO [train.py:1198] (1/2) Epoch 3, batch 2950, loss[loss=0.3079, ctc_loss=0.2541, cr_loss=0.4352, attn_decoder_loss=0.3042, over 29518.00 frames. ], tot_loss[loss=0.3183, ctc_loss=0.2625, cr_loss=0.4445, attn_decoder_loss=0.3146, over 5781527.37 frames. ], batch size: 75, lr: 3.05e-02, grad_scale: 8.0 2024-09-16 17:49:08,107 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.336e+02 1.504e+02 1.810e+02 3.679e+02, threshold=3.009e+02, percent-clipped=1.0 2024-09-16 17:49:14,010 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.55 vs. limit=5.0 2024-09-16 17:49:19,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=48120.0, ans=0.05 2024-09-16 17:49:19,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=48120.0, ans=0.0 2024-09-16 17:49:26,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=48160.0, ans=0.5 2024-09-16 17:49:37,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=48160.0, ans=0.125 2024-09-16 17:49:40,395 INFO [train.py:1198] (1/2) Epoch 3, batch 3000, loss[loss=0.3198, ctc_loss=0.2614, cr_loss=0.4206, attn_decoder_loss=0.3169, over 29774.00 frames. ], tot_loss[loss=0.3181, ctc_loss=0.2625, cr_loss=0.4441, attn_decoder_loss=0.3144, over 5782323.57 frames. 
], batch size: 81, lr: 3.05e-02, grad_scale: 8.0 2024-09-16 17:49:40,396 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 17:49:58,749 INFO [train.py:1230] (1/2) Epoch 3, validation: loss=0.2335, ctc_loss=0.0936, cr_loss=4.436e-15, attn_decoder_loss=0.2491, over 944034.00 frames. 2024-09-16 17:49:58,749 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-16 17:50:13,373 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.97 vs. limit=22.5 2024-09-16 17:50:31,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.89 vs. limit=22.5 2024-09-16 17:50:41,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=48280.0, ans=0.125 2024-09-16 17:50:44,053 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.34 vs. limit=15.0 2024-09-16 17:51:16,887 INFO [train.py:1198] (1/2) Epoch 3, batch 3050, loss[loss=0.3029, ctc_loss=0.241, cr_loss=0.4453, attn_decoder_loss=0.2999, over 29540.00 frames. ], tot_loss[loss=0.319, ctc_loss=0.2635, cr_loss=0.4445, attn_decoder_loss=0.3153, over 5776014.81 frames. ], batch size: 76, lr: 3.04e-02, grad_scale: 4.0 2024-09-16 17:51:41,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=48440.0, ans=0.025 2024-09-16 17:51:52,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=48480.0, ans=0.0 2024-09-16 17:51:55,726 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.58 vs. limit=15.0 2024-09-16 17:52:04,205 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.201e+02 1.405e+02 1.578e+02 1.940e+02 5.924e+02, threshold=3.157e+02, percent-clipped=5.0 2024-09-16 17:52:34,209 INFO [train.py:1198] (1/2) Epoch 3, batch 3100, loss[loss=0.3478, ctc_loss=0.2989, cr_loss=0.4771, attn_decoder_loss=0.3427, over 29255.00 frames. ], tot_loss[loss=0.3186, ctc_loss=0.2631, cr_loss=0.4443, attn_decoder_loss=0.3149, over 5776381.15 frames. ], batch size: 100, lr: 3.04e-02, grad_scale: 8.0 2024-09-16 17:52:40,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=48600.0, ans=0.1 2024-09-16 17:52:51,382 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.88 vs. limit=15.0 2024-09-16 17:52:54,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=48640.0, ans=0.00029565217391304237 2024-09-16 17:52:58,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=48640.0, ans=0.025 2024-09-16 17:53:48,582 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:53:49,338 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.94 vs. 
limit=15.0 2024-09-16 17:53:50,241 INFO [train.py:1198] (1/2) Epoch 3, batch 3150, loss[loss=0.3336, ctc_loss=0.2756, cr_loss=0.4803, attn_decoder_loss=0.3294, over 28928.00 frames. ], tot_loss[loss=0.3186, ctc_loss=0.263, cr_loss=0.4446, attn_decoder_loss=0.3149, over 5783341.28 frames. ], batch size: 104, lr: 3.03e-02, grad_scale: 8.0 2024-09-16 17:53:58,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=48800.0, ans=0.125 2024-09-16 17:54:01,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=48800.0, ans=0.125 2024-09-16 17:54:01,634 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.45 vs. limit=22.5 2024-09-16 17:54:35,636 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.334e+02 1.533e+02 1.776e+02 7.773e+02, threshold=3.065e+02, percent-clipped=4.0 2024-09-16 17:54:38,836 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:54:50,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=48960.0, ans=0.035 2024-09-16 17:54:57,460 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.43 vs. limit=15.0 2024-09-16 17:55:07,980 INFO [train.py:1198] (1/2) Epoch 3, batch 3200, loss[loss=0.3063, ctc_loss=0.241, cr_loss=0.4233, attn_decoder_loss=0.3042, over 29411.00 frames. ], tot_loss[loss=0.3174, ctc_loss=0.2617, cr_loss=0.4442, attn_decoder_loss=0.3137, over 5793645.54 frames. ], batch size: 79, lr: 3.03e-02, grad_scale: 16.0 2024-09-16 17:55:14,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=49000.0, ans=0.125 2024-09-16 17:55:15,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=49000.0, ans=0.025 2024-09-16 17:55:22,947 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.44 vs. limit=12.0 2024-09-16 17:55:33,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=49040.0, ans=0.00020869565217391216 2024-09-16 17:55:33,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=49040.0, ans=0.125 2024-09-16 17:55:34,178 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.41 vs. limit=10.0 2024-09-16 17:55:35,328 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.10 vs. limit=10.0 2024-09-16 17:55:35,580 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.26 vs. 
limit=10.0 2024-09-16 17:56:23,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=49160.0, ans=0.0001826086956521738 2024-09-16 17:56:23,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=49160.0, ans=0.2 2024-09-16 17:56:26,020 INFO [train.py:1198] (1/2) Epoch 3, batch 3250, loss[loss=0.3364, ctc_loss=0.2782, cr_loss=0.4672, attn_decoder_loss=0.3325, over 29698.00 frames. ], tot_loss[loss=0.3179, ctc_loss=0.2617, cr_loss=0.4452, attn_decoder_loss=0.3142, over 5800542.56 frames. ], batch size: 84, lr: 3.03e-02, grad_scale: 8.0 2024-09-16 17:56:44,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=49240.0, ans=0.125 2024-09-16 17:56:53,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=49240.0, ans=0.0 2024-09-16 17:56:57,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=49280.0, ans=0.0 2024-09-16 17:57:00,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=49280.0, ans=0.125 2024-09-16 17:57:12,499 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.049e+02 1.316e+02 1.449e+02 1.854e+02 6.916e+02, threshold=2.898e+02, percent-clipped=2.0 2024-09-16 17:57:24,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=49360.0, ans=0.00013913043478260868 2024-09-16 17:57:41,272 INFO [train.py:1198] (1/2) Epoch 3, batch 3300, loss[loss=0.3357, ctc_loss=0.2739, cr_loss=0.4701, attn_decoder_loss=0.3321, over 28279.00 frames. ], tot_loss[loss=0.3168, ctc_loss=0.2611, cr_loss=0.4442, attn_decoder_loss=0.3131, over 5797763.73 frames. ], batch size: 111, lr: 3.02e-02, grad_scale: 8.0 2024-09-16 17:57:42,287 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.09 vs. limit=15.0 2024-09-16 17:57:49,846 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.42 vs. limit=6.0 2024-09-16 17:57:50,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=49400.0, ans=0.125 2024-09-16 17:57:50,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=49400.0, ans=0.0 2024-09-16 17:57:54,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=49400.0, ans=15.0 2024-09-16 17:58:12,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=49480.0, ans=0.125 2024-09-16 17:58:24,649 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.33 vs. 
limit=6.0 2024-09-16 17:58:27,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=49520.0, ans=0.2 2024-09-16 17:58:27,515 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.62 vs. limit=15.0 2024-09-16 17:58:34,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=49520.0, ans=0.125 2024-09-16 17:58:34,796 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:58:56,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=49560.0, ans=0.125 2024-09-16 17:58:59,505 INFO [train.py:1198] (1/2) Epoch 3, batch 3350, loss[loss=0.3337, ctc_loss=0.278, cr_loss=0.4749, attn_decoder_loss=0.3294, over 28910.00 frames. ], tot_loss[loss=0.3184, ctc_loss=0.263, cr_loss=0.4457, attn_decoder_loss=0.3146, over 5775235.32 frames. ], batch size: 104, lr: 3.02e-02, grad_scale: 8.0 2024-09-16 17:59:01,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=49600.0, ans=0.125 2024-09-16 17:59:04,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=49600.0, ans=0.1 2024-09-16 17:59:15,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=49640.0, ans=0.0 2024-09-16 17:59:17,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=49640.0, ans=0.125 2024-09-16 17:59:48,720 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.019e+02 1.330e+02 1.460e+02 1.779e+02 4.186e+02, threshold=2.920e+02, percent-clipped=7.0 2024-09-16 17:59:58,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=49720.0, ans=0.0 2024-09-16 18:00:02,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=49760.0, ans=0.1 2024-09-16 18:00:05,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=49760.0, ans=0.125 2024-09-16 18:00:13,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=49760.0, ans=5.217391304347847e-05 2024-09-16 18:00:17,591 INFO [train.py:1198] (1/2) Epoch 3, batch 3400, loss[loss=0.2961, ctc_loss=0.2484, cr_loss=0.4053, attn_decoder_loss=0.2924, over 29327.00 frames. ], tot_loss[loss=0.3184, ctc_loss=0.2631, cr_loss=0.4463, attn_decoder_loss=0.3146, over 5767634.56 frames. ], batch size: 67, lr: 3.01e-02, grad_scale: 8.0 2024-09-16 18:00:22,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=49800.0, ans=0.125 2024-09-16 18:00:28,484 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 18:00:30,936 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.15 vs. 
limit=22.5 2024-09-16 18:00:38,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=49840.0, ans=0.1 2024-09-16 18:01:13,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=49920.0, ans=0.0 2024-09-16 18:01:19,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=49960.0, ans=0.0 2024-09-16 18:01:33,048 INFO [train.py:1198] (1/2) Epoch 3, batch 3450, loss[loss=0.3242, ctc_loss=0.2602, cr_loss=0.4867, attn_decoder_loss=0.3205, over 28336.00 frames. ], tot_loss[loss=0.3184, ctc_loss=0.2626, cr_loss=0.4465, attn_decoder_loss=0.3147, over 5775497.59 frames. ], batch size: 111, lr: 3.01e-02, grad_scale: 8.0 2024-09-16 18:01:45,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=50000.0, ans=0.125 2024-09-16 18:02:03,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=50080.0, ans=0.1 2024-09-16 18:02:12,897 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.55 vs. limit=15.0 2024-09-16 18:02:19,801 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.075e+02 1.389e+02 1.591e+02 1.812e+02 6.127e+02, threshold=3.183e+02, percent-clipped=1.0 2024-09-16 18:02:26,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=50120.0, ans=0.125 2024-09-16 18:02:27,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=50120.0, ans=0.1 2024-09-16 18:02:38,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=50160.0, ans=0.2 2024-09-16 18:02:46,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=50160.0, ans=0.0 2024-09-16 18:02:47,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=50160.0, ans=0.125 2024-09-16 18:02:50,652 INFO [train.py:1198] (1/2) Epoch 3, batch 3500, loss[loss=0.2851, ctc_loss=0.2424, cr_loss=0.4011, attn_decoder_loss=0.2809, over 29352.00 frames. ], tot_loss[loss=0.3175, ctc_loss=0.2618, cr_loss=0.4448, attn_decoder_loss=0.3139, over 5778110.86 frames. ], batch size: 71, lr: 3.00e-02, grad_scale: 8.0 2024-09-16 18:03:08,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=50240.0, ans=0.0 2024-09-16 18:03:21,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=50280.0, ans=0.1 2024-09-16 18:03:46,048 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. 
limit=6.0 2024-09-16 18:03:46,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=50320.0, ans=0.125 2024-09-16 18:03:51,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=50360.0, ans=0.125 2024-09-16 18:04:07,560 INFO [train.py:1198] (1/2) Epoch 3, batch 3550, loss[loss=0.3247, ctc_loss=0.2529, cr_loss=0.4512, attn_decoder_loss=0.3227, over 29705.00 frames. ], tot_loss[loss=0.317, ctc_loss=0.2609, cr_loss=0.4451, attn_decoder_loss=0.3133, over 5784186.87 frames. ], batch size: 89, lr: 3.00e-02, grad_scale: 4.0 2024-09-16 18:04:15,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=50400.0, ans=0.125 2024-09-16 18:04:20,183 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.27 vs. limit=22.5 2024-09-16 18:04:23,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=50440.0, ans=15.0 2024-09-16 18:04:24,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=50440.0, ans=0.0 2024-09-16 18:04:39,165 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 18:04:40,028 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2024-09-16 18:04:41,047 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.26 vs. limit=15.0 2024-09-16 18:04:53,288 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.11 vs. limit=15.0 2024-09-16 18:04:55,140 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.391e+02 1.610e+02 2.091e+02 4.528e+02, threshold=3.220e+02, percent-clipped=5.0 2024-09-16 18:05:21,616 INFO [train.py:1198] (1/2) Epoch 3, batch 3600, loss[loss=0.2966, ctc_loss=0.2365, cr_loss=0.3999, attn_decoder_loss=0.2944, over 29495.00 frames. ], tot_loss[loss=0.3166, ctc_loss=0.2601, cr_loss=0.4448, attn_decoder_loss=0.3129, over 5792642.09 frames. ], batch size: 77, lr: 2.99e-02, grad_scale: 8.0 2024-09-16 18:06:30,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=50760.0, ans=0.1 2024-09-16 18:06:35,930 INFO [train.py:1198] (1/2) Epoch 3, batch 3650, loss[loss=0.3209, ctc_loss=0.2511, cr_loss=0.4475, attn_decoder_loss=0.3188, over 29507.00 frames. ], tot_loss[loss=0.3153, ctc_loss=0.2581, cr_loss=0.4435, attn_decoder_loss=0.3118, over 5795458.80 frames. 
], batch size: 90, lr: 2.99e-02, grad_scale: 4.0 2024-09-16 18:06:57,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=50840.0, ans=15.0 2024-09-16 18:07:00,290 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 18:07:07,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=50880.0, ans=0.0 2024-09-16 18:07:11,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=50880.0, ans=22.5 2024-09-16 18:07:12,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=50880.0, ans=0.125 2024-09-16 18:07:19,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=50920.0, ans=10.0 2024-09-16 18:07:25,477 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.018e+02 1.262e+02 1.447e+02 1.690e+02 1.332e+03, threshold=2.894e+02, percent-clipped=3.0 2024-09-16 18:07:27,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=50920.0, ans=0.125 2024-09-16 18:07:27,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=50920.0, ans=0.125 2024-09-16 18:07:27,892 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.25 vs. limit=15.0 2024-09-16 18:07:40,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=50960.0, ans=0.1 2024-09-16 18:07:50,876 INFO [train.py:1198] (1/2) Epoch 3, batch 3700, loss[loss=0.3205, ctc_loss=0.2572, cr_loss=0.4862, attn_decoder_loss=0.3167, over 29704.00 frames. ], tot_loss[loss=0.3152, ctc_loss=0.2578, cr_loss=0.4441, attn_decoder_loss=0.3118, over 5805305.67 frames. ], batch size: 84, lr: 2.99e-02, grad_scale: 8.0 2024-09-16 18:07:52,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=51000.0, ans=0.5 2024-09-16 18:07:54,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=51000.0, ans=0.125 2024-09-16 18:08:09,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=51040.0, ans=0.125 2024-09-16 18:08:41,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=51120.0, ans=0.2 2024-09-16 18:08:56,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=51160.0, ans=0.125 2024-09-16 18:09:09,217 INFO [train.py:1198] (1/2) Epoch 3, batch 3750, loss[loss=0.268, ctc_loss=0.2084, cr_loss=0.3814, attn_decoder_loss=0.2662, over 29291.00 frames. ], tot_loss[loss=0.3149, ctc_loss=0.2576, cr_loss=0.4442, attn_decoder_loss=0.3114, over 5808405.31 frames. 
], batch size: 67, lr: 2.98e-02, grad_scale: 8.0 2024-09-16 18:09:18,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=51200.0, ans=0.05 2024-09-16 18:09:18,808 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.36 vs. limit=15.0 2024-09-16 18:09:31,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=51240.0, ans=0.0 2024-09-16 18:09:32,636 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2024-09-16 18:09:58,527 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.044e+02 1.287e+02 1.522e+02 1.821e+02 1.090e+03, threshold=3.043e+02, percent-clipped=9.0 2024-09-16 18:10:00,468 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 18:10:04,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=51320.0, ans=0.05 2024-09-16 18:10:22,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=51400.0, ans=0.125 2024-09-16 18:10:23,886 INFO [train.py:1198] (1/2) Epoch 3, batch 3800, loss[loss=0.3288, ctc_loss=0.2764, cr_loss=0.4594, attn_decoder_loss=0.3244, over 29640.00 frames. ], tot_loss[loss=0.3145, ctc_loss=0.2573, cr_loss=0.4432, attn_decoder_loss=0.311, over 5797680.16 frames. ], batch size: 86, lr: 2.98e-02, grad_scale: 8.0 2024-09-16 18:10:27,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=51400.0, ans=0.0 2024-09-16 18:10:33,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=51400.0, ans=0.125 2024-09-16 18:10:33,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=51400.0, ans=15.0 2024-09-16 18:10:35,116 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.82 vs. limit=6.0 2024-09-16 18:10:36,765 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.36 vs. limit=15.0 2024-09-16 18:10:54,689 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.96 vs. limit=22.5 2024-09-16 18:10:55,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=51480.0, ans=0.125 2024-09-16 18:11:38,185 INFO [train.py:1198] (1/2) Epoch 3, batch 3850, loss[loss=0.3337, ctc_loss=0.2686, cr_loss=0.4422, attn_decoder_loss=0.3311, over 29299.00 frames. ], tot_loss[loss=0.3138, ctc_loss=0.2562, cr_loss=0.443, attn_decoder_loss=0.3104, over 5811833.46 frames. 
], batch size: 100, lr: 2.97e-02, grad_scale: 8.0 2024-09-16 18:11:59,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=51640.0, ans=0.2 2024-09-16 18:12:11,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=51680.0, ans=0.0 2024-09-16 18:12:27,161 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.321e+02 1.509e+02 1.752e+02 3.872e+02, threshold=3.018e+02, percent-clipped=1.0 2024-09-16 18:12:39,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=51760.0, ans=0.125 2024-09-16 18:12:43,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=51760.0, ans=0.2 2024-09-16 18:12:52,648 INFO [train.py:1198] (1/2) Epoch 3, batch 3900, loss[loss=0.3292, ctc_loss=0.267, cr_loss=0.4658, attn_decoder_loss=0.3258, over 29619.00 frames. ], tot_loss[loss=0.3143, ctc_loss=0.2563, cr_loss=0.4436, attn_decoder_loss=0.3109, over 5816195.00 frames. ], batch size: 86, lr: 2.97e-02, grad_scale: 8.0 2024-09-16 18:12:54,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=51800.0, ans=0.025 2024-09-16 18:12:54,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=51800.0, ans=0.125 2024-09-16 18:13:01,070 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.94 vs. limit=22.5 2024-09-16 18:13:03,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=51800.0, ans=0.2 2024-09-16 18:13:06,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=51840.0, ans=0.125 2024-09-16 18:13:07,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=51840.0, ans=0.2 2024-09-16 18:13:16,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=51840.0, ans=0.2 2024-09-16 18:13:25,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=51880.0, ans=0.0 2024-09-16 18:13:34,904 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.64 vs. limit=15.0 2024-09-16 18:13:48,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=51920.0, ans=0.0 2024-09-16 18:13:52,393 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.02 vs. limit=10.0 2024-09-16 18:13:52,514 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.11 vs. 
limit=15.0 2024-09-16 18:14:02,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=51960.0, ans=0.125 2024-09-16 18:14:02,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=51960.0, ans=0.125 2024-09-16 18:14:06,787 INFO [train.py:1198] (1/2) Epoch 3, batch 3950, loss[loss=0.3263, ctc_loss=0.2674, cr_loss=0.4445, attn_decoder_loss=0.3229, over 29502.00 frames. ], tot_loss[loss=0.3142, ctc_loss=0.2559, cr_loss=0.444, attn_decoder_loss=0.3108, over 5835628.84 frames. ], batch size: 97, lr: 2.96e-02, grad_scale: 8.0 2024-09-16 18:14:11,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=52000.0, ans=0.125 2024-09-16 18:14:18,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=52000.0, ans=0.0 2024-09-16 18:14:42,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=52080.0, ans=0.125 2024-09-16 18:14:48,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=52080.0, ans=0.1 2024-09-16 18:14:58,188 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.019e+02 1.359e+02 1.491e+02 1.794e+02 3.719e+02, threshold=2.982e+02, percent-clipped=2.0 2024-09-16 18:15:22,980 INFO [train.py:1198] (1/2) Epoch 3, batch 4000, loss[loss=0.2915, ctc_loss=0.2346, cr_loss=0.4404, attn_decoder_loss=0.288, over 29535.00 frames. ], tot_loss[loss=0.3152, ctc_loss=0.2572, cr_loss=0.445, attn_decoder_loss=0.3118, over 5813357.60 frames. ], batch size: 74, lr: 2.96e-02, grad_scale: 16.0 2024-09-16 18:16:00,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=52280.0, ans=0.125 2024-09-16 18:16:16,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=52320.0, ans=0.1 2024-09-16 18:16:32,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=52360.0, ans=0.2 2024-09-16 18:16:34,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=52360.0, ans=0.125 2024-09-16 18:16:36,968 INFO [train.py:1198] (1/2) Epoch 3, batch 4050, loss[loss=0.3691, ctc_loss=0.3419, cr_loss=0.4795, attn_decoder_loss=0.3614, over 20307.00 frames. ], tot_loss[loss=0.3152, ctc_loss=0.2574, cr_loss=0.4448, attn_decoder_loss=0.3117, over 5798354.98 frames. 
], batch size: 209, lr: 2.96e-02, grad_scale: 4.0 2024-09-16 18:16:44,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=52400.0, ans=0.1 2024-09-16 18:17:06,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=52480.0, ans=0.2 2024-09-16 18:17:06,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=52480.0, ans=0.125 2024-09-16 18:17:12,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=52480.0, ans=0.0 2024-09-16 18:17:18,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=52480.0, ans=0.05 2024-09-16 18:17:28,049 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.348e+02 1.567e+02 1.841e+02 9.373e+02, threshold=3.134e+02, percent-clipped=5.0 2024-09-16 18:17:28,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=52520.0, ans=0.0 2024-09-16 18:17:28,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=52520.0, ans=0.125 2024-09-16 18:17:32,842 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-16 18:17:41,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=52560.0, ans=0.0 2024-09-16 18:17:50,222 INFO [train.py:1198] (1/2) Epoch 3, batch 4100, loss[loss=0.3327, ctc_loss=0.2661, cr_loss=0.453, attn_decoder_loss=0.33, over 29514.00 frames. ], tot_loss[loss=0.3154, ctc_loss=0.2579, cr_loss=0.4454, attn_decoder_loss=0.3119, over 5793509.37 frames. ], batch size: 90, lr: 2.95e-02, grad_scale: 8.0 2024-09-16 18:17:50,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=52600.0, ans=0.125 2024-09-16 18:17:56,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=52600.0, ans=0.0 2024-09-16 18:18:33,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=52720.0, ans=0.0 2024-09-16 18:18:37,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=52720.0, ans=0.0 2024-09-16 18:18:47,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=52720.0, ans=0.05 2024-09-16 18:18:50,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=52760.0, ans=0.125 2024-09-16 18:18:51,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=52760.0, ans=0.125 2024-09-16 18:19:06,558 INFO [train.py:1198] (1/2) Epoch 3, batch 4150, loss[loss=0.3134, ctc_loss=0.2532, cr_loss=0.4826, attn_decoder_loss=0.3094, over 29512.00 frames. ], tot_loss[loss=0.3142, ctc_loss=0.2566, cr_loss=0.4438, attn_decoder_loss=0.3108, over 5798524.06 frames. 
], batch size: 77, lr: 2.95e-02, grad_scale: 4.0 2024-09-16 18:19:14,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=52800.0, ans=0.125 2024-09-16 18:19:18,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=52800.0, ans=0.0 2024-09-16 18:19:21,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=52840.0, ans=0.0 2024-09-16 18:19:21,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=52840.0, ans=0.125 2024-09-16 18:19:26,454 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.33 vs. limit=12.0 2024-09-16 18:19:37,556 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 18:19:54,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=52920.0, ans=0.125 2024-09-16 18:19:59,721 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.278e+02 1.455e+02 1.672e+02 3.435e+02, threshold=2.910e+02, percent-clipped=1.0 2024-09-16 18:20:04,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=52960.0, ans=0.2 2024-09-16 18:20:11,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=52960.0, ans=0.125 2024-09-16 18:20:19,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=53000.0, ans=0.0 2024-09-16 18:20:20,392 INFO [train.py:1198] (1/2) Epoch 3, batch 4200, loss[loss=0.3433, ctc_loss=0.2882, cr_loss=0.4838, attn_decoder_loss=0.3386, over 29538.00 frames. ], tot_loss[loss=0.3144, ctc_loss=0.2562, cr_loss=0.4447, attn_decoder_loss=0.3109, over 5800806.46 frames. ], batch size: 90, lr: 2.94e-02, grad_scale: 8.0 2024-09-16 18:20:25,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=53000.0, ans=0.125 2024-09-16 18:20:35,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=53040.0, ans=0.0 2024-09-16 18:20:42,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=53040.0, ans=0.125 2024-09-16 18:20:43,353 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.82 vs. limit=10.0 2024-09-16 18:21:12,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=53120.0, ans=0.1 2024-09-16 18:21:34,065 INFO [train.py:1198] (1/2) Epoch 3, batch 4250, loss[loss=0.2973, ctc_loss=0.2287, cr_loss=0.4458, attn_decoder_loss=0.295, over 29522.00 frames. ], tot_loss[loss=0.3147, ctc_loss=0.2564, cr_loss=0.4451, attn_decoder_loss=0.3113, over 5805922.07 frames. 
], batch size: 74, lr: 2.94e-02, grad_scale: 4.0 2024-09-16 18:21:37,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=53200.0, ans=0.125 2024-09-16 18:21:38,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=53200.0, ans=0.0 2024-09-16 18:21:47,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=53240.0, ans=0.0 2024-09-16 18:21:48,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=53240.0, ans=0.125 2024-09-16 18:22:03,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=53280.0, ans=0.125 2024-09-16 18:22:09,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=53280.0, ans=0.125 2024-09-16 18:22:26,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=53320.0, ans=0.125 2024-09-16 18:22:28,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=53320.0, ans=0.0 2024-09-16 18:22:29,120 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.928e+01 1.354e+02 1.567e+02 1.958e+02 1.183e+03, threshold=3.135e+02, percent-clipped=4.0 2024-09-16 18:22:32,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=53360.0, ans=0.07 2024-09-16 18:22:36,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=53360.0, ans=0.125 2024-09-16 18:22:47,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=53400.0, ans=0.125 2024-09-16 18:22:49,052 INFO [train.py:1198] (1/2) Epoch 3, batch 4300, loss[loss=0.3363, ctc_loss=0.2681, cr_loss=0.4755, attn_decoder_loss=0.3333, over 29513.00 frames. ], tot_loss[loss=0.3147, ctc_loss=0.2562, cr_loss=0.4447, attn_decoder_loss=0.3113, over 5794611.23 frames. ], batch size: 87, lr: 2.93e-02, grad_scale: 8.0 2024-09-16 18:22:58,906 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.08 vs. 
limit=22.5 2024-09-16 18:23:10,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=53440.0, ans=0.125 2024-09-16 18:23:23,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=53480.0, ans=0.125 2024-09-16 18:23:39,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=53520.0, ans=0.125 2024-09-16 18:23:41,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=53520.0, ans=0.0 2024-09-16 18:23:50,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=53560.0, ans=0.125 2024-09-16 18:24:03,892 INFO [train.py:1198] (1/2) Epoch 3, batch 4350, loss[loss=0.3251, ctc_loss=0.2707, cr_loss=0.4548, attn_decoder_loss=0.321, over 29451.00 frames. ], tot_loss[loss=0.3186, ctc_loss=0.26, cr_loss=0.4495, attn_decoder_loss=0.3151, over 5795894.44 frames. ], batch size: 97, lr: 2.93e-02, grad_scale: 4.0 2024-09-16 18:24:44,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=53680.0, ans=0.125 2024-09-16 18:24:45,437 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.67 vs. limit=22.5 2024-09-16 18:24:47,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=53720.0, ans=0.125 2024-09-16 18:24:49,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=53720.0, ans=0.125 2024-09-16 18:24:59,247 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.313e+02 1.497e+02 1.843e+02 5.151e+02, threshold=2.995e+02, percent-clipped=3.0 2024-09-16 18:25:06,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=53760.0, ans=0.0 2024-09-16 18:25:10,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=53760.0, ans=0.0 2024-09-16 18:25:17,590 INFO [train.py:1198] (1/2) Epoch 3, batch 4400, loss[loss=0.3349, ctc_loss=0.2848, cr_loss=0.4495, attn_decoder_loss=0.3305, over 27358.00 frames. ], tot_loss[loss=0.3215, ctc_loss=0.2633, cr_loss=0.4518, attn_decoder_loss=0.318, over 5766961.94 frames. ], batch size: 124, lr: 2.93e-02, grad_scale: 8.0 2024-09-16 18:25:19,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=53800.0, ans=0.0 2024-09-16 18:25:23,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=53800.0, ans=0.0 2024-09-16 18:25:28,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=53800.0, ans=0.125 2024-09-16 18:25:28,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.91 vs. 
limit=15.0 2024-09-16 18:25:29,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=53800.0, ans=0.1 2024-09-16 18:25:32,459 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.69 vs. limit=10.0 2024-09-16 18:25:38,225 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.73 vs. limit=22.5 2024-09-16 18:25:56,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=53880.0, ans=10.0 2024-09-16 18:25:56,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=53880.0, ans=0.125 2024-09-16 18:26:18,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=53960.0, ans=0.125 2024-09-16 18:26:30,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=54000.0, ans=0.1 2024-09-16 18:26:31,923 INFO [train.py:1198] (1/2) Epoch 3, batch 4450, loss[loss=0.3462, ctc_loss=0.3142, cr_loss=0.4457, attn_decoder_loss=0.3399, over 20196.00 frames. ], tot_loss[loss=0.3258, ctc_loss=0.2707, cr_loss=0.4545, attn_decoder_loss=0.3219, over 5574534.42 frames. ], batch size: 209, lr: 2.92e-02, grad_scale: 8.0 2024-09-16 18:26:43,160 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=11.23 vs. limit=10.0 2024-09-16 18:26:58,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=54040.0, ans=0.1 2024-09-16 18:26:59,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=54040.0, ans=0.125 2024-09-16 18:27:01,644 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.70 vs. limit=15.0 2024-09-16 18:27:01,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.91 vs. limit=15.0 2024-09-16 18:27:09,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.76 vs. limit=15.0 2024-09-16 18:27:14,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=54080.0, ans=0.0 2024-09-16 18:27:28,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=54120.0, ans=0.125 2024-09-16 18:27:29,242 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.938e+01 1.292e+02 1.431e+02 1.663e+02 2.911e+02, threshold=2.863e+02, percent-clipped=0.0 2024-09-16 18:27:47,155 INFO [train.py:1198] (1/2) Epoch 3, batch 4500, loss[loss=0.3526, ctc_loss=0.3368, cr_loss=0.444, attn_decoder_loss=0.3445, over 20125.00 frames. ], tot_loss[loss=0.3307, ctc_loss=0.2806, cr_loss=0.4546, attn_decoder_loss=0.3262, over 5233880.25 frames. 
], batch size: 209, lr: 2.92e-02, grad_scale: 8.0 2024-09-16 18:28:11,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=54240.0, ans=0.0 2024-09-16 18:29:13,313 INFO [train.py:1198] (1/2) Epoch 4, batch 0, loss[loss=0.4052, ctc_loss=0.2539, cr_loss=0.4238, attn_decoder_loss=0.4126, over 29612.00 frames. ], tot_loss[loss=0.4052, ctc_loss=0.2539, cr_loss=0.4238, attn_decoder_loss=0.4126, over 29612.00 frames. ], batch size: 73, lr: 2.73e-02, grad_scale: 4.0 2024-09-16 18:29:13,314 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 18:29:31,685 INFO [train.py:1230] (1/2) Epoch 4, validation: loss=0.259, ctc_loss=0.0933, cr_loss=4.939e-15, attn_decoder_loss=0.2774, over 944034.00 frames. 2024-09-16 18:29:31,685 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-16 18:29:47,067 WARNING [optim.py:503] (1/2) Scaling gradients by 0.06680610030889511, model_norm_threshold=286.2942810058594 2024-09-16 18:29:47,278 WARNING [optim.py:575] (1/2) Parameter dominating tot_sumsq module.attention_decoder.decoder.layers.1.self_attn.linear_k.weight with proportion 0.28, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.084e+06, grad_sumsq=4.710e+06, orig_rms_sq=1.079e+00 2024-09-16 18:29:56,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=54340.0, ans=0.125 2024-09-16 18:30:07,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=54380.0, ans=0.125 2024-09-16 18:30:19,169 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=22.5 2024-09-16 18:30:42,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=54460.0, ans=0.1 2024-09-16 18:30:44,943 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.88 vs. limit=15.0 2024-09-16 18:30:51,636 INFO [train.py:1198] (1/2) Epoch 4, batch 50, loss[loss=0.2885, ctc_loss=0.2383, cr_loss=0.425, attn_decoder_loss=0.2847, over 29424.00 frames. ], tot_loss[loss=0.3248, ctc_loss=0.2645, cr_loss=0.4464, attn_decoder_loss=0.3216, over 1267681.34 frames. ], batch size: 70, lr: 2.72e-02, grad_scale: 2.0 2024-09-16 18:30:56,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=54500.0, ans=0.125 2024-09-16 18:31:05,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=54540.0, ans=0.125 2024-09-16 18:31:07,632 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.96 vs. limit=6.0 2024-09-16 18:31:12,000 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.76 vs. 
limit=22.5 2024-09-16 18:31:15,933 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.005e+02 1.251e+02 1.386e+02 1.651e+02 4.285e+03, threshold=2.772e+02, percent-clipped=8.0 2024-09-16 18:31:16,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=54540.0, ans=0.0 2024-09-16 18:31:17,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=54540.0, ans=0.0 2024-09-16 18:31:26,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=54580.0, ans=0.0 2024-09-16 18:31:31,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=54580.0, ans=0.0 2024-09-16 18:31:37,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=54620.0, ans=0.035 2024-09-16 18:31:38,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=54620.0, ans=10.0 2024-09-16 18:31:40,859 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.44 vs. limit=22.5 2024-09-16 18:32:07,180 INFO [train.py:1198] (1/2) Epoch 4, batch 100, loss[loss=0.3149, ctc_loss=0.2645, cr_loss=0.4838, attn_decoder_loss=0.3097, over 29536.00 frames. ], tot_loss[loss=0.3223, ctc_loss=0.2628, cr_loss=0.4493, attn_decoder_loss=0.3189, over 2251133.36 frames. ], batch size: 76, lr: 2.72e-02, grad_scale: 4.0 2024-09-16 18:32:23,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.64 vs. limit=15.0 2024-09-16 18:32:30,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=54740.0, ans=0.2 2024-09-16 18:32:33,892 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.19 vs. limit=15.0 2024-09-16 18:32:40,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=54780.0, ans=0.125 2024-09-16 18:32:52,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=54820.0, ans=0.125 2024-09-16 18:32:57,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=54820.0, ans=0.0 2024-09-16 18:33:11,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=54860.0, ans=0.125 2024-09-16 18:33:23,762 INFO [train.py:1198] (1/2) Epoch 4, batch 150, loss[loss=0.2878, ctc_loss=0.2349, cr_loss=0.3971, attn_decoder_loss=0.2849, over 29452.00 frames. ], tot_loss[loss=0.3163, ctc_loss=0.2567, cr_loss=0.4451, attn_decoder_loss=0.313, over 3046515.03 frames. 
2024-09-16 18:33:23,762 INFO [train.py:1198] (1/2) Epoch 4, batch 150, loss[loss=0.2878, ctc_loss=0.2349, cr_loss=0.3971, attn_decoder_loss=0.2849, over 29452.00 frames. ], tot_loss[loss=0.3163, ctc_loss=0.2567, cr_loss=0.4451, attn_decoder_loss=0.313, over 3046515.03 frames. ], batch size: 70, lr: 2.72e-02, grad_scale: 4.0
2024-09-16 18:33:25,461 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 18:33:27,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=54900.0, ans=0.125
2024-09-16 18:33:36,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=54900.0, ans=0.125
2024-09-16 18:33:48,184 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.067e+02 1.258e+02 1.425e+02 1.595e+02 3.260e+02, threshold=2.849e+02, percent-clipped=3.0
2024-09-16 18:34:00,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=54980.0, ans=0.1
2024-09-16 18:34:06,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=54980.0, ans=0.125
2024-09-16 18:34:38,961 INFO [train.py:1198] (1/2) Epoch 4, batch 200, loss[loss=0.3337, ctc_loss=0.2811, cr_loss=0.4558, attn_decoder_loss=0.3295, over 27494.00 frames. ], tot_loss[loss=0.3136, ctc_loss=0.2536, cr_loss=0.4427, attn_decoder_loss=0.3104, over 3658327.97 frames. ], batch size: 125, lr: 2.71e-02, grad_scale: 8.0
2024-09-16 18:34:54,170 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.93 vs. limit=22.5
2024-09-16 18:35:01,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=55140.0, ans=15.0
2024-09-16 18:35:05,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=55140.0, ans=0.125
2024-09-16 18:35:10,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.27 vs. limit=15.0
2024-09-16 18:35:14,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=55180.0, ans=0.0
2024-09-16 18:35:22,528 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 18:35:25,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=55220.0, ans=0.1
2024-09-16 18:35:34,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=55220.0, ans=0.2
2024-09-16 18:35:43,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=55260.0, ans=0.0
2024-09-16 18:35:43,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=55260.0, ans=0.0
2024-09-16 18:35:49,004 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0
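Each scaling.py:214 line prints the current value (ans) of a named ScheduledFloat at the current batch_count. To our reading of the Zipformer code these are piecewise-linear schedules over the batch count, which is why skip rates, balancer probabilities and similar knobs change as training progresses. A sketch of such a schedule; the breakpoints below are invented for illustration, not taken from this run:

```python
# Sketch of a piecewise-linear schedule in the spirit of ScheduledFloat.
def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
    """Linearly interpolate (batch_count, value) pairs; clamp outside the range."""
    points = sorted(points)
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return points[-1][1]

# A skip rate decaying from 0.1 to 0.0 over the first 20k batches would read
# ans=0.0 by batch_count=54240.0, like the ff3_skip_rate line above.
print(scheduled_float(54240.0, [(0.0, 0.1), (20000.0, 0.0)]))  # -> 0.0
```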
2024-09-16 18:35:56,984 INFO [train.py:1198] (1/2) Epoch 4, batch 250, loss[loss=0.3228, ctc_loss=0.2624, cr_loss=0.4708, attn_decoder_loss=0.319, over 29262.00 frames. ], tot_loss[loss=0.3122, ctc_loss=0.2514, cr_loss=0.4424, attn_decoder_loss=0.3091, over 4140471.40 frames. ], batch size: 100, lr: 2.71e-02, grad_scale: 4.0
2024-09-16 18:36:03,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=55300.0, ans=0.07
2024-09-16 18:36:09,955 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.38 vs. limit=15.0
2024-09-16 18:36:22,542 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.428e+01 1.364e+02 1.529e+02 1.729e+02 3.264e+02, threshold=3.057e+02, percent-clipped=1.0
2024-09-16 18:36:35,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=55380.0, ans=0.025
2024-09-16 18:36:35,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=55380.0, ans=0.125
2024-09-16 18:36:59,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=55460.0, ans=0.125
2024-09-16 18:37:12,133 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0
2024-09-16 18:37:14,449 INFO [train.py:1198] (1/2) Epoch 4, batch 300, loss[loss=0.3305, ctc_loss=0.2663, cr_loss=0.4438, attn_decoder_loss=0.3278, over 29494.00 frames. ], tot_loss[loss=0.3118, ctc_loss=0.2511, cr_loss=0.4419, attn_decoder_loss=0.3087, over 4509608.61 frames. ], batch size: 92, lr: 2.70e-02, grad_scale: 8.0
2024-09-16 18:37:18,357 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.54 vs. limit=15.0
2024-09-16 18:37:41,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=55540.0, ans=0.1
2024-09-16 18:37:46,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=55580.0, ans=0.125
2024-09-16 18:37:47,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=55580.0, ans=0.125
2024-09-16 18:38:29,982 INFO [train.py:1198] (1/2) Epoch 4, batch 350, loss[loss=0.2648, ctc_loss=0.2034, cr_loss=0.3633, attn_decoder_loss=0.2635, over 29312.00 frames. ], tot_loss[loss=0.3117, ctc_loss=0.251, cr_loss=0.4422, attn_decoder_loss=0.3086, over 4794815.25 frames. ], batch size: 71, lr: 2.70e-02, grad_scale: 8.0
2024-09-16 18:38:37,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=55700.0, ans=0.125
2024-09-16 18:38:54,351 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.92 vs. limit=10.0
2024-09-16 18:38:55,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=55740.0, ans=0.07
2024-09-16 18:38:59,297 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.063e+02 1.338e+02 1.528e+02 1.849e+02 4.816e+02, threshold=3.056e+02, percent-clipped=1.0
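The Whitening lines compare a per-module statistic against a limit and, to our reading of scaling.py, a gradient penalty is applied only while the metric exceeds that limit. The metric appears to measure how far the channel covariance is from a multiple of the identity: the ratio of the mean squared eigenvalue to the squared mean eigenvalue, exactly 1.0 for perfectly "white" features. A sketch under that assumption:

```python
# Sketch (our reading, not scaling.py itself): whitening metric of features x,
# mean(eig(C)^2) / mean(eig(C))^2 for the channel covariance C, computed from
# traces so no eigendecomposition is needed. 1.0 means perfectly white.
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels). Returns >= 1.0; larger = less white."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]            # (C, C) channel covariance
    d = cov.shape[0]
    mean_eig = torch.diagonal(cov).mean()   # trace(C) / d
    mean_eig_sq = (cov * cov).sum() / d     # trace(C @ C) / d (C is symmetric)
    return mean_eig_sq / (mean_eig ** 2 + 1e-20)

print(float(whitening_metric(torch.randn(1000, 256))))  # slightly above 1.0
```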
2024-09-16 18:39:09,349 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.54 vs. limit=15.0
2024-09-16 18:39:22,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=55820.0, ans=0.1
2024-09-16 18:39:30,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.51 vs. limit=15.0
2024-09-16 18:39:31,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=55860.0, ans=0.09899494936611666
2024-09-16 18:39:46,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=55900.0, ans=0.0
2024-09-16 18:39:47,899 INFO [train.py:1198] (1/2) Epoch 4, batch 400, loss[loss=0.3152, ctc_loss=0.2561, cr_loss=0.4715, attn_decoder_loss=0.3113, over 29701.00 frames. ], tot_loss[loss=0.3106, ctc_loss=0.2494, cr_loss=0.4415, attn_decoder_loss=0.3076, over 5024418.47 frames. ], batch size: 82, lr: 2.70e-02, grad_scale: 8.0
2024-09-16 18:39:49,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=55900.0, ans=0.2
2024-09-16 18:39:54,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=55900.0, ans=0.125
2024-09-16 18:40:10,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=55940.0, ans=0.1
2024-09-16 18:40:13,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=55940.0, ans=0.125
2024-09-16 18:40:27,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=55980.0, ans=0.125
2024-09-16 18:40:48,004 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.51 vs. limit=12.0
2024-09-16 18:41:05,939 INFO [train.py:1198] (1/2) Epoch 4, batch 450, loss[loss=0.309, ctc_loss=0.2412, cr_loss=0.432, attn_decoder_loss=0.3069, over 29684.00 frames. ], tot_loss[loss=0.3105, ctc_loss=0.2494, cr_loss=0.4418, attn_decoder_loss=0.3075, over 5187008.35 frames. ], batch size: 83, lr: 2.69e-02, grad_scale: 8.0
2024-09-16 18:41:34,587 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.042e+02 1.288e+02 1.422e+02 1.644e+02 6.882e+02, threshold=2.845e+02, percent-clipped=3.0
2024-09-16 18:41:53,995 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=14.42 vs. limit=15.0
2024-09-16 18:42:03,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=56220.0, ans=0.2
2024-09-16 18:42:15,319 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.65 vs. limit=12.0
2024-09-16 18:42:15,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=56260.0, ans=0.125
2024-09-16 18:42:21,532 INFO [train.py:1198] (1/2) Epoch 4, batch 500, loss[loss=0.3301, ctc_loss=0.2654, cr_loss=0.4564, attn_decoder_loss=0.3271, over 29452.00 frames. ], tot_loss[loss=0.3093, ctc_loss=0.2481, cr_loss=0.4407, attn_decoder_loss=0.3063, over 5328453.18 frames. ], batch size: 94, lr: 2.69e-02, grad_scale: 8.0
2024-09-16 18:42:33,202 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.72 vs. limit=15.0
2024-09-16 18:42:38,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=56340.0, ans=0.1
2024-09-16 18:42:45,521 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.98 vs. limit=10.0
2024-09-16 18:42:48,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=56340.0, ans=0.0
2024-09-16 18:42:52,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=56380.0, ans=0.5
2024-09-16 18:42:58,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=56380.0, ans=0.5
2024-09-16 18:43:12,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=56420.0, ans=0.1
2024-09-16 18:43:33,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=56460.0, ans=0.1
2024-09-16 18:43:37,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=56500.0, ans=0.125
2024-09-16 18:43:37,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=56500.0, ans=0.125
2024-09-16 18:43:38,983 INFO [train.py:1198] (1/2) Epoch 4, batch 550, loss[loss=0.3381, ctc_loss=0.2802, cr_loss=0.4876, attn_decoder_loss=0.3337, over 28815.00 frames. ], tot_loss[loss=0.3097, ctc_loss=0.2486, cr_loss=0.4401, attn_decoder_loss=0.3067, over 5421303.24 frames. ], batch size: 104, lr: 2.69e-02, grad_scale: 8.0
2024-09-16 18:43:52,304 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.36 vs. limit=22.5
2024-09-16 18:44:03,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=56540.0, ans=0.0
2024-09-16 18:44:05,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=56540.0, ans=0.0
2024-09-16 18:44:09,213 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.049e+02 1.307e+02 1.429e+02 1.661e+02 4.927e+02, threshold=2.859e+02, percent-clipped=1.0
2024-09-16 18:44:36,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=56620.0, ans=0.125
2024-09-16 18:44:39,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=56660.0, ans=0.1
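The grad_scale field in the status lines (8.0, then 4.0, then 2.0 at batch 50, climbing back later) is the dynamic loss scale of mixed-precision training: halved whenever inf/NaN gradients are detected, and grown again after a run of clean steps. A generic PyTorch pattern that produces this behavior; a sketch, not this recipe's training loop:

```python
# Standard torch.cuda.amp usage: the scaler multiplies the loss before
# backward, skips the optimizer step on overflow, and adapts its scale.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=8.0, growth_interval=2000)

def train_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(batch)            # assumed: model returns the loss
    scaler.scale(loss).backward()      # backprop on the scaled loss
    scaler.step(optimizer)             # unscales first; skips step on inf/NaN
    scaler.update()                    # halves scale on overflow, else may grow
    return loss.detach(), scaler.get_scale()
```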
2024-09-16 18:44:47,941 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.45 vs. limit=15.0
2024-09-16 18:44:49,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=56660.0, ans=0.1
2024-09-16 18:44:56,983 INFO [train.py:1198] (1/2) Epoch 4, batch 600, loss[loss=0.3324, ctc_loss=0.2746, cr_loss=0.4422, attn_decoder_loss=0.329, over 29236.00 frames. ], tot_loss[loss=0.3099, ctc_loss=0.2484, cr_loss=0.4414, attn_decoder_loss=0.3069, over 5508279.28 frames. ], batch size: 100, lr: 2.68e-02, grad_scale: 8.0
2024-09-16 18:45:15,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=56740.0, ans=0.025
2024-09-16 18:45:16,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=56740.0, ans=0.125
2024-09-16 18:45:24,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=56740.0, ans=0.125
2024-09-16 18:45:25,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=56780.0, ans=0.1
2024-09-16 18:46:12,467 INFO [train.py:1198] (1/2) Epoch 4, batch 650, loss[loss=0.311, ctc_loss=0.249, cr_loss=0.4257, attn_decoder_loss=0.3085, over 29766.00 frames. ], tot_loss[loss=0.3089, ctc_loss=0.2471, cr_loss=0.44, attn_decoder_loss=0.306, over 5585372.46 frames. ], batch size: 81, lr: 2.68e-02, grad_scale: 4.0
2024-09-16 18:46:21,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=56900.0, ans=0.125
2024-09-16 18:46:29,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=56940.0, ans=0.95
2024-09-16 18:46:41,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=56980.0, ans=0.125
2024-09-16 18:46:46,223 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.050e+02 1.273e+02 1.380e+02 1.624e+02 3.709e+02, threshold=2.760e+02, percent-clipped=3.0
2024-09-16 18:46:49,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=56980.0, ans=0.5
2024-09-16 18:46:54,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=56980.0, ans=0.1
2024-09-16 18:47:06,966 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.83 vs. limit=10.0
2024-09-16 18:47:30,025 INFO [train.py:1198] (1/2) Epoch 4, batch 700, loss[loss=0.2906, ctc_loss=0.2285, cr_loss=0.4173, attn_decoder_loss=0.2883, over 29531.00 frames. ], tot_loss[loss=0.3095, ctc_loss=0.2477, cr_loss=0.4408, attn_decoder_loss=0.3066, over 5635399.26 frames. ], batch size: 76, lr: 2.67e-02, grad_scale: 8.0
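Many ScheduledFloat entries name balancer parameters (balancer1.prob=0.125, min_abs, max_abs, min_positive, max_positive). In the Zipformer these belong to activation balancers: modules that leave the forward pass untouched but, with probability prob, perturb the gradient so per-channel statistics drift back into the configured range. The following is our illustration of the idea, not the actual implementation:

```python
# Conceptual sketch of an activation balancer: identity in forward; in
# backward, with probability `prob`, add a tiny gradient term that pushes
# per-channel statistics back inside min_positive / max_abs style bounds.
import random
import torch

class SimpleBalancer(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, min_positive=0.05, max_abs=10.0, prob=0.125):
        ctx.save_for_backward(x)
        ctx.cfg = (min_positive, max_abs, prob)
        return x  # forward pass is unchanged

    @staticmethod
    def backward(ctx, grad):
        (x,) = ctx.saved_tensors
        min_positive, max_abs, prob = ctx.cfg
        if random.random() >= prob:
            return grad, None, None, None
        pos_frac = (x > 0).float().mean(dim=0)             # per-channel stats
        too_neg = (pos_frac < min_positive).float()        # not positive enough
        too_big = (x.abs().mean(dim=0) > max_abs).float()  # magnitude too large
        # negative extra grad raises a channel; grad along sign(x) shrinks it
        nudge = -1e-4 * too_neg + 1e-4 * too_big * x.sign()
        return grad + nudge, None, None, None
```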
2024-09-16 18:47:38,589 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.33 vs. limit=6.0
2024-09-16 18:47:46,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=57140.0, ans=0.125
2024-09-16 18:47:47,277 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.00 vs. limit=10.0
2024-09-16 18:47:54,933 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.11 vs. limit=15.0
2024-09-16 18:47:57,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=57140.0, ans=0.0
2024-09-16 18:48:02,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=57180.0, ans=0.125
2024-09-16 18:48:08,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=57180.0, ans=0.2
2024-09-16 18:48:16,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=57220.0, ans=0.1
2024-09-16 18:48:20,990 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=6.23 vs. limit=12.0
2024-09-16 18:48:37,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=57260.0, ans=0.0
2024-09-16 18:48:40,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=57260.0, ans=0.1
2024-09-16 18:48:43,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=57260.0, ans=15.0
2024-09-16 18:48:46,078 INFO [train.py:1198] (1/2) Epoch 4, batch 750, loss[loss=0.3241, ctc_loss=0.2541, cr_loss=0.4501, attn_decoder_loss=0.3219, over 29717.00 frames. ], tot_loss[loss=0.3088, ctc_loss=0.2471, cr_loss=0.4402, attn_decoder_loss=0.3059, over 5674634.65 frames. ], batch size: 82, lr: 2.67e-02, grad_scale: 4.0
2024-09-16 18:48:52,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=57300.0, ans=0.0
2024-09-16 18:49:13,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=57340.0, ans=0.125
2024-09-16 18:49:21,183 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.034e+02 1.371e+02 1.558e+02 1.817e+02 5.424e+02, threshold=3.116e+02, percent-clipped=2.0
2024-09-16 18:49:23,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=57380.0, ans=0.125
2024-09-16 18:49:41,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=57420.0, ans=0.0
2024-09-16 18:49:54,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=57460.0, ans=0.125
2024-09-16 18:50:03,607 INFO [train.py:1198] (1/2) Epoch 4, batch 800, loss[loss=0.2801, ctc_loss=0.22, cr_loss=0.3866, attn_decoder_loss=0.2782, over 29598.00 frames. ], tot_loss[loss=0.3083, ctc_loss=0.2466, cr_loss=0.4398, attn_decoder_loss=0.3054, over 5705113.11 frames. ], batch size: 73, lr: 2.67e-02, grad_scale: 8.0
2024-09-16 18:50:20,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=57540.0, ans=0.125
2024-09-16 18:50:35,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=57580.0, ans=0.0
2024-09-16 18:50:36,078 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.84 vs. limit=15.0
2024-09-16 18:50:47,198 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.21 vs. limit=22.5
2024-09-16 18:50:52,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=57620.0, ans=0.5
2024-09-16 18:51:06,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=57660.0, ans=0.125
2024-09-16 18:51:20,830 INFO [train.py:1198] (1/2) Epoch 4, batch 850, loss[loss=0.3179, ctc_loss=0.2498, cr_loss=0.4674, attn_decoder_loss=0.3151, over 29698.00 frames. ], tot_loss[loss=0.308, ctc_loss=0.246, cr_loss=0.4398, attn_decoder_loss=0.3051, over 5735864.04 frames. ], batch size: 89, lr: 2.66e-02, grad_scale: 4.0
2024-09-16 18:51:24,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=57700.0, ans=0.125
2024-09-16 18:51:32,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=57700.0, ans=0.125
2024-09-16 18:51:52,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=57780.0, ans=0.0
2024-09-16 18:51:55,364 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.339e+02 1.546e+02 1.753e+02 3.025e+02, threshold=3.091e+02, percent-clipped=0.0
2024-09-16 18:51:57,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=57780.0, ans=0.125
2024-09-16 18:52:36,400 INFO [train.py:1198] (1/2) Epoch 4, batch 900, loss[loss=0.2811, ctc_loss=0.2172, cr_loss=0.437, attn_decoder_loss=0.2785, over 29581.00 frames. ], tot_loss[loss=0.3084, ctc_loss=0.2467, cr_loss=0.4408, attn_decoder_loss=0.3055, over 5740917.23 frames. ], batch size: 73, lr: 2.66e-02, grad_scale: 8.0
2024-09-16 18:52:47,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=57900.0, ans=0.1
2024-09-16 18:53:05,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=57940.0, ans=0.1
2024-09-16 18:53:41,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=58060.0, ans=0.0
2024-09-16 18:53:52,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=58100.0, ans=0.1
2024-09-16 18:53:53,340 INFO [train.py:1198] (1/2) Epoch 4, batch 950, loss[loss=0.2802, ctc_loss=0.2192, cr_loss=0.3988, attn_decoder_loss=0.2781, over 29523.00 frames. ], tot_loss[loss=0.3091, ctc_loss=0.2476, cr_loss=0.4418, attn_decoder_loss=0.3061, over 5743885.03 frames. ], batch size: 74, lr: 2.66e-02, grad_scale: 4.0
2024-09-16 18:53:54,600 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.00 vs. limit=15.0
2024-09-16 18:54:29,621 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.037e+02 1.318e+02 1.459e+02 1.683e+02 8.183e+02, threshold=2.918e+02, percent-clipped=3.0
2024-09-16 18:55:08,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=58260.0, ans=0.0
2024-09-16 18:55:10,833 INFO [train.py:1198] (1/2) Epoch 4, batch 1000, loss[loss=0.2834, ctc_loss=0.2138, cr_loss=0.4122, attn_decoder_loss=0.2819, over 29511.00 frames. ], tot_loss[loss=0.3095, ctc_loss=0.2478, cr_loss=0.4417, attn_decoder_loss=0.3065, over 5737047.40 frames. ], batch size: 77, lr: 2.65e-02, grad_scale: 8.0
2024-09-16 18:55:24,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=58340.0, ans=0.0
2024-09-16 18:55:42,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=58380.0, ans=0.0
2024-09-16 18:55:46,422 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.93 vs. limit=6.0
2024-09-16 18:55:49,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=58380.0, ans=0.2
2024-09-16 18:55:52,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=58380.0, ans=0.0
2024-09-16 18:55:52,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.29 vs. limit=15.0
2024-09-16 18:55:53,034 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.04 vs. limit=10.0
2024-09-16 18:56:28,231 INFO [train.py:1198] (1/2) Epoch 4, batch 1050, loss[loss=0.3264, ctc_loss=0.2681, cr_loss=0.454, attn_decoder_loss=0.3228, over 29683.00 frames. ], tot_loss[loss=0.3078, ctc_loss=0.2461, cr_loss=0.44, attn_decoder_loss=0.3049, over 5744033.92 frames. ], batch size: 85, lr: 2.65e-02, grad_scale: 4.0
2024-09-16 18:56:54,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=58540.0, ans=0.125
2024-09-16 18:56:55,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=58540.0, ans=0.125
2024-09-16 18:56:55,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=58540.0, ans=0.0
2024-09-16 18:57:06,149 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.039e+02 1.263e+02 1.458e+02 1.745e+02 4.654e+02, threshold=2.917e+02, percent-clipped=3.0
2024-09-16 18:57:13,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=58620.0, ans=0.125
2024-09-16 18:57:20,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.35 vs. limit=6.0
2024-09-16 18:57:33,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=58660.0, ans=0.125
2024-09-16 18:57:43,659 INFO [train.py:1198] (1/2) Epoch 4, batch 1100, loss[loss=0.3102, ctc_loss=0.2473, cr_loss=0.4598, attn_decoder_loss=0.307, over 29433.00 frames. ], tot_loss[loss=0.3072, ctc_loss=0.2453, cr_loss=0.4402, attn_decoder_loss=0.3043, over 5755581.84 frames. ], batch size: 78, lr: 2.65e-02, grad_scale: 8.0
2024-09-16 18:58:01,140 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.91 vs. limit=22.5
2024-09-16 18:58:29,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=58820.0, ans=0.125
2024-09-16 18:59:01,079 INFO [train.py:1198] (1/2) Epoch 4, batch 1150, loss[loss=0.3074, ctc_loss=0.2486, cr_loss=0.453, attn_decoder_loss=0.3039, over 29453.00 frames. ], tot_loss[loss=0.3078, ctc_loss=0.246, cr_loss=0.4407, attn_decoder_loss=0.3049, over 5755397.91 frames. ], batch size: 78, lr: 2.64e-02, grad_scale: 4.0
2024-09-16 18:59:04,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=58900.0, ans=0.125
2024-09-16 18:59:15,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=58940.0, ans=0.125
2024-09-16 18:59:23,179 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.75 vs. limit=15.0
2024-09-16 18:59:25,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=58940.0, ans=0.125
2024-09-16 18:59:30,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=58980.0, ans=0.09899494936611666
2024-09-16 18:59:40,711 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.556e+01 1.271e+02 1.479e+02 1.697e+02 4.647e+02, threshold=2.959e+02, percent-clipped=3.0
2024-09-16 18:59:50,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=59020.0, ans=0.0
2024-09-16 18:59:56,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=59020.0, ans=0.0
2024-09-16 19:00:18,989 INFO [train.py:1198] (1/2) Epoch 4, batch 1200, loss[loss=0.3082, ctc_loss=0.2321, cr_loss=0.4198, attn_decoder_loss=0.3074, over 29690.00 frames. ], tot_loss[loss=0.3087, ctc_loss=0.2467, cr_loss=0.4413, attn_decoder_loss=0.3057, over 5748009.67 frames. ], batch size: 85, lr: 2.64e-02, grad_scale: 8.0
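The tot_loss[... over N frames] figures are not a plain cumulative average: the frame count grows by less at each 50-batch report (1.27M, 2.25M, 3.05M, ...) and levels off near 5.8M. That is the signature of an exponential moving sum whose per-batch decay, inferred from the ratios of successive increments, is about 1 - 1/200; with batches averaging roughly 29k frames the weighted frame count saturates near 29k * 200, about 5.8M, as observed. A sketch of such a tracker:

```python
# Sketch of the running statistic the tot_loss lines appear to report: an
# exponentially decayed, frame-weighted sum (decay inferred as 1 - 1/200).
RESET_INTERVAL = 200

class RunningLoss:
    def __init__(self):
        self.frames = 0.0    # decayed frame count ("over N frames")
        self.loss_sum = 0.0  # decayed frame-weighted loss

    def update(self, batch_loss: float, batch_frames: float):
        decay = 1.0 - 1.0 / RESET_INTERVAL
        self.frames = self.frames * decay + batch_frames
        self.loss_sum = self.loss_sum * decay + batch_loss * batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)
```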
2024-09-16 19:01:02,619 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.87 vs. limit=12.0
2024-09-16 19:01:03,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=59220.0, ans=0.1
2024-09-16 19:01:09,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=59220.0, ans=0.1
2024-09-16 19:01:25,069 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0
2024-09-16 19:01:34,922 INFO [train.py:1198] (1/2) Epoch 4, batch 1250, loss[loss=0.3189, ctc_loss=0.2482, cr_loss=0.4318, attn_decoder_loss=0.3171, over 29530.00 frames. ], tot_loss[loss=0.309, ctc_loss=0.2468, cr_loss=0.4425, attn_decoder_loss=0.3061, over 5775614.23 frames. ], batch size: 92, lr: 2.63e-02, grad_scale: 4.0
2024-09-16 19:02:03,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=59380.0, ans=0.5
2024-09-16 19:02:15,815 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.032e+02 1.296e+02 1.466e+02 1.683e+02 4.153e+02, threshold=2.932e+02, percent-clipped=2.0
2024-09-16 19:02:17,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=59380.0, ans=0.125
2024-09-16 19:02:39,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=59460.0, ans=0.1
2024-09-16 19:02:52,639 INFO [train.py:1198] (1/2) Epoch 4, batch 1300, loss[loss=0.3163, ctc_loss=0.2595, cr_loss=0.4401, attn_decoder_loss=0.3129, over 28362.00 frames. ], tot_loss[loss=0.3076, ctc_loss=0.2453, cr_loss=0.4412, attn_decoder_loss=0.3047, over 5779063.11 frames. ], batch size: 111, lr: 2.63e-02, grad_scale: 8.0
2024-09-16 19:03:00,674 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 19:03:23,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=59580.0, ans=0.0
2024-09-16 19:03:27,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=59580.0, ans=0.125
2024-09-16 19:04:08,493 INFO [train.py:1198] (1/2) Epoch 4, batch 1350, loss[loss=0.3093, ctc_loss=0.2538, cr_loss=0.4366, attn_decoder_loss=0.3058, over 29752.00 frames. ], tot_loss[loss=0.307, ctc_loss=0.2442, cr_loss=0.4407, attn_decoder_loss=0.3042, over 5797097.11 frames. ], batch size: 81, lr: 2.63e-02, grad_scale: 4.0
2024-09-16 19:04:15,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=59700.0, ans=0.125
2024-09-16 19:04:21,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=59700.0, ans=0.1
2024-09-16 19:04:44,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=59780.0, ans=0.0
2024-09-16 19:04:45,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=59780.0, ans=0.2
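The scaling.py:1120 WithLoss lines report an auxiliary penalty attached to the attention weights; loss-sum=0.000e+00 says the penalty is currently zero. One way to attach such a penalty without changing the tensor's value, shown here as our illustration rather than the library's code, is an autograd function that is the identity in forward and feeds the penalty's gradient in during backward:

```python
# Sketch: attach an auxiliary loss to a tensor (e.g. attention weights)
# without altering its value; minimizing the main loss then also minimizes
# the attached penalty. Our illustration, not copied from scaling.py.
import torch

class WithAuxLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x: torch.Tensor, aux_loss: torch.Tensor):
        ctx.save_for_backward(aux_loss)
        return x  # value flows through unchanged

    @staticmethod
    def backward(ctx, grad_out: torch.Tensor):
        (aux_loss,) = ctx.saved_tensors
        # d(aux_loss)/d(aux_loss) = 1, so the penalty's graph gets gradient too
        return grad_out, torch.ones_like(aux_loss)

attn = torch.softmax(torch.randn(4, 8, requires_grad=True), dim=-1)
penalty = (attn - 1.0 / attn.shape[-1]).abs().sum()  # illustrative penalty
attn = WithAuxLoss.apply(attn, penalty)              # same values, extra grad
```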
2024-09-16 19:04:47,405 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.50 vs. limit=15.0
2024-09-16 19:04:51,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=59780.0, ans=0.0
2024-09-16 19:04:52,258 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.037e+02 1.260e+02 1.419e+02 1.691e+02 3.213e+02, threshold=2.838e+02, percent-clipped=1.0
2024-09-16 19:05:02,476 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.09 vs. limit=22.5
2024-09-16 19:05:04,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=59820.0, ans=0.1
2024-09-16 19:05:24,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=59900.0, ans=0.0
2024-09-16 19:05:25,807 INFO [train.py:1198] (1/2) Epoch 4, batch 1400, loss[loss=0.2645, ctc_loss=0.1959, cr_loss=0.3777, attn_decoder_loss=0.2637, over 29570.00 frames. ], tot_loss[loss=0.3067, ctc_loss=0.2436, cr_loss=0.4402, attn_decoder_loss=0.304, over 5808160.36 frames. ], batch size: 69, lr: 2.62e-02, grad_scale: 8.0
2024-09-16 19:05:32,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=59900.0, ans=0.125
2024-09-16 19:05:41,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=59940.0, ans=0.125
2024-09-16 19:05:46,036 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.99 vs. limit=22.5
2024-09-16 19:05:51,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=59940.0, ans=0.125
2024-09-16 19:06:02,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=59980.0, ans=0.125
2024-09-16 19:06:26,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=60060.0, ans=0.2
2024-09-16 19:06:29,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=60060.0, ans=0.2
2024-09-16 19:06:38,738 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.63 vs. limit=22.5
2024-09-16 19:06:43,762 INFO [train.py:1198] (1/2) Epoch 4, batch 1450, loss[loss=0.3081, ctc_loss=0.2406, cr_loss=0.4614, attn_decoder_loss=0.3054, over 29437.00 frames. ], tot_loss[loss=0.3074, ctc_loss=0.2442, cr_loss=0.4411, attn_decoder_loss=0.3046, over 5804177.16 frames. ], batch size: 94, lr: 2.62e-02, grad_scale: 4.0
2024-09-16 19:07:01,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=60140.0, ans=15.0
2024-09-16 19:07:21,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=60180.0, ans=0.0
2024-09-16 19:07:24,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=60180.0, ans=0.0
2024-09-16 19:07:26,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=60180.0, ans=0.125
2024-09-16 19:07:26,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=60180.0, ans=0.09899494936611666
2024-09-16 19:07:27,554 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.279e+02 1.464e+02 1.663e+02 3.366e+02, threshold=2.927e+02, percent-clipped=3.0
2024-09-16 19:07:28,634 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.65 vs. limit=15.0
2024-09-16 19:07:36,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=60220.0, ans=0.125
2024-09-16 19:07:38,918 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.61 vs. limit=15.0
2024-09-16 19:07:57,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=60300.0, ans=0.1
2024-09-16 19:07:59,067 INFO [train.py:1198] (1/2) Epoch 4, batch 1500, loss[loss=0.3173, ctc_loss=0.2494, cr_loss=0.456, attn_decoder_loss=0.3147, over 29635.00 frames. ], tot_loss[loss=0.3074, ctc_loss=0.2439, cr_loss=0.4417, attn_decoder_loss=0.3046, over 5805261.19 frames. ], batch size: 86, lr: 2.62e-02, grad_scale: 8.0
2024-09-16 19:08:25,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=60340.0, ans=0.125
2024-09-16 19:08:27,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=60340.0, ans=0.125
2024-09-16 19:08:48,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=60420.0, ans=0.0
2024-09-16 19:08:53,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=60420.0, ans=0.1
2024-09-16 19:09:00,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=60460.0, ans=0.125
2024-09-16 19:09:14,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=60460.0, ans=0.125
2024-09-16 19:09:14,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=60460.0, ans=0.125
2024-09-16 19:09:16,976 INFO [train.py:1198] (1/2) Epoch 4, batch 1550, loss[loss=0.3228, ctc_loss=0.2648, cr_loss=0.4734, attn_decoder_loss=0.3187, over 29528.00 frames. ], tot_loss[loss=0.3076, ctc_loss=0.2446, cr_loss=0.4411, attn_decoder_loss=0.3047, over 5780493.16 frames. ], batch size: 90, lr: 2.61e-02, grad_scale: 4.0
2024-09-16 19:09:17,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=60500.0, ans=0.125
2024-09-16 19:09:17,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=60500.0, ans=0.02
2024-09-16 19:09:50,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=60580.0, ans=0.1
2024-09-16 19:10:01,803 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.038e+02 1.301e+02 1.510e+02 1.822e+02 6.597e+02, threshold=3.020e+02, percent-clipped=6.0
2024-09-16 19:10:26,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=60660.0, ans=0.125
2024-09-16 19:10:29,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=60660.0, ans=0.125
2024-09-16 19:10:34,137 INFO [train.py:1198] (1/2) Epoch 4, batch 1600, loss[loss=0.3053, ctc_loss=0.2332, cr_loss=0.4513, attn_decoder_loss=0.3033, over 29704.00 frames. ], tot_loss[loss=0.3077, ctc_loss=0.245, cr_loss=0.4416, attn_decoder_loss=0.3048, over 5763431.07 frames. ], batch size: 85, lr: 2.61e-02, grad_scale: 8.0
2024-09-16 19:10:35,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=60700.0, ans=0.125
2024-09-16 19:10:56,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=60740.0, ans=0.125
2024-09-16 19:10:56,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=60740.0, ans=0.1
2024-09-16 19:11:00,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=60740.0, ans=0.0
2024-09-16 19:11:18,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=60820.0, ans=0.0
2024-09-16 19:11:26,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=60820.0, ans=0.1
2024-09-16 19:11:33,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=60860.0, ans=0.0
2024-09-16 19:11:39,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=60860.0, ans=0.0
2024-09-16 19:11:47,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=60860.0, ans=0.125
2024-09-16 19:11:50,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=60900.0, ans=0.125
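The learning rate drifts down slowly within the epoch (2.73e-02 at batch 0 to about 2.56e-02 by batch 2400) and steps down at epoch boundaries, which matches the shape of icefall's Eden schedule, where a batch term and an epoch term each decay as inverse fourth roots. The absolute scale logged here evidently includes additional recipe-specific factors we cannot verify from the log alone, so treat the constants below as assumptions:

```python
# Hedged sketch of the Eden learning-rate shape (constants are assumptions).
def eden_factor(batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    b = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    e = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return b * e

# Epoch step at fixed batch count: the 3 -> 4 transition scales lr by ~0.93,
# consistent with the drop from 2.92e-02 to 2.73e-02 at the top of epoch 4.
print(eden_factor(54240, 4) / eden_factor(54240, 3))  # ~0.93
```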
2024-09-16 19:11:52,014 INFO [train.py:1198] (1/2) Epoch 4, batch 1650, loss[loss=0.3086, ctc_loss=0.2404, cr_loss=0.4363, attn_decoder_loss=0.3065, over 29703.00 frames. ], tot_loss[loss=0.3075, ctc_loss=0.2452, cr_loss=0.4414, attn_decoder_loss=0.3046, over 5757319.16 frames. ], batch size: 89, lr: 2.61e-02, grad_scale: 4.0
2024-09-16 19:11:57,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=60900.0, ans=0.0
2024-09-16 19:11:58,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=60900.0, ans=0.0
2024-09-16 19:12:02,060 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.03 vs. limit=15.0
2024-09-16 19:12:05,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=60940.0, ans=0.0
2024-09-16 19:12:11,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=60940.0, ans=0.2
2024-09-16 19:12:23,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=60980.0, ans=0.125
2024-09-16 19:12:27,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=60980.0, ans=0.125
2024-09-16 19:12:38,885 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.008e+02 1.275e+02 1.417e+02 1.655e+02 4.421e+02, threshold=2.835e+02, percent-clipped=2.0
2024-09-16 19:12:39,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=61020.0, ans=0.125
2024-09-16 19:12:40,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=61020.0, ans=0.125
2024-09-16 19:12:57,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=61060.0, ans=0.2
2024-09-16 19:13:07,420 INFO [train.py:1198] (1/2) Epoch 4, batch 1700, loss[loss=0.2745, ctc_loss=0.2108, cr_loss=0.3969, attn_decoder_loss=0.2727, over 29582.00 frames. ], tot_loss[loss=0.3072, ctc_loss=0.2446, cr_loss=0.4408, attn_decoder_loss=0.3044, over 5778104.86 frames. ], batch size: 69, lr: 2.60e-02, grad_scale: 8.0
2024-09-16 19:13:32,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.88 vs. limit=12.0
2024-09-16 19:13:33,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=61140.0, ans=0.5
2024-09-16 19:13:34,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=61140.0, ans=0.125
2024-09-16 19:13:40,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=61180.0, ans=0.125
2024-09-16 19:13:51,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=61220.0, ans=0.025
2024-09-16 19:13:56,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=61220.0, ans=0.125
2024-09-16 19:13:57,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=61220.0, ans=0.125
2024-09-16 19:14:04,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.78 vs. limit=15.0
2024-09-16 19:14:12,731 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-16 19:14:24,994 INFO [train.py:1198] (1/2) Epoch 4, batch 1750, loss[loss=0.2718, ctc_loss=0.2126, cr_loss=0.4106, attn_decoder_loss=0.2692, over 29334.00 frames. ], tot_loss[loss=0.3062, ctc_loss=0.243, cr_loss=0.4405, attn_decoder_loss=0.3034, over 5785249.16 frames. ], batch size: 67, lr: 2.60e-02, grad_scale: 8.0
2024-09-16 19:14:31,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=61300.0, ans=0.2
2024-09-16 19:14:49,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=61340.0, ans=0.0
2024-09-16 19:15:07,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=61380.0, ans=0.125
2024-09-16 19:15:11,788 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.273e+01 1.237e+02 1.382e+02 1.538e+02 2.452e+02, threshold=2.764e+02, percent-clipped=0.0
2024-09-16 19:15:34,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=61460.0, ans=0.125
2024-09-16 19:15:42,002 INFO [train.py:1198] (1/2) Epoch 4, batch 1800, loss[loss=0.2984, ctc_loss=0.23, cr_loss=0.4199, attn_decoder_loss=0.2966, over 29690.00 frames. ], tot_loss[loss=0.3064, ctc_loss=0.2434, cr_loss=0.4406, attn_decoder_loss=0.3036, over 5788963.87 frames. ], batch size: 83, lr: 2.60e-02, grad_scale: 8.0
2024-09-16 19:16:07,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=61540.0, ans=22.5
2024-09-16 19:16:57,521 INFO [train.py:1198] (1/2) Epoch 4, batch 1850, loss[loss=0.3143, ctc_loss=0.2429, cr_loss=0.4345, attn_decoder_loss=0.3126, over 29626.00 frames. ], tot_loss[loss=0.3062, ctc_loss=0.2428, cr_loss=0.4406, attn_decoder_loss=0.3035, over 5795526.93 frames. ], batch size: 86, lr: 2.59e-02, grad_scale: 4.0
2024-09-16 19:17:00,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=61700.0, ans=0.0
2024-09-16 19:17:29,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=61780.0, ans=0.1
2024-09-16 19:17:32,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=61780.0, ans=0.0
2024-09-16 19:17:44,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=61820.0, ans=0.2
2024-09-16 19:17:46,911 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.035e+02 1.284e+02 1.452e+02 1.621e+02 3.527e+02, threshold=2.905e+02, percent-clipped=2.0
2024-09-16 19:18:12,185 INFO [train.py:1198] (1/2) Epoch 4, batch 1900, loss[loss=0.3074, ctc_loss=0.2457, cr_loss=0.4309, attn_decoder_loss=0.3047, over 29713.00 frames. ], tot_loss[loss=0.3069, ctc_loss=0.2433, cr_loss=0.4414, attn_decoder_loss=0.3041, over 5804183.50 frames. ], batch size: 89, lr: 2.59e-02, grad_scale: 8.0
2024-09-16 19:18:12,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=61900.0, ans=0.0
2024-09-16 19:18:29,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=61940.0, ans=0.125
2024-09-16 19:18:36,359 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=15.0
2024-09-16 19:18:37,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=61940.0, ans=0.2
2024-09-16 19:18:49,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=61980.0, ans=0.025
2024-09-16 19:19:00,241 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.89 vs. limit=12.0
2024-09-16 19:19:13,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=62060.0, ans=0.0
2024-09-16 19:19:22,599 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.04 vs. limit=22.5
2024-09-16 19:19:23,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=62060.0, ans=0.125
2024-09-16 19:19:31,268 INFO [train.py:1198] (1/2) Epoch 4, batch 1950, loss[loss=0.2962, ctc_loss=0.2276, cr_loss=0.4154, attn_decoder_loss=0.2946, over 29416.00 frames. ], tot_loss[loss=0.3084, ctc_loss=0.2446, cr_loss=0.4435, attn_decoder_loss=0.3056, over 5818808.96 frames. ], batch size: 78, lr: 2.59e-02, grad_scale: 4.0
2024-09-16 19:19:43,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=62100.0, ans=0.1
2024-09-16 19:19:45,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=62140.0, ans=0.125
2024-09-16 19:19:48,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=62140.0, ans=0.1
2024-09-16 19:19:52,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=62140.0, ans=0.1
2024-09-16 19:19:53,351 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.08 vs. limit=10.0
2024-09-16 19:20:06,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=62180.0, ans=0.0
2024-09-16 19:20:16,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=62220.0, ans=0.125
2024-09-16 19:20:22,228 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.025e+02 1.204e+02 1.396e+02 1.540e+02 6.321e+02, threshold=2.792e+02, percent-clipped=2.0
2024-09-16 19:20:28,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=62220.0, ans=0.125
2024-09-16 19:20:33,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=62260.0, ans=0.125
2024-09-16 19:20:46,381 INFO [train.py:1198] (1/2) Epoch 4, batch 2000, loss[loss=0.2667, ctc_loss=0.2004, cr_loss=0.3754, attn_decoder_loss=0.2657, over 29343.00 frames. ], tot_loss[loss=0.3092, ctc_loss=0.2455, cr_loss=0.4443, attn_decoder_loss=0.3064, over 5798129.19 frames. ], batch size: 67, lr: 2.58e-02, grad_scale: 8.0
2024-09-16 19:21:00,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=62340.0, ans=0.2
2024-09-16 19:21:08,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=62340.0, ans=0.1
2024-09-16 19:21:25,478 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=9.25 vs. limit=10.0
2024-09-16 19:22:02,148 INFO [train.py:1198] (1/2) Epoch 4, batch 2050, loss[loss=0.2679, ctc_loss=0.2009, cr_loss=0.4105, attn_decoder_loss=0.2662, over 29440.00 frames. ], tot_loss[loss=0.3077, ctc_loss=0.2436, cr_loss=0.442, attn_decoder_loss=0.305, over 5791235.76 frames. ], batch size: 70, lr: 2.58e-02, grad_scale: 4.0
2024-09-16 19:22:22,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=62540.0, ans=0.0
2024-09-16 19:22:54,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=62620.0, ans=0.0
2024-09-16 19:22:55,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=62620.0, ans=0.0
2024-09-16 19:22:57,027 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.877e+01 1.306e+02 1.501e+02 1.885e+02 4.145e+02, threshold=3.002e+02, percent-clipped=3.0
2024-09-16 19:22:59,608 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.86 vs. limit=15.0
2024-09-16 19:23:13,244 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.34 vs. limit=12.0
2024-09-16 19:23:14,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=62660.0, ans=15.0
2024-09-16 19:23:21,639 INFO [train.py:1198] (1/2) Epoch 4, batch 2100, loss[loss=0.3166, ctc_loss=0.254, cr_loss=0.4655, attn_decoder_loss=0.3132, over 29753.00 frames. ], tot_loss[loss=0.3069, ctc_loss=0.2427, cr_loss=0.4409, attn_decoder_loss=0.3042, over 5802875.60 frames. ], batch size: 81, lr: 2.58e-02, grad_scale: 8.0
2024-09-16 19:23:40,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=62740.0, ans=0.025
2024-09-16 19:23:41,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=62740.0, ans=0.05
2024-09-16 19:23:42,169 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.88 vs. limit=10.0
2024-09-16 19:24:09,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=62820.0, ans=0.0
2024-09-16 19:24:26,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=62860.0, ans=0.125
2024-09-16 19:24:36,628 INFO [train.py:1198] (1/2) Epoch 4, batch 2150, loss[loss=0.2843, ctc_loss=0.2102, cr_loss=0.428, attn_decoder_loss=0.283, over 29433.00 frames. ], tot_loss[loss=0.3058, ctc_loss=0.2411, cr_loss=0.4401, attn_decoder_loss=0.3032, over 5817465.91 frames. ], batch size: 78, lr: 2.57e-02, grad_scale: 4.0
2024-09-16 19:24:45,287 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0
2024-09-16 19:25:24,129 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.54 vs. limit=22.5
2024-09-16 19:25:31,031 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.004e+02 1.239e+02 1.413e+02 1.658e+02 2.671e+02, threshold=2.826e+02, percent-clipped=0.0
2024-09-16 19:25:35,128 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.80 vs. limit=15.0
2024-09-16 19:25:38,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=63060.0, ans=0.125
2024-09-16 19:25:44,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=63060.0, ans=0.2
2024-09-16 19:25:52,109 INFO [train.py:1198] (1/2) Epoch 4, batch 2200, loss[loss=0.3166, ctc_loss=0.2428, cr_loss=0.4194, attn_decoder_loss=0.3155, over 29619.00 frames. ], tot_loss[loss=0.3056, ctc_loss=0.2411, cr_loss=0.4399, attn_decoder_loss=0.303, over 5813998.16 frames. ], batch size: 86, lr: 2.57e-02, grad_scale: 8.0
2024-09-16 19:26:04,844 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.20 vs. limit=15.0
2024-09-16 19:26:23,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=63180.0, ans=0.0
2024-09-16 19:26:37,328 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
], batch size: 82, lr: 2.57e-02, grad_scale: 4.0 2024-09-16 19:27:31,076 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.66 vs. limit=22.5 2024-09-16 19:27:40,098 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=22.01 vs. limit=22.5 2024-09-16 19:27:51,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=63380.0, ans=0.035 2024-09-16 19:28:03,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=63420.0, ans=0.0 2024-09-16 19:28:07,574 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.265e+02 1.418e+02 1.691e+02 4.004e+02, threshold=2.836e+02, percent-clipped=3.0 2024-09-16 19:28:15,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=63460.0, ans=0.125 2024-09-16 19:28:21,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=63460.0, ans=0.125 2024-09-16 19:28:27,219 INFO [train.py:1198] (1/2) Epoch 4, batch 2300, loss[loss=0.2663, ctc_loss=0.1986, cr_loss=0.4094, attn_decoder_loss=0.2647, over 29735.00 frames. ], tot_loss[loss=0.3046, ctc_loss=0.2402, cr_loss=0.4383, attn_decoder_loss=0.302, over 5801430.33 frames. ], batch size: 72, lr: 2.56e-02, grad_scale: 8.0 2024-09-16 19:28:45,300 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 19:28:47,711 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.69 vs. limit=15.0 2024-09-16 19:28:49,131 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.29 vs. limit=15.0 2024-09-16 19:28:49,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=63540.0, ans=0.125 2024-09-16 19:28:50,385 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.46 vs. limit=6.0 2024-09-16 19:28:55,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=63580.0, ans=0.025 2024-09-16 19:29:30,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=63660.0, ans=0.04949747468305833 2024-09-16 19:29:35,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=63660.0, ans=0.125 2024-09-16 19:29:42,584 INFO [train.py:1198] (1/2) Epoch 4, batch 2350, loss[loss=0.3237, ctc_loss=0.2604, cr_loss=0.4898, attn_decoder_loss=0.3198, over 29690.00 frames. ], tot_loss[loss=0.3044, ctc_loss=0.2398, cr_loss=0.4379, attn_decoder_loss=0.3019, over 5807368.78 frames. ], batch size: 83, lr: 2.56e-02, grad_scale: 4.0 2024-09-16 19:29:50,879 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.74 vs. 
limit=15.0 2024-09-16 19:29:55,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=63740.0, ans=0.1 2024-09-16 19:30:02,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=63740.0, ans=0.125 2024-09-16 19:30:07,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=63740.0, ans=0.0 2024-09-16 19:30:13,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=63780.0, ans=0.2 2024-09-16 19:30:31,763 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.16 vs. limit=22.5 2024-09-16 19:30:41,597 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.383e+02 1.538e+02 1.780e+02 4.486e+02, threshold=3.076e+02, percent-clipped=4.0 2024-09-16 19:30:49,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=63860.0, ans=0.0 2024-09-16 19:30:57,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=63860.0, ans=0.125 2024-09-16 19:30:59,750 INFO [train.py:1198] (1/2) Epoch 4, batch 2400, loss[loss=0.2871, ctc_loss=0.2163, cr_loss=0.4111, attn_decoder_loss=0.2858, over 29557.00 frames. ], tot_loss[loss=0.3048, ctc_loss=0.24, cr_loss=0.4386, attn_decoder_loss=0.3023, over 5810519.27 frames. ], batch size: 76, lr: 2.56e-02, grad_scale: 8.0 2024-09-16 19:31:04,178 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=12.60 vs. limit=15.0 2024-09-16 19:31:23,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=63940.0, ans=0.0 2024-09-16 19:31:36,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=63980.0, ans=0.025 2024-09-16 19:31:37,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=63980.0, ans=0.0 2024-09-16 19:31:51,061 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=8.61 vs. limit=15.0 2024-09-16 19:32:06,342 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.60 vs. limit=22.5 2024-09-16 19:32:22,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=64060.0, ans=0.125 2024-09-16 19:32:24,847 INFO [train.py:1198] (1/2) Epoch 4, batch 2450, loss[loss=0.3105, ctc_loss=0.2477, cr_loss=0.4673, attn_decoder_loss=0.307, over 29706.00 frames. ], tot_loss[loss=0.3058, ctc_loss=0.2412, cr_loss=0.4395, attn_decoder_loss=0.3032, over 5786703.73 frames. 
], batch size: 82, lr: 2.55e-02, grad_scale: 4.0 2024-09-16 19:32:26,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=64100.0, ans=0.125 2024-09-16 19:32:29,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=64100.0, ans=0.0 2024-09-16 19:32:37,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=64100.0, ans=0.025 2024-09-16 19:32:45,050 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=6.13 vs. limit=12.0 2024-09-16 19:33:20,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=64220.0, ans=0.025 2024-09-16 19:33:23,220 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.817e+01 1.239e+02 1.387e+02 1.580e+02 7.191e+02, threshold=2.774e+02, percent-clipped=3.0 2024-09-16 19:33:35,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=64260.0, ans=0.0 2024-09-16 19:33:39,947 INFO [train.py:1198] (1/2) Epoch 4, batch 2500, loss[loss=0.306, ctc_loss=0.2324, cr_loss=0.4239, attn_decoder_loss=0.3048, over 29640.00 frames. ], tot_loss[loss=0.3057, ctc_loss=0.241, cr_loss=0.4401, attn_decoder_loss=0.3031, over 5797120.07 frames. ], batch size: 86, lr: 2.55e-02, grad_scale: 8.0 2024-09-16 19:33:53,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=64340.0, ans=0.125 2024-09-16 19:34:09,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=64340.0, ans=0.0 2024-09-16 19:34:27,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=64420.0, ans=0.07 2024-09-16 19:34:35,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=64420.0, ans=0.125 2024-09-16 19:34:50,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=64460.0, ans=0.125 2024-09-16 19:34:51,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=64460.0, ans=0.0 2024-09-16 19:34:59,439 INFO [train.py:1198] (1/2) Epoch 4, batch 2550, loss[loss=0.2674, ctc_loss=0.198, cr_loss=0.3979, attn_decoder_loss=0.2663, over 29358.00 frames. ], tot_loss[loss=0.3056, ctc_loss=0.2406, cr_loss=0.4403, attn_decoder_loss=0.303, over 5799378.92 frames. ], batch size: 67, lr: 2.55e-02, grad_scale: 4.0 2024-09-16 19:35:06,277 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.67 vs. limit=6.0 2024-09-16 19:35:12,906 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.41 vs. limit=6.0 2024-09-16 19:35:32,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.11 vs. 
limit=10.0 2024-09-16 19:35:37,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=64580.0, ans=0.125 2024-09-16 19:35:46,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=64620.0, ans=10.0 2024-09-16 19:36:00,133 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.618e+01 1.258e+02 1.410e+02 1.550e+02 4.677e+02, threshold=2.819e+02, percent-clipped=4.0 2024-09-16 19:36:08,316 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.02 vs. limit=10.0 2024-09-16 19:36:15,314 INFO [train.py:1198] (1/2) Epoch 4, batch 2600, loss[loss=0.2904, ctc_loss=0.2254, cr_loss=0.4119, attn_decoder_loss=0.2884, over 29442.00 frames. ], tot_loss[loss=0.306, ctc_loss=0.2409, cr_loss=0.4403, attn_decoder_loss=0.3034, over 5795990.55 frames. ], batch size: 78, lr: 2.54e-02, grad_scale: 8.0 2024-09-16 19:36:23,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=64700.0, ans=0.025 2024-09-16 19:37:27,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=64860.0, ans=0.125 2024-09-16 19:37:29,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=64900.0, ans=0.0 2024-09-16 19:37:30,536 INFO [train.py:1198] (1/2) Epoch 4, batch 2650, loss[loss=0.3255, ctc_loss=0.2578, cr_loss=0.4645, attn_decoder_loss=0.3227, over 29294.00 frames. ], tot_loss[loss=0.3061, ctc_loss=0.241, cr_loss=0.4408, attn_decoder_loss=0.3035, over 5801883.51 frames. ], batch size: 100, lr: 2.54e-02, grad_scale: 4.0 2024-09-16 19:37:56,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=64940.0, ans=0.125 2024-09-16 19:38:00,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=64940.0, ans=0.2 2024-09-16 19:38:17,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=65020.0, ans=0.0 2024-09-16 19:38:34,064 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.250e+02 1.369e+02 1.564e+02 3.210e+02, threshold=2.738e+02, percent-clipped=1.0 2024-09-16 19:38:34,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=65060.0, ans=0.125 2024-09-16 19:38:49,703 INFO [train.py:1198] (1/2) Epoch 4, batch 2700, loss[loss=0.308, ctc_loss=0.2333, cr_loss=0.4738, attn_decoder_loss=0.3058, over 29511.00 frames. ], tot_loss[loss=0.3059, ctc_loss=0.2406, cr_loss=0.4406, attn_decoder_loss=0.3033, over 5797444.19 frames. 
], batch size: 87, lr: 2.54e-02, grad_scale: 8.0 2024-09-16 19:39:30,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=65180.0, ans=0.5 2024-09-16 19:39:30,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=65180.0, ans=0.125 2024-09-16 19:39:33,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=65220.0, ans=0.2 2024-09-16 19:39:47,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=65220.0, ans=0.125 2024-09-16 19:39:53,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=65260.0, ans=0.05 2024-09-16 19:39:54,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=65260.0, ans=0.0 2024-09-16 19:39:59,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=65260.0, ans=0.1 2024-09-16 19:40:05,370 INFO [train.py:1198] (1/2) Epoch 4, batch 2750, loss[loss=0.2958, ctc_loss=0.2275, cr_loss=0.4405, attn_decoder_loss=0.2936, over 29517.00 frames. ], tot_loss[loss=0.3045, ctc_loss=0.2396, cr_loss=0.4395, attn_decoder_loss=0.3019, over 5796483.28 frames. ], batch size: 75, lr: 2.53e-02, grad_scale: 4.0 2024-09-16 19:40:05,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=65300.0, ans=0.0 2024-09-16 19:40:10,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=65300.0, ans=0.0 2024-09-16 19:40:39,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=65380.0, ans=0.2 2024-09-16 19:40:48,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=65420.0, ans=0.125 2024-09-16 19:40:50,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=65420.0, ans=0.125 2024-09-16 19:41:08,343 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.842e+01 1.245e+02 1.440e+02 1.752e+02 4.612e+02, threshold=2.880e+02, percent-clipped=7.0 2024-09-16 19:41:13,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=65460.0, ans=0.0 2024-09-16 19:41:14,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.65 vs. limit=15.0 2024-09-16 19:41:20,408 INFO [train.py:1198] (1/2) Epoch 4, batch 2800, loss[loss=0.3384, ctc_loss=0.3054, cr_loss=0.4945, attn_decoder_loss=0.331, over 20670.00 frames. ], tot_loss[loss=0.3044, ctc_loss=0.2398, cr_loss=0.4395, attn_decoder_loss=0.3018, over 5776842.12 frames. ], batch size: 211, lr: 2.53e-02, grad_scale: 8.0 2024-09-16 19:41:25,872 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.66 vs. 
limit=15.0 2024-09-16 19:41:38,064 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.33 vs. limit=15.0 2024-09-16 19:41:44,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=65540.0, ans=0.125 2024-09-16 19:41:51,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=65580.0, ans=0.1 2024-09-16 19:41:52,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=65580.0, ans=0.125 2024-09-16 19:42:33,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=65660.0, ans=0.0 2024-09-16 19:42:40,277 INFO [train.py:1198] (1/2) Epoch 4, batch 2850, loss[loss=0.3142, ctc_loss=0.2574, cr_loss=0.4526, attn_decoder_loss=0.3105, over 29521.00 frames. ], tot_loss[loss=0.3055, ctc_loss=0.241, cr_loss=0.4398, attn_decoder_loss=0.3029, over 5761777.40 frames. ], batch size: 77, lr: 2.53e-02, grad_scale: 4.0 2024-09-16 19:42:41,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=65700.0, ans=0.125 2024-09-16 19:43:12,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=65780.0, ans=0.125 2024-09-16 19:43:15,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=65780.0, ans=0.125 2024-09-16 19:43:36,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=65820.0, ans=0.0 2024-09-16 19:43:45,309 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.371e+02 1.544e+02 1.863e+02 5.214e+02, threshold=3.089e+02, percent-clipped=4.0 2024-09-16 19:43:55,804 INFO [train.py:1198] (1/2) Epoch 4, batch 2900, loss[loss=0.2994, ctc_loss=0.2359, cr_loss=0.4438, attn_decoder_loss=0.2966, over 29457.00 frames. ], tot_loss[loss=0.3062, ctc_loss=0.241, cr_loss=0.442, attn_decoder_loss=0.3037, over 5787691.53 frames. ], batch size: 79, lr: 2.52e-02, grad_scale: 8.0 2024-09-16 19:43:57,912 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.61 vs. limit=10.0 2024-09-16 19:44:04,290 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.96 vs. 
limit=22.5 2024-09-16 19:44:09,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=65940.0, ans=0.1 2024-09-16 19:44:09,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=65940.0, ans=0.125 2024-09-16 19:44:11,072 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-16 19:44:27,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=65980.0, ans=0.1 2024-09-16 19:44:37,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=65980.0, ans=0.2 2024-09-16 19:45:10,935 INFO [train.py:1198] (1/2) Epoch 4, batch 2950, loss[loss=0.288, ctc_loss=0.2236, cr_loss=0.4098, attn_decoder_loss=0.286, over 29543.00 frames. ], tot_loss[loss=0.305, ctc_loss=0.2403, cr_loss=0.4405, attn_decoder_loss=0.3024, over 5781618.21 frames. ], batch size: 75, lr: 2.52e-02, grad_scale: 4.0 2024-09-16 19:45:51,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=66180.0, ans=0.125 2024-09-16 19:45:57,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=66220.0, ans=0.1 2024-09-16 19:46:08,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=66220.0, ans=0.0 2024-09-16 19:46:19,744 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.755e+01 1.228e+02 1.356e+02 1.566e+02 3.773e+02, threshold=2.713e+02, percent-clipped=2.0 2024-09-16 19:46:22,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=66260.0, ans=0.0 2024-09-16 19:46:22,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=66260.0, ans=0.125 2024-09-16 19:46:30,793 INFO [train.py:1198] (1/2) Epoch 4, batch 3000, loss[loss=0.3019, ctc_loss=0.2362, cr_loss=0.427, attn_decoder_loss=0.2997, over 29751.00 frames. ], tot_loss[loss=0.3049, ctc_loss=0.24, cr_loss=0.4407, attn_decoder_loss=0.3023, over 5782830.24 frames. ], batch size: 81, lr: 2.52e-02, grad_scale: 8.0 2024-09-16 19:46:30,794 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 19:46:49,051 INFO [train.py:1230] (1/2) Epoch 4, validation: loss=0.2264, ctc_loss=0.07857, cr_loss=4.376e-15, attn_decoder_loss=0.2428, over 944034.00 frames. 2024-09-16 19:46:49,052 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-16 19:47:05,351 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.29 vs. 
limit=10.0 2024-09-16 19:47:06,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=66340.0, ans=0.125 2024-09-16 19:47:09,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=66340.0, ans=0.125 2024-09-16 19:47:10,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=66340.0, ans=0.0 2024-09-16 19:47:15,824 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.97 vs. limit=15.0 2024-09-16 19:47:29,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=66380.0, ans=0.0 2024-09-16 19:47:30,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=66380.0, ans=0.125 2024-09-16 19:47:41,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=66420.0, ans=0.125 2024-09-16 19:48:05,600 INFO [train.py:1198] (1/2) Epoch 4, batch 3050, loss[loss=0.2881, ctc_loss=0.2268, cr_loss=0.4391, attn_decoder_loss=0.2852, over 29555.00 frames. ], tot_loss[loss=0.3057, ctc_loss=0.2406, cr_loss=0.4416, attn_decoder_loss=0.3032, over 5777211.78 frames. ], batch size: 76, lr: 2.51e-02, grad_scale: 4.0 2024-09-16 19:48:27,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=66540.0, ans=0.0 2024-09-16 19:48:38,400 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.65 vs. limit=15.0 2024-09-16 19:48:46,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=66580.0, ans=0.2 2024-09-16 19:49:13,368 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.917e+01 1.239e+02 1.360e+02 1.654e+02 2.744e+02, threshold=2.720e+02, percent-clipped=1.0 2024-09-16 19:49:20,803 INFO [train.py:1198] (1/2) Epoch 4, batch 3100, loss[loss=0.3093, ctc_loss=0.2348, cr_loss=0.4379, attn_decoder_loss=0.3078, over 29281.00 frames. ], tot_loss[loss=0.3052, ctc_loss=0.24, cr_loss=0.4406, attn_decoder_loss=0.3027, over 5777615.45 frames. ], batch size: 100, lr: 2.51e-02, grad_scale: 8.0 2024-09-16 19:49:22,651 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 19:49:39,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=66740.0, ans=0.1 2024-09-16 19:49:43,188 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.98 vs. limit=15.0 2024-09-16 19:49:45,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=66740.0, ans=0.125 2024-09-16 19:49:57,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=66780.0, ans=0.1 2024-09-16 19:49:58,431 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.68 vs. 
limit=12.0 2024-09-16 19:50:07,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=66780.0, ans=0.125 2024-09-16 19:50:08,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=66820.0, ans=0.0 2024-09-16 19:50:16,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=66820.0, ans=0.95 2024-09-16 19:50:28,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=66860.0, ans=0.2 2024-09-16 19:50:40,202 INFO [train.py:1198] (1/2) Epoch 4, batch 3150, loss[loss=0.3248, ctc_loss=0.2571, cr_loss=0.4785, attn_decoder_loss=0.3216, over 28847.00 frames. ], tot_loss[loss=0.3046, ctc_loss=0.2391, cr_loss=0.4401, attn_decoder_loss=0.3021, over 5784045.07 frames. ], batch size: 104, lr: 2.51e-02, grad_scale: 4.0 2024-09-16 19:50:48,482 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.59 vs. limit=12.0 2024-09-16 19:51:05,254 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.79 vs. limit=15.0 2024-09-16 19:51:18,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=66980.0, ans=0.125 2024-09-16 19:51:18,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=66980.0, ans=0.1 2024-09-16 19:51:39,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=67060.0, ans=0.1 2024-09-16 19:51:46,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=67060.0, ans=0.125 2024-09-16 19:51:49,431 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.339e+01 1.205e+02 1.438e+02 1.646e+02 4.024e+02, threshold=2.876e+02, percent-clipped=3.0 2024-09-16 19:51:51,825 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=8.86 vs. limit=15.0 2024-09-16 19:51:55,490 INFO [train.py:1198] (1/2) Epoch 4, batch 3200, loss[loss=0.3001, ctc_loss=0.2362, cr_loss=0.4646, attn_decoder_loss=0.2969, over 29420.00 frames. ], tot_loss[loss=0.3038, ctc_loss=0.2381, cr_loss=0.4394, attn_decoder_loss=0.3013, over 5794280.71 frames. 
], batch size: 79, lr: 2.51e-02, grad_scale: 8.0 2024-09-16 19:51:55,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=67100.0, ans=0.1 2024-09-16 19:52:27,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=67180.0, ans=0.125 2024-09-16 19:52:32,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=67180.0, ans=0.1 2024-09-16 19:52:34,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=67180.0, ans=0.125 2024-09-16 19:52:41,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=67220.0, ans=0.125 2024-09-16 19:53:11,543 INFO [train.py:1198] (1/2) Epoch 4, batch 3250, loss[loss=0.3191, ctc_loss=0.2497, cr_loss=0.4814, attn_decoder_loss=0.3161, over 29694.00 frames. ], tot_loss[loss=0.3045, ctc_loss=0.2387, cr_loss=0.4407, attn_decoder_loss=0.302, over 5800153.27 frames. ], batch size: 84, lr: 2.50e-02, grad_scale: 4.0 2024-09-16 19:53:20,356 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. limit=6.0 2024-09-16 19:53:27,832 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.78 vs. limit=22.5 2024-09-16 19:53:51,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=67380.0, ans=0.125 2024-09-16 19:54:24,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=67460.0, ans=0.125 2024-09-16 19:54:26,053 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.284e+02 1.425e+02 1.663e+02 2.668e+02, threshold=2.850e+02, percent-clipped=0.0 2024-09-16 19:54:29,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=67500.0, ans=0.015 2024-09-16 19:54:30,766 INFO [train.py:1198] (1/2) Epoch 4, batch 3300, loss[loss=0.3099, ctc_loss=0.2383, cr_loss=0.4339, attn_decoder_loss=0.3082, over 28308.00 frames. ], tot_loss[loss=0.3036, ctc_loss=0.238, cr_loss=0.4399, attn_decoder_loss=0.3011, over 5797099.38 frames. ], batch size: 111, lr: 2.50e-02, grad_scale: 8.0 2024-09-16 19:54:31,829 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.60 vs. limit=15.0 2024-09-16 19:54:50,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=67540.0, ans=0.125 2024-09-16 19:54:56,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=67540.0, ans=0.125 2024-09-16 19:55:46,066 INFO [train.py:1198] (1/2) Epoch 4, batch 3350, loss[loss=0.3232, ctc_loss=0.2613, cr_loss=0.4433, attn_decoder_loss=0.3203, over 28777.00 frames. ], tot_loss[loss=0.304, ctc_loss=0.2384, cr_loss=0.4394, attn_decoder_loss=0.3015, over 5774410.03 frames. 
], batch size: 104, lr: 2.50e-02, grad_scale: 4.0 2024-09-16 19:55:52,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=67700.0, ans=0.025 2024-09-16 19:56:24,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=67780.0, ans=0.125 2024-09-16 19:56:24,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=67780.0, ans=0.0 2024-09-16 19:56:34,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=67820.0, ans=0.0 2024-09-16 19:56:53,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=67860.0, ans=0.0 2024-09-16 19:56:58,645 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.005e+02 1.186e+02 1.341e+02 1.622e+02 4.699e+02, threshold=2.682e+02, percent-clipped=3.0 2024-09-16 19:57:01,666 INFO [train.py:1198] (1/2) Epoch 4, batch 3400, loss[loss=0.2743, ctc_loss=0.211, cr_loss=0.4039, attn_decoder_loss=0.2724, over 29330.00 frames. ], tot_loss[loss=0.3038, ctc_loss=0.2382, cr_loss=0.439, attn_decoder_loss=0.3014, over 5766563.79 frames. ], batch size: 67, lr: 2.49e-02, grad_scale: 8.0 2024-09-16 19:57:11,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=67900.0, ans=0.2 2024-09-16 19:57:14,775 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.55 vs. limit=22.5 2024-09-16 19:57:52,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. limit=6.0 2024-09-16 19:58:14,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=68060.0, ans=0.125 2024-09-16 19:58:17,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=68060.0, ans=0.125 2024-09-16 19:58:19,088 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0 2024-09-16 19:58:21,354 INFO [train.py:1198] (1/2) Epoch 4, batch 3450, loss[loss=0.3237, ctc_loss=0.2579, cr_loss=0.4411, attn_decoder_loss=0.3212, over 28314.00 frames. ], tot_loss[loss=0.3041, ctc_loss=0.238, cr_loss=0.4388, attn_decoder_loss=0.3017, over 5773947.80 frames. ], batch size: 111, lr: 2.49e-02, grad_scale: 4.0 2024-09-16 19:58:24,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=68100.0, ans=0.1 2024-09-16 19:58:32,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=68100.0, ans=0.125 2024-09-16 19:58:46,220 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.31 vs. 
limit=15.0 2024-09-16 19:58:47,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=68140.0, ans=0.2 2024-09-16 19:59:02,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=68180.0, ans=0.1 2024-09-16 19:59:14,716 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.67 vs. limit=10.0 2024-09-16 19:59:32,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=68260.0, ans=0.125 2024-09-16 19:59:34,952 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.730e+01 1.161e+02 1.261e+02 1.469e+02 4.535e+02, threshold=2.521e+02, percent-clipped=3.0 2024-09-16 19:59:36,486 INFO [train.py:1198] (1/2) Epoch 4, batch 3500, loss[loss=0.2655, ctc_loss=0.2049, cr_loss=0.3728, attn_decoder_loss=0.264, over 29325.00 frames. ], tot_loss[loss=0.3031, ctc_loss=0.2371, cr_loss=0.4379, attn_decoder_loss=0.3007, over 5776273.31 frames. ], batch size: 71, lr: 2.49e-02, grad_scale: 8.0 2024-09-16 19:59:48,914 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 19:59:51,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=68340.0, ans=0.125 2024-09-16 20:00:02,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=68340.0, ans=0.2 2024-09-16 20:00:09,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=68380.0, ans=0.0 2024-09-16 20:00:38,500 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.46 vs. limit=22.5 2024-09-16 20:00:43,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=68460.0, ans=0.125 2024-09-16 20:00:49,828 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=15.0 2024-09-16 20:00:50,563 INFO [train.py:1198] (1/2) Epoch 4, batch 3550, loss[loss=0.323, ctc_loss=0.2624, cr_loss=0.4565, attn_decoder_loss=0.3196, over 29703.00 frames. ], tot_loss[loss=0.3032, ctc_loss=0.2372, cr_loss=0.438, attn_decoder_loss=0.3008, over 5781749.03 frames. 
], batch size: 89, lr: 2.48e-02, grad_scale: 4.0 2024-09-16 20:00:50,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=68500.0, ans=0.1 2024-09-16 20:01:05,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=68540.0, ans=0.0 2024-09-16 20:01:08,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=68540.0, ans=0.0 2024-09-16 20:01:29,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=68580.0, ans=0.1 2024-09-16 20:01:32,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=68580.0, ans=0.2 2024-09-16 20:01:36,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=68620.0, ans=0.05 2024-09-16 20:01:38,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=68620.0, ans=0.2 2024-09-16 20:01:38,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=68620.0, ans=0.2 2024-09-16 20:02:04,415 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.002e+02 1.275e+02 1.372e+02 1.558e+02 6.376e+02, threshold=2.743e+02, percent-clipped=5.0 2024-09-16 20:02:04,436 INFO [train.py:1198] (1/2) Epoch 4, batch 3600, loss[loss=0.2868, ctc_loss=0.216, cr_loss=0.4322, attn_decoder_loss=0.2851, over 29505.00 frames. ], tot_loss[loss=0.3031, ctc_loss=0.2369, cr_loss=0.4378, attn_decoder_loss=0.3007, over 5791576.37 frames. ], batch size: 77, lr: 2.48e-02, grad_scale: 8.0 2024-09-16 20:02:27,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=68740.0, ans=0.1 2024-09-16 20:02:29,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=68740.0, ans=0.125 2024-09-16 20:02:36,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=68780.0, ans=0.125 2024-09-16 20:02:38,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=68780.0, ans=0.2 2024-09-16 20:03:08,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=68860.0, ans=0.1 2024-09-16 20:03:17,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=68860.0, ans=0.0 2024-09-16 20:03:23,382 INFO [train.py:1198] (1/2) Epoch 4, batch 3650, loss[loss=0.3099, ctc_loss=0.2408, cr_loss=0.4408, attn_decoder_loss=0.3078, over 29487.00 frames. ], tot_loss[loss=0.3022, ctc_loss=0.2361, cr_loss=0.4377, attn_decoder_loss=0.2999, over 5794090.99 frames. 
], batch size: 90, lr: 2.48e-02, grad_scale: 4.0 2024-09-16 20:03:23,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=68900.0, ans=0.1 2024-09-16 20:03:35,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=68900.0, ans=0.125 2024-09-16 20:04:11,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=69020.0, ans=0.125 2024-09-16 20:04:14,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=69020.0, ans=0.1 2024-09-16 20:04:26,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=69060.0, ans=0.125 2024-09-16 20:04:30,847 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=15.0 2024-09-16 20:04:37,603 INFO [train.py:1198] (1/2) Epoch 4, batch 3700, loss[loss=0.321, ctc_loss=0.2546, cr_loss=0.4587, attn_decoder_loss=0.3182, over 29686.00 frames. ], tot_loss[loss=0.3024, ctc_loss=0.2361, cr_loss=0.4383, attn_decoder_loss=0.3001, over 5804863.08 frames. ], batch size: 84, lr: 2.47e-02, grad_scale: 8.0 2024-09-16 20:04:39,096 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.031e+02 1.266e+02 1.378e+02 1.578e+02 2.388e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-16 20:05:01,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=69140.0, ans=0.125 2024-09-16 20:05:10,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=69180.0, ans=0.125 2024-09-16 20:05:16,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=69180.0, ans=0.0 2024-09-16 20:05:17,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=69180.0, ans=0.125 2024-09-16 20:05:22,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=69220.0, ans=0.1 2024-09-16 20:05:45,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.59 vs. limit=15.0 2024-09-16 20:05:51,396 INFO [train.py:1198] (1/2) Epoch 4, batch 3750, loss[loss=0.2707, ctc_loss=0.2145, cr_loss=0.3927, attn_decoder_loss=0.2683, over 29328.00 frames. ], tot_loss[loss=0.3022, ctc_loss=0.2361, cr_loss=0.4388, attn_decoder_loss=0.2998, over 5807959.29 frames. ], batch size: 67, lr: 2.47e-02, grad_scale: 4.0 2024-09-16 20:06:09,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=69340.0, ans=0.1 2024-09-16 20:06:13,129 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.41 vs. 
limit=22.5 2024-09-16 20:06:13,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=69340.0, ans=0.125 2024-09-16 20:06:30,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=69380.0, ans=0.0 2024-09-16 20:06:43,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=69420.0, ans=0.025 2024-09-16 20:06:48,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=69420.0, ans=0.2 2024-09-16 20:06:54,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=69460.0, ans=0.0 2024-09-16 20:06:54,399 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.27 vs. limit=22.5 2024-09-16 20:07:05,692 INFO [train.py:1198] (1/2) Epoch 4, batch 3800, loss[loss=0.309, ctc_loss=0.231, cr_loss=0.4503, attn_decoder_loss=0.3076, over 29626.00 frames. ], tot_loss[loss=0.3017, ctc_loss=0.2354, cr_loss=0.4373, attn_decoder_loss=0.2994, over 5799244.20 frames. ], batch size: 86, lr: 2.47e-02, grad_scale: 8.0 2024-09-16 20:07:05,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=69500.0, ans=0.125 2024-09-16 20:07:08,685 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.061e+02 1.301e+02 1.423e+02 1.744e+02 6.965e+02, threshold=2.846e+02, percent-clipped=5.0 2024-09-16 20:07:25,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=69540.0, ans=0.125 2024-09-16 20:07:28,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=69540.0, ans=0.04949747468305833 2024-09-16 20:08:05,208 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.29 vs. limit=12.0 2024-09-16 20:08:10,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=69660.0, ans=0.1 2024-09-16 20:08:20,401 INFO [train.py:1198] (1/2) Epoch 4, batch 3850, loss[loss=0.3164, ctc_loss=0.2465, cr_loss=0.4404, attn_decoder_loss=0.3144, over 29267.00 frames. ], tot_loss[loss=0.302, ctc_loss=0.2353, cr_loss=0.4392, attn_decoder_loss=0.2996, over 5813614.84 frames. ], batch size: 100, lr: 2.47e-02, grad_scale: 4.0 2024-09-16 20:08:20,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=69700.0, ans=0.125 2024-09-16 20:08:25,648 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=23.08 vs. limit=22.5 2024-09-16 20:08:41,774 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.54 vs. 
limit=15.0 2024-09-16 20:08:42,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=69740.0, ans=0.1 2024-09-16 20:08:42,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=69740.0, ans=0.125 2024-09-16 20:09:06,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=69820.0, ans=0.125 2024-09-16 20:09:17,873 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.47 vs. limit=12.0 2024-09-16 20:09:25,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=69860.0, ans=0.0 2024-09-16 20:09:28,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=69860.0, ans=0.0 2024-09-16 20:09:36,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=69900.0, ans=0.125 2024-09-16 20:09:36,864 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.66 vs. limit=15.0 2024-09-16 20:09:37,507 INFO [train.py:1198] (1/2) Epoch 4, batch 3900, loss[loss=0.3104, ctc_loss=0.2464, cr_loss=0.4297, attn_decoder_loss=0.308, over 29616.00 frames. ], tot_loss[loss=0.3023, ctc_loss=0.2353, cr_loss=0.4393, attn_decoder_loss=0.3, over 5818111.86 frames. ], batch size: 86, lr: 2.46e-02, grad_scale: 8.0 2024-09-16 20:09:40,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=69900.0, ans=0.0 2024-09-16 20:09:41,919 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.044e+02 1.221e+02 1.343e+02 1.520e+02 2.719e+02, threshold=2.686e+02, percent-clipped=0.0 2024-09-16 20:09:46,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=69900.0, ans=0.0 2024-09-16 20:09:56,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=69940.0, ans=0.025 2024-09-16 20:09:57,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=69940.0, ans=0.05 2024-09-16 20:10:25,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=70020.0, ans=0.125 2024-09-16 20:10:48,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=70060.0, ans=0.125 2024-09-16 20:10:51,474 INFO [train.py:1198] (1/2) Epoch 4, batch 3950, loss[loss=0.316, ctc_loss=0.2469, cr_loss=0.4614, attn_decoder_loss=0.3134, over 29437.00 frames. ], tot_loss[loss=0.302, ctc_loss=0.2348, cr_loss=0.4394, attn_decoder_loss=0.2997, over 5837414.51 frames. 
], batch size: 97, lr: 2.46e-02, grad_scale: 4.0 2024-09-16 20:11:02,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=70100.0, ans=0.125 2024-09-16 20:11:27,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=70180.0, ans=0.07 2024-09-16 20:11:37,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=70220.0, ans=0.95 2024-09-16 20:11:58,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=70260.0, ans=0.125 2024-09-16 20:12:05,016 INFO [train.py:1198] (1/2) Epoch 4, batch 4000, loss[loss=0.2832, ctc_loss=0.2012, cr_loss=0.3972, attn_decoder_loss=0.2835, over 29515.00 frames. ], tot_loss[loss=0.3025, ctc_loss=0.2358, cr_loss=0.4397, attn_decoder_loss=0.3002, over 5814149.78 frames. ], batch size: 74, lr: 2.46e-02, grad_scale: 8.0 2024-09-16 20:12:12,301 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.084e+02 1.309e+02 1.435e+02 1.653e+02 3.484e+02, threshold=2.870e+02, percent-clipped=1.0 2024-09-16 20:12:19,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=70340.0, ans=0.125 2024-09-16 20:12:23,433 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.37 vs. limit=15.0 2024-09-16 20:12:30,266 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:12:37,125 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.77 vs. limit=15.0 2024-09-16 20:12:37,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=70380.0, ans=0.125 2024-09-16 20:12:42,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=70380.0, ans=0.125 2024-09-16 20:13:06,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=70460.0, ans=0.125 2024-09-16 20:13:20,973 INFO [train.py:1198] (1/2) Epoch 4, batch 4050, loss[loss=0.3464, ctc_loss=0.3111, cr_loss=0.4424, attn_decoder_loss=0.3405, over 20579.00 frames. ], tot_loss[loss=0.3027, ctc_loss=0.236, cr_loss=0.4394, attn_decoder_loss=0.3003, over 5797748.21 frames. 
], batch size: 210, lr: 2.45e-02, grad_scale: 4.0 2024-09-16 20:13:32,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=70500.0, ans=0.1 2024-09-16 20:13:39,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=70540.0, ans=0.0 2024-09-16 20:13:44,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=70540.0, ans=0.2 2024-09-16 20:13:59,205 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:14:26,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=70660.0, ans=0.125 2024-09-16 20:14:29,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=70660.0, ans=0.025 2024-09-16 20:14:29,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=70660.0, ans=0.0 2024-09-16 20:14:35,708 INFO [train.py:1198] (1/2) Epoch 4, batch 4100, loss[loss=0.3208, ctc_loss=0.248, cr_loss=0.4739, attn_decoder_loss=0.3184, over 29515.00 frames. ], tot_loss[loss=0.3031, ctc_loss=0.2365, cr_loss=0.4403, attn_decoder_loss=0.3007, over 5792828.96 frames. ], batch size: 90, lr: 2.45e-02, grad_scale: 8.0 2024-09-16 20:14:42,855 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.753e+01 1.273e+02 1.617e+02 1.999e+02 3.514e+02, threshold=3.235e+02, percent-clipped=2.0 2024-09-16 20:14:45,016 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.14 vs. limit=22.5 2024-09-16 20:14:57,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=70740.0, ans=0.025 2024-09-16 20:15:06,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=70780.0, ans=0.125 2024-09-16 20:15:13,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=70780.0, ans=10.0 2024-09-16 20:15:48,990 INFO [train.py:1198] (1/2) Epoch 4, batch 4150, loss[loss=0.2849, ctc_loss=0.2136, cr_loss=0.4022, attn_decoder_loss=0.2839, over 29517.00 frames. ], tot_loss[loss=0.3028, ctc_loss=0.236, cr_loss=0.4399, attn_decoder_loss=0.3004, over 5797596.93 frames. ], batch size: 77, lr: 2.45e-02, grad_scale: 4.0 2024-09-16 20:16:00,285 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.64 vs. limit=15.0 2024-09-16 20:16:16,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.31 vs. limit=15.0 2024-09-16 20:16:17,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=70980.0, ans=0.0 2024-09-16 20:17:02,272 INFO [train.py:1198] (1/2) Epoch 4, batch 4200, loss[loss=0.3162, ctc_loss=0.2451, cr_loss=0.4474, attn_decoder_loss=0.3141, over 29495.00 frames. ], tot_loss[loss=0.3029, ctc_loss=0.2358, cr_loss=0.44, attn_decoder_loss=0.3005, over 5799087.07 frames. 
], batch size: 90, lr: 2.44e-02, grad_scale: 8.0 2024-09-16 20:17:11,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=71100.0, ans=0.025 2024-09-16 20:17:12,670 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.067e+02 1.228e+02 1.369e+02 1.579e+02 3.524e+02, threshold=2.737e+02, percent-clipped=1.0 2024-09-16 20:17:16,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=71140.0, ans=0.0 2024-09-16 20:17:18,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=71140.0, ans=0.2 2024-09-16 20:17:41,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=71180.0, ans=0.125 2024-09-16 20:17:42,193 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.76 vs. limit=15.0 2024-09-16 20:17:43,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=71180.0, ans=0.125 2024-09-16 20:18:08,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=71260.0, ans=0.04949747468305833 2024-09-16 20:18:17,843 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.80 vs. limit=15.0 2024-09-16 20:18:18,317 INFO [train.py:1198] (1/2) Epoch 4, batch 4250, loss[loss=0.2805, ctc_loss=0.2164, cr_loss=0.4152, attn_decoder_loss=0.2784, over 29501.00 frames. ], tot_loss[loss=0.3026, ctc_loss=0.235, cr_loss=0.4394, attn_decoder_loss=0.3004, over 5804489.61 frames. ], batch size: 74, lr: 2.44e-02, grad_scale: 4.0 2024-09-16 20:18:25,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=71300.0, ans=0.1 2024-09-16 20:18:38,210 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.58 vs. limit=22.5 2024-09-16 20:18:40,971 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.74 vs. limit=15.0 2024-09-16 20:18:50,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=71380.0, ans=0.125 2024-09-16 20:18:52,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=71380.0, ans=0.0 2024-09-16 20:19:29,632 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.85 vs. limit=15.0 2024-09-16 20:19:31,887 INFO [train.py:1198] (1/2) Epoch 4, batch 4300, loss[loss=0.317, ctc_loss=0.2481, cr_loss=0.4737, attn_decoder_loss=0.3141, over 29546.00 frames. ], tot_loss[loss=0.3029, ctc_loss=0.2351, cr_loss=0.4399, attn_decoder_loss=0.3007, over 5793750.90 frames. 
], batch size: 87, lr: 2.44e-02, grad_scale: 8.0 2024-09-16 20:19:33,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=71500.0, ans=0.2 2024-09-16 20:19:43,709 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.834e+01 1.271e+02 1.418e+02 1.620e+02 3.004e+02, threshold=2.836e+02, percent-clipped=2.0 2024-09-16 20:19:48,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=71540.0, ans=0.1 2024-09-16 20:19:49,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=71540.0, ans=0.07 2024-09-16 20:20:25,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=71620.0, ans=0.1 2024-09-16 20:20:47,040 INFO [train.py:1198] (1/2) Epoch 4, batch 4350, loss[loss=0.3317, ctc_loss=0.2681, cr_loss=0.4629, attn_decoder_loss=0.3285, over 29460.00 frames. ], tot_loss[loss=0.3067, ctc_loss=0.2387, cr_loss=0.4448, attn_decoder_loss=0.3044, over 5795868.27 frames. ], batch size: 97, lr: 2.44e-02, grad_scale: 4.0 2024-09-16 20:20:59,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=71700.0, ans=0.1 2024-09-16 20:21:12,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=71740.0, ans=0.125 2024-09-16 20:21:24,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=71780.0, ans=0.1 2024-09-16 20:21:25,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=71780.0, ans=0.1 2024-09-16 20:21:30,961 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.83 vs. limit=12.0 2024-09-16 20:21:35,166 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.00 vs. limit=15.0 2024-09-16 20:22:00,939 INFO [train.py:1198] (1/2) Epoch 4, batch 4400, loss[loss=0.3199, ctc_loss=0.2468, cr_loss=0.4497, attn_decoder_loss=0.318, over 27179.00 frames. ], tot_loss[loss=0.3093, ctc_loss=0.2417, cr_loss=0.4471, attn_decoder_loss=0.3069, over 5766928.43 frames. ], batch size: 124, lr: 2.43e-02, grad_scale: 8.0 2024-09-16 20:22:01,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=71900.0, ans=0.0 2024-09-16 20:22:07,723 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.63 vs. limit=15.0 2024-09-16 20:22:08,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=71900.0, ans=0.125 2024-09-16 20:22:14,006 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.841e+01 1.227e+02 1.349e+02 1.608e+02 3.095e+02, threshold=2.698e+02, percent-clipped=2.0 2024-09-16 20:22:22,225 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.98 vs. 
limit=15.0 2024-09-16 20:22:27,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=71940.0, ans=0.125 2024-09-16 20:22:36,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=71980.0, ans=0.125 2024-09-16 20:22:41,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=71980.0, ans=10.0 2024-09-16 20:22:41,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=71980.0, ans=0.125 2024-09-16 20:23:16,016 INFO [train.py:1198] (1/2) Epoch 4, batch 4450, loss[loss=0.3421, ctc_loss=0.2993, cr_loss=0.4556, attn_decoder_loss=0.3367, over 20483.00 frames. ], tot_loss[loss=0.3132, ctc_loss=0.2484, cr_loss=0.4496, attn_decoder_loss=0.3104, over 5575591.55 frames. ], batch size: 209, lr: 2.43e-02, grad_scale: 4.0 2024-09-16 20:23:16,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=72100.0, ans=0.125 2024-09-16 20:23:40,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=72140.0, ans=0.0 2024-09-16 20:23:49,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=72180.0, ans=0.125 2024-09-16 20:24:02,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=72220.0, ans=0.125 2024-09-16 20:24:29,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=72300.0, ans=0.0 2024-09-16 20:24:31,032 INFO [train.py:1198] (1/2) Epoch 4, batch 4500, loss[loss=0.332, ctc_loss=0.2911, cr_loss=0.4231, attn_decoder_loss=0.3271, over 20184.00 frames. ], tot_loss[loss=0.3179, ctc_loss=0.2579, cr_loss=0.4503, attn_decoder_loss=0.3145, over 5237045.34 frames. ], batch size: 209, lr: 2.43e-02, grad_scale: 8.0 2024-09-16 20:24:46,108 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.245e+02 1.357e+02 1.541e+02 2.817e+02, threshold=2.714e+02, percent-clipped=1.0 2024-09-16 20:25:04,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=72380.0, ans=0.1 2024-09-16 20:26:05,473 INFO [train.py:1198] (1/2) Epoch 5, batch 0, loss[loss=0.3362, ctc_loss=0.2134, cr_loss=0.4003, attn_decoder_loss=0.341, over 29603.00 frames. ], tot_loss[loss=0.3362, ctc_loss=0.2134, cr_loss=0.4003, attn_decoder_loss=0.341, over 29603.00 frames. ], batch size: 73, lr: 2.26e-02, grad_scale: 4.0 2024-09-16 20:26:05,473 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 20:26:24,506 INFO [train.py:1230] (1/2) Epoch 5, validation: loss=0.2407, ctc_loss=0.07934, cr_loss=4.486e-15, attn_decoder_loss=0.2587, over 944034.00 frames. 
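A note on reading the loss[...] and tot_loss[...] entries above: each logs a weighted total next to its components, and the numbers are consistent with a fixed combination of about 0.1*ctc_loss + 0.02*cr_loss + 0.9*attn_decoder_loss (for example 0.1*0.2134 + 0.02*0.4003 + 0.9*0.341 = 0.3362, the total logged at Epoch 5, batch 0). The near-zero cr_loss=4.486e-15 in the validation entry also fits this reading, since a consistency-regularization term compares two augmented passes and collapses when augmentation is off. The snippet below is a minimal sketch for checking that relationship against the log; the scale constants are inferred from the logged numbers and the helper name check_line is invented here, neither is taken from icefall's train.py.

```python
import re

# Inferred from the logged totals above, not read from the training code:
# total ~ 0.1 * ctc_loss + 0.02 * cr_loss + 0.9 * attn_decoder_loss.
CTC_SCALE, CR_SCALE, AED_SCALE = 0.1, 0.02, 0.9

# Matches the "loss[loss=..., ctc_loss=..., cr_loss=..., attn_decoder_loss=..."
# fragments in the per-batch entries (cr_loss may be in scientific notation).
LOSS_RE = re.compile(
    r"loss\[loss=(?P<loss>[\d.]+), ctc_loss=(?P<ctc>[\d.]+), "
    r"cr_loss=(?P<cr>[\d.eE+-]+), attn_decoder_loss=(?P<aed>[\d.]+)"
)

def check_line(line: str) -> None:
    m = LOSS_RE.search(line)
    if m is None:
        return  # not a loss entry
    total = float(m["loss"])
    recon = (CTC_SCALE * float(m["ctc"])
             + CR_SCALE * float(m["cr"])
             + AED_SCALE * float(m["aed"]))
    # Logged values are rounded to ~4 significant digits, so allow slack.
    assert abs(total - recon) < 5e-4, (total, recon)

# Example, using the Epoch 4, batch 4100 entry above:
check_line("loss[loss=0.3208, ctc_loss=0.248, cr_loss=0.4739, "
           "attn_decoder_loss=0.3184, over 29515.00 frames. ]")
```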
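The WARNING [optim.py:487] lines can be decoded the same way: the five values after "grad-norm quartiles" read as min/Q1/median/Q3/max over a window of recent gradient norms, and in every warning above the reported threshold equals Clipping_scale times the median (for example 2.0 * 1.369e+02 = 2.738e+02 against the logged threshold=2.737e+02). Below is a hedged sketch of that bookkeeping, reverse-engineered from these numbers rather than copied from optim.py; the class name and window size are invented.

```python
from collections import deque
import numpy as np

class GradNormClipper:
    """Sketch of median-based gradient clipping matching the log's numbers.

    Keeps a window of recent gradient norms, reports their quartiles, and
    clips whenever the current norm exceeds clipping_scale * median. A
    reconstruction from the WARNING lines above, not icefall's optim.py.
    """

    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)

    def update(self, grad_norm: float) -> float:
        """Record grad_norm and return the factor to scale gradients by."""
        self.norms.append(grad_norm)
        q = np.quantile(list(self.norms), [0.0, 0.25, 0.5, 0.75, 1.0])
        threshold = self.scale * q[2]  # clipping_scale * median
        if grad_norm > threshold:
            print(f"grad-norm quartiles {q}, threshold={threshold:.3e}")
            return threshold / grad_norm  # shrink gradients to the threshold
        return 1.0

clipper = GradNormClipper()
for norm in [120.0, 140.0, 150.0, 400.0]:  # the last norm triggers clipping
    factor = clipper.update(norm)
```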
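Most of the remaining volume in this log is scaling.py:214 ScheduledFloat lines, each recording a named hyper-parameter (dropout probabilities, skip rates, scale floors), the current batch_count, and the value in effect (ans=...). These read as schedules that move a float as a function of global batch count. The class below is a simplified reconstruction of that idea, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints in the demo are invented, and only the logging shape is taken from the lines above.

```python
class ScheduledFloat:
    """Simplified sketch: a float hyper-parameter scheduled on batch count.

    Piecewise-linear interpolation between (batch_count, value) breakpoints;
    an assumed reconstruction, not the actual class in icefall's scaling.py.
    """

    def __init__(self, *points: tuple[float, float]):
        self.points = sorted(points)  # (batch_count, value) breakpoints

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)  # linear interpolation
        return pts[-1][1]  # past the last breakpoint: hold the final value

# Invented breakpoints; at batch_count=70500 this yields 0.1, the same
# shape as "...dropout_p, batch_count=70500.0, ans=0.1" above.
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value(70500.0))
```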
2024-09-16 20:26:24,507 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-16 20:26:25,968 WARNING [optim.py:503] (1/2) Scaling gradients by 0.06828752905130386, model_norm_threshold=271.39923095703125 2024-09-16 20:26:26,177 WARNING [optim.py:575] (1/2) Parameter dominating tot_sumsq module.attention_decoder.decoder.embed.weight with proportion 0.28, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.372e+06, grad_sumsq=1.717e+06, orig_rms_sq=2.546e+00 2024-09-16 20:26:38,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=72440.0, ans=0.025 2024-09-16 20:26:43,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=72440.0, ans=0.1 2024-09-16 20:26:55,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=72480.0, ans=0.125 2024-09-16 20:27:00,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.67 vs. limit=15.0 2024-09-16 20:27:35,667 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.58 vs. limit=15.0 2024-09-16 20:27:36,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.67 vs. limit=15.0 2024-09-16 20:27:40,506 INFO [train.py:1198] (1/2) Epoch 5, batch 50, loss[loss=0.2771, ctc_loss=0.213, cr_loss=0.3604, attn_decoder_loss=0.2762, over 29438.00 frames. ], tot_loss[loss=0.3081, ctc_loss=0.2405, cr_loss=0.4418, attn_decoder_loss=0.3058, over 1267212.28 frames. ], batch size: 70, lr: 2.26e-02, grad_scale: 4.0 2024-09-16 20:27:44,236 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.51 vs. 
limit=15.0 2024-09-16 20:27:46,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=72600.0, ans=0.0 2024-09-16 20:27:48,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=72600.0, ans=0.125 2024-09-16 20:27:49,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=72600.0, ans=0.0 2024-09-16 20:27:58,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=72640.0, ans=0.2 2024-09-16 20:28:05,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=72640.0, ans=0.2 2024-09-16 20:28:05,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=72640.0, ans=0.0 2024-09-16 20:28:26,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=72720.0, ans=0.125 2024-09-16 20:28:37,066 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.048e+02 1.241e+02 1.473e+02 1.722e+02 3.974e+03, threshold=2.946e+02, percent-clipped=9.0 2024-09-16 20:28:38,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=72720.0, ans=0.0 2024-09-16 20:28:40,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=72720.0, ans=0.0 2024-09-16 20:28:42,009 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:28:58,584 INFO [train.py:1198] (1/2) Epoch 5, batch 100, loss[loss=0.2898, ctc_loss=0.2207, cr_loss=0.4276, attn_decoder_loss=0.288, over 29527.00 frames. ], tot_loss[loss=0.3062, ctc_loss=0.2373, cr_loss=0.4408, attn_decoder_loss=0.3041, over 2251916.75 frames. ], batch size: 76, lr: 2.25e-02, grad_scale: 8.0 2024-09-16 20:28:58,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=72800.0, ans=0.125 2024-09-16 20:29:07,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=72800.0, ans=0.0 2024-09-16 20:29:10,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=72800.0, ans=0.2 2024-09-16 20:29:11,209 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.72 vs. limit=15.0 2024-09-16 20:29:58,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=72960.0, ans=0.125 2024-09-16 20:30:00,769 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.82 vs. limit=15.0 2024-09-16 20:30:01,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=72960.0, ans=0.0 2024-09-16 20:30:14,749 INFO [train.py:1198] (1/2) Epoch 5, batch 150, loss[loss=0.2609, ctc_loss=0.1913, cr_loss=0.3819, attn_decoder_loss=0.2601, over 29421.00 frames. ], tot_loss[loss=0.3027, ctc_loss=0.2341, cr_loss=0.4389, attn_decoder_loss=0.3005, over 3047127.16 frames. 
], batch size: 70, lr: 2.25e-02, grad_scale: 4.0 2024-09-16 20:30:39,786 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.45 vs. limit=10.0 2024-09-16 20:31:09,881 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.201e+01 1.170e+02 1.302e+02 1.516e+02 3.725e+02, threshold=2.604e+02, percent-clipped=3.0 2024-09-16 20:31:16,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=73160.0, ans=0.125 2024-09-16 20:31:29,493 INFO [train.py:1198] (1/2) Epoch 5, batch 200, loss[loss=0.3185, ctc_loss=0.2572, cr_loss=0.4506, attn_decoder_loss=0.3153, over 27649.00 frames. ], tot_loss[loss=0.3009, ctc_loss=0.2328, cr_loss=0.4389, attn_decoder_loss=0.2987, over 3659830.64 frames. ], batch size: 125, lr: 2.25e-02, grad_scale: 8.0 2024-09-16 20:31:29,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=73200.0, ans=0.025 2024-09-16 20:31:30,353 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=11.97 vs. limit=15.0 2024-09-16 20:31:42,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=73200.0, ans=0.125 2024-09-16 20:32:17,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=73320.0, ans=0.025 2024-09-16 20:32:22,259 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.57 vs. limit=15.0 2024-09-16 20:32:32,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=73360.0, ans=0.125 2024-09-16 20:32:35,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=73360.0, ans=0.1 2024-09-16 20:32:41,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=73360.0, ans=0.125 2024-09-16 20:32:46,878 INFO [train.py:1198] (1/2) Epoch 5, batch 250, loss[loss=0.3226, ctc_loss=0.2636, cr_loss=0.4511, attn_decoder_loss=0.3191, over 29228.00 frames. ], tot_loss[loss=0.3003, ctc_loss=0.2318, cr_loss=0.4389, attn_decoder_loss=0.2982, over 4140844.65 frames. ], batch size: 100, lr: 2.25e-02, grad_scale: 4.0 2024-09-16 20:33:21,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=73480.0, ans=0.0 2024-09-16 20:33:22,403 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.41 vs. 
limit=22.5 2024-09-16 20:33:23,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=73480.0, ans=0.1 2024-09-16 20:33:28,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=73480.0, ans=15.0 2024-09-16 20:33:44,203 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.445e+01 1.169e+02 1.336e+02 1.491e+02 2.357e+02, threshold=2.672e+02, percent-clipped=0.0 2024-09-16 20:33:44,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=73520.0, ans=0.0 2024-09-16 20:33:54,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=73560.0, ans=0.125 2024-09-16 20:33:57,979 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.77 vs. limit=15.0 2024-09-16 20:34:04,726 INFO [train.py:1198] (1/2) Epoch 5, batch 300, loss[loss=0.3256, ctc_loss=0.2637, cr_loss=0.4847, attn_decoder_loss=0.3217, over 29534.00 frames. ], tot_loss[loss=0.2997, ctc_loss=0.2313, cr_loss=0.4379, attn_decoder_loss=0.2976, over 4510347.54 frames. ], batch size: 92, lr: 2.24e-02, grad_scale: 8.0 2024-09-16 20:34:12,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=73600.0, ans=0.125 2024-09-16 20:34:17,742 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.03 vs. limit=10.0 2024-09-16 20:34:20,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=73640.0, ans=0.1 2024-09-16 20:34:32,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=73640.0, ans=0.0 2024-09-16 20:34:35,618 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.34 vs. limit=15.0 2024-09-16 20:34:48,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=73720.0, ans=0.1 2024-09-16 20:35:01,017 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.46 vs. limit=22.5 2024-09-16 20:35:06,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=73760.0, ans=0.09899494936611666 2024-09-16 20:35:19,692 INFO [train.py:1198] (1/2) Epoch 5, batch 350, loss[loss=0.2614, ctc_loss=0.1858, cr_loss=0.3824, attn_decoder_loss=0.2613, over 29337.00 frames. ], tot_loss[loss=0.2996, ctc_loss=0.2302, cr_loss=0.4379, attn_decoder_loss=0.2976, over 4797181.79 frames. ], batch size: 71, lr: 2.24e-02, grad_scale: 4.0 2024-09-16 20:35:20,544 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.48 vs. 
limit=10.0 2024-09-16 20:35:37,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=73840.0, ans=15.0 2024-09-16 20:35:52,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=73880.0, ans=10.0 2024-09-16 20:35:59,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=73880.0, ans=0.125 2024-09-16 20:36:16,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=73920.0, ans=0.1 2024-09-16 20:36:20,663 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.525e+01 1.174e+02 1.354e+02 1.521e+02 2.144e+02, threshold=2.708e+02, percent-clipped=0.0 2024-09-16 20:36:20,925 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:36:37,083 INFO [train.py:1198] (1/2) Epoch 5, batch 400, loss[loss=0.2948, ctc_loss=0.2188, cr_loss=0.451, attn_decoder_loss=0.2932, over 29713.00 frames. ], tot_loss[loss=0.2993, ctc_loss=0.2299, cr_loss=0.4374, attn_decoder_loss=0.2973, over 5026078.87 frames. ], batch size: 82, lr: 2.24e-02, grad_scale: 8.0 2024-09-16 20:36:42,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=74000.0, ans=0.07 2024-09-16 20:36:43,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=74000.0, ans=0.125 2024-09-16 20:37:00,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=74040.0, ans=0.0 2024-09-16 20:37:09,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=74080.0, ans=0.125 2024-09-16 20:37:46,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=74160.0, ans=0.0 2024-09-16 20:37:55,366 INFO [train.py:1198] (1/2) Epoch 5, batch 450, loss[loss=0.304, ctc_loss=0.225, cr_loss=0.4358, attn_decoder_loss=0.3031, over 29686.00 frames. ], tot_loss[loss=0.2992, ctc_loss=0.2295, cr_loss=0.4377, attn_decoder_loss=0.2973, over 5189200.01 frames. ], batch size: 83, lr: 2.24e-02, grad_scale: 4.0 2024-09-16 20:37:58,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=74200.0, ans=0.125 2024-09-16 20:38:27,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=74280.0, ans=0.0 2024-09-16 20:38:32,826 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.21 vs. 
limit=15.0 2024-09-16 20:38:47,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=74320.0, ans=0.0 2024-09-16 20:38:52,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=74320.0, ans=0.025 2024-09-16 20:38:56,440 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.507e+01 1.148e+02 1.317e+02 1.480e+02 2.097e+02, threshold=2.634e+02, percent-clipped=0.0 2024-09-16 20:38:58,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=74360.0, ans=0.0 2024-09-16 20:38:59,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=74360.0, ans=0.125 2024-09-16 20:39:11,820 INFO [train.py:1198] (1/2) Epoch 5, batch 500, loss[loss=0.3213, ctc_loss=0.2413, cr_loss=0.479, attn_decoder_loss=0.3196, over 29442.00 frames. ], tot_loss[loss=0.298, ctc_loss=0.2283, cr_loss=0.4371, attn_decoder_loss=0.296, over 5331754.84 frames. ], batch size: 94, lr: 2.23e-02, grad_scale: 8.0 2024-09-16 20:39:21,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=74400.0, ans=0.0 2024-09-16 20:39:39,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=74440.0, ans=0.125 2024-09-16 20:39:41,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=74440.0, ans=0.0 2024-09-16 20:39:44,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=74480.0, ans=0.125 2024-09-16 20:39:54,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=74480.0, ans=0.125 2024-09-16 20:40:00,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=74520.0, ans=0.125 2024-09-16 20:40:00,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=74520.0, ans=0.0 2024-09-16 20:40:29,198 INFO [train.py:1198] (1/2) Epoch 5, batch 550, loss[loss=0.3057, ctc_loss=0.2367, cr_loss=0.4495, attn_decoder_loss=0.3034, over 28849.00 frames. ], tot_loss[loss=0.2981, ctc_loss=0.2289, cr_loss=0.437, attn_decoder_loss=0.2961, over 5423945.32 frames. ], batch size: 104, lr: 2.23e-02, grad_scale: 2.0 2024-09-16 20:40:39,089 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.04 vs. 
limit=15.0 2024-09-16 20:40:46,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=74640.0, ans=0.125 2024-09-16 20:40:49,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=74640.0, ans=0.125 2024-09-16 20:41:32,712 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.970e+01 1.190e+02 1.363e+02 1.590e+02 5.102e+02, threshold=2.726e+02, percent-clipped=4.0 2024-09-16 20:41:36,189 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:41:44,817 INFO [train.py:1198] (1/2) Epoch 5, batch 600, loss[loss=0.3156, ctc_loss=0.2471, cr_loss=0.4535, attn_decoder_loss=0.3131, over 29238.00 frames. ], tot_loss[loss=0.2981, ctc_loss=0.2284, cr_loss=0.4369, attn_decoder_loss=0.2961, over 5510764.29 frames. ], batch size: 100, lr: 2.23e-02, grad_scale: 4.0 2024-09-16 20:41:50,928 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.87 vs. limit=15.0 2024-09-16 20:41:51,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=74800.0, ans=0.0 2024-09-16 20:42:06,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=74840.0, ans=0.05 2024-09-16 20:42:11,362 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:42:17,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.84 vs. limit=15.0 2024-09-16 20:42:23,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=74880.0, ans=0.125 2024-09-16 20:42:51,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=74960.0, ans=0.2 2024-09-16 20:42:57,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=74960.0, ans=0.125 2024-09-16 20:43:00,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=75000.0, ans=0.125 2024-09-16 20:43:02,179 INFO [train.py:1198] (1/2) Epoch 5, batch 650, loss[loss=0.2998, ctc_loss=0.2315, cr_loss=0.4467, attn_decoder_loss=0.2975, over 29762.00 frames. ], tot_loss[loss=0.2968, ctc_loss=0.2268, cr_loss=0.4347, attn_decoder_loss=0.2949, over 5588035.79 frames. 
], batch size: 81, lr: 2.23e-02, grad_scale: 4.0 2024-09-16 20:43:02,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=75000.0, ans=0.125 2024-09-16 20:43:06,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=75000.0, ans=0.025 2024-09-16 20:43:19,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=75040.0, ans=0.125 2024-09-16 20:43:20,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=75040.0, ans=0.125 2024-09-16 20:43:31,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=75040.0, ans=0.0 2024-09-16 20:43:33,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=75080.0, ans=0.07 2024-09-16 20:43:33,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=75080.0, ans=0.0 2024-09-16 20:43:45,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=75080.0, ans=0.025 2024-09-16 20:43:50,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=75120.0, ans=10.0 2024-09-16 20:44:07,668 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.399e+01 1.145e+02 1.260e+02 1.468e+02 2.396e+02, threshold=2.520e+02, percent-clipped=0.0 2024-09-16 20:44:13,135 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.96 vs. limit=15.0 2024-09-16 20:44:20,245 INFO [train.py:1198] (1/2) Epoch 5, batch 700, loss[loss=0.2795, ctc_loss=0.2083, cr_loss=0.4069, attn_decoder_loss=0.2784, over 29537.00 frames. ], tot_loss[loss=0.2973, ctc_loss=0.2271, cr_loss=0.4351, attn_decoder_loss=0.2954, over 5637889.38 frames. ], batch size: 76, lr: 2.22e-02, grad_scale: 8.0 2024-09-16 20:44:40,901 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.73 vs. limit=22.5 2024-09-16 20:44:44,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=75240.0, ans=0.1 2024-09-16 20:44:49,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=75280.0, ans=0.0 2024-09-16 20:44:54,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=75280.0, ans=0.125 2024-09-16 20:45:01,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=75280.0, ans=0.2 2024-09-16 20:45:12,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=75320.0, ans=0.1 2024-09-16 20:45:12,422 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.56 vs. 
limit=22.5 2024-09-16 20:45:21,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=75360.0, ans=0.025 2024-09-16 20:45:26,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=75360.0, ans=0.125 2024-09-16 20:45:35,701 INFO [train.py:1198] (1/2) Epoch 5, batch 750, loss[loss=0.3009, ctc_loss=0.2267, cr_loss=0.423, attn_decoder_loss=0.2998, over 29711.00 frames. ], tot_loss[loss=0.297, ctc_loss=0.2269, cr_loss=0.4352, attn_decoder_loss=0.2952, over 5676273.37 frames. ], batch size: 82, lr: 2.22e-02, grad_scale: 4.0 2024-09-16 20:45:43,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=75400.0, ans=0.015 2024-09-16 20:45:55,192 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=6.08 vs. limit=12.0 2024-09-16 20:46:00,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=75440.0, ans=0.1 2024-09-16 20:46:06,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=75480.0, ans=0.2 2024-09-16 20:46:21,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=75520.0, ans=0.1 2024-09-16 20:46:26,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=75520.0, ans=0.0 2024-09-16 20:46:27,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=75520.0, ans=0.125 2024-09-16 20:46:42,563 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.394e+01 1.181e+02 1.291e+02 1.489e+02 2.242e+02, threshold=2.582e+02, percent-clipped=0.0 2024-09-16 20:46:42,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=75560.0, ans=0.0 2024-09-16 20:46:44,608 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:46:53,299 INFO [train.py:1198] (1/2) Epoch 5, batch 800, loss[loss=0.2793, ctc_loss=0.2089, cr_loss=0.4036, attn_decoder_loss=0.2781, over 29577.00 frames. ], tot_loss[loss=0.2967, ctc_loss=0.2266, cr_loss=0.4353, attn_decoder_loss=0.2948, over 5707609.49 frames. ], batch size: 73, lr: 2.22e-02, grad_scale: 8.0 2024-09-16 20:47:14,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=75640.0, ans=0.125 2024-09-16 20:47:45,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=75720.0, ans=0.09899494936611666 2024-09-16 20:48:10,225 INFO [train.py:1198] (1/2) Epoch 5, batch 850, loss[loss=0.3086, ctc_loss=0.2305, cr_loss=0.4548, attn_decoder_loss=0.3072, over 29716.00 frames. ], tot_loss[loss=0.2963, ctc_loss=0.2259, cr_loss=0.435, attn_decoder_loss=0.2944, over 5736531.19 frames. ], batch size: 89, lr: 2.22e-02, grad_scale: 4.0 2024-09-16 20:48:18,019 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:48:30,409 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=6.52 vs. 
limit=12.0 2024-09-16 20:48:32,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=75840.0, ans=0.04949747468305833 2024-09-16 20:48:35,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=75840.0, ans=0.0 2024-09-16 20:48:54,679 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.22 vs. limit=15.0 2024-09-16 20:49:07,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=75920.0, ans=0.2 2024-09-16 20:49:12,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=75960.0, ans=0.025 2024-09-16 20:49:16,706 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.020e+02 1.208e+02 1.339e+02 1.559e+02 5.118e+02, threshold=2.679e+02, percent-clipped=4.0 2024-09-16 20:49:17,402 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. limit=6.0 2024-09-16 20:49:24,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=76000.0, ans=0.125 2024-09-16 20:49:25,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.10 vs. limit=10.0 2024-09-16 20:49:26,159 INFO [train.py:1198] (1/2) Epoch 5, batch 900, loss[loss=0.2688, ctc_loss=0.1969, cr_loss=0.4223, attn_decoder_loss=0.2674, over 29601.00 frames. ], tot_loss[loss=0.2968, ctc_loss=0.2266, cr_loss=0.4353, attn_decoder_loss=0.2949, over 5741510.24 frames. ], batch size: 73, lr: 2.21e-02, grad_scale: 8.0 2024-09-16 20:49:52,619 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:50:04,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=76080.0, ans=0.0 2024-09-16 20:50:26,250 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.54 vs. limit=15.0 2024-09-16 20:50:31,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=76160.0, ans=0.125 2024-09-16 20:50:33,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=76160.0, ans=0.1 2024-09-16 20:50:40,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=76160.0, ans=0.0 2024-09-16 20:50:43,185 INFO [train.py:1198] (1/2) Epoch 5, batch 950, loss[loss=0.2856, ctc_loss=0.2143, cr_loss=0.4221, attn_decoder_loss=0.2842, over 29497.00 frames. ], tot_loss[loss=0.2971, ctc_loss=0.227, cr_loss=0.4358, attn_decoder_loss=0.2952, over 5743342.75 frames. ], batch size: 74, lr: 2.21e-02, grad_scale: 4.0 2024-09-16 20:50:47,198 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.93 vs. 
limit=12.0 2024-09-16 20:51:12,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=76240.0, ans=0.125 2024-09-16 20:51:49,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=76360.0, ans=12.0 2024-09-16 20:51:52,953 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.976e+01 1.191e+02 1.361e+02 1.638e+02 5.772e+02, threshold=2.722e+02, percent-clipped=5.0 2024-09-16 20:51:54,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=76360.0, ans=0.2 2024-09-16 20:51:56,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=76360.0, ans=0.0 2024-09-16 20:51:57,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=76360.0, ans=0.125 2024-09-16 20:51:59,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=76400.0, ans=0.125 2024-09-16 20:52:00,419 INFO [train.py:1198] (1/2) Epoch 5, batch 1000, loss[loss=0.2903, ctc_loss=0.2187, cr_loss=0.4357, attn_decoder_loss=0.2885, over 29499.00 frames. ], tot_loss[loss=0.2981, ctc_loss=0.228, cr_loss=0.4367, attn_decoder_loss=0.2962, over 5738279.96 frames. ], batch size: 77, lr: 2.21e-02, grad_scale: 8.0 2024-09-16 20:52:01,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.19 vs. limit=15.0 2024-09-16 20:52:58,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=76520.0, ans=0.125 2024-09-16 20:53:15,781 INFO [train.py:1198] (1/2) Epoch 5, batch 1050, loss[loss=0.3053, ctc_loss=0.2297, cr_loss=0.4418, attn_decoder_loss=0.3039, over 29658.00 frames. ], tot_loss[loss=0.2968, ctc_loss=0.2266, cr_loss=0.435, attn_decoder_loss=0.295, over 5745881.51 frames. ], batch size: 85, lr: 2.21e-02, grad_scale: 4.0 2024-09-16 20:53:26,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=76600.0, ans=0.0 2024-09-16 20:53:34,496 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0 2024-09-16 20:53:42,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=76640.0, ans=0.09899494936611666 2024-09-16 20:53:57,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=76680.0, ans=0.125 2024-09-16 20:54:21,794 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.61 vs. 
limit=15.0 2024-09-16 20:54:27,137 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.620e+01 1.158e+02 1.276e+02 1.580e+02 2.597e+02, threshold=2.552e+02, percent-clipped=0.0 2024-09-16 20:54:27,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=76760.0, ans=0.0 2024-09-16 20:54:33,690 INFO [train.py:1198] (1/2) Epoch 5, batch 1100, loss[loss=0.288, ctc_loss=0.2203, cr_loss=0.4151, attn_decoder_loss=0.2863, over 29473.00 frames. ], tot_loss[loss=0.2972, ctc_loss=0.227, cr_loss=0.4356, attn_decoder_loss=0.2953, over 5758659.38 frames. ], batch size: 78, lr: 2.20e-02, grad_scale: 8.0 2024-09-16 20:54:39,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=76800.0, ans=0.2 2024-09-16 20:54:57,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=76840.0, ans=0.025 2024-09-16 20:55:00,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=76840.0, ans=0.2 2024-09-16 20:55:06,939 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=24.65 vs. limit=22.5 2024-09-16 20:55:11,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=76880.0, ans=0.0 2024-09-16 20:55:27,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=76920.0, ans=0.125 2024-09-16 20:55:30,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=76920.0, ans=0.1 2024-09-16 20:55:41,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=76960.0, ans=0.2 2024-09-16 20:55:50,862 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=22.5 2024-09-16 20:55:51,249 INFO [train.py:1198] (1/2) Epoch 5, batch 1150, loss[loss=0.2917, ctc_loss=0.2241, cr_loss=0.4525, attn_decoder_loss=0.2892, over 29464.00 frames. ], tot_loss[loss=0.2968, ctc_loss=0.2265, cr_loss=0.435, attn_decoder_loss=0.2949, over 5756476.11 frames. ], batch size: 78, lr: 2.20e-02, grad_scale: 4.0 2024-09-16 20:56:12,698 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:56:21,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=77080.0, ans=0.125 2024-09-16 20:56:22,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=77080.0, ans=0.0 2024-09-16 20:56:29,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=77080.0, ans=0.1 2024-09-16 20:56:45,172 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.76 vs. 
limit=22.5 2024-09-16 20:56:50,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=77160.0, ans=0.07 2024-09-16 20:57:02,552 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.643e+01 1.177e+02 1.305e+02 1.494e+02 2.713e+02, threshold=2.610e+02, percent-clipped=1.0 2024-09-16 20:57:06,990 INFO [train.py:1198] (1/2) Epoch 5, batch 1200, loss[loss=0.2957, ctc_loss=0.2223, cr_loss=0.4383, attn_decoder_loss=0.2941, over 29651.00 frames. ], tot_loss[loss=0.2983, ctc_loss=0.2281, cr_loss=0.4364, attn_decoder_loss=0.2964, over 5749349.11 frames. ], batch size: 85, lr: 2.20e-02, grad_scale: 8.0 2024-09-16 20:57:13,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=77200.0, ans=0.025 2024-09-16 20:58:24,346 INFO [train.py:1198] (1/2) Epoch 5, batch 1250, loss[loss=0.3047, ctc_loss=0.2264, cr_loss=0.4517, attn_decoder_loss=0.3033, over 29531.00 frames. ], tot_loss[loss=0.2984, ctc_loss=0.2276, cr_loss=0.4372, attn_decoder_loss=0.2966, over 5776536.46 frames. ], batch size: 92, lr: 2.20e-02, grad_scale: 4.0 2024-09-16 20:58:24,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=77400.0, ans=0.07 2024-09-16 20:58:29,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=77400.0, ans=0.2 2024-09-16 20:58:30,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=77400.0, ans=0.0 2024-09-16 20:58:43,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=77440.0, ans=0.125 2024-09-16 20:58:52,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=77440.0, ans=0.5 2024-09-16 20:58:59,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=77480.0, ans=0.1 2024-09-16 20:59:02,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=77480.0, ans=0.125 2024-09-16 20:59:10,944 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.49 vs. limit=15.0 2024-09-16 20:59:31,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=77560.0, ans=0.1 2024-09-16 20:59:34,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=77560.0, ans=0.0 2024-09-16 20:59:38,724 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.029e+02 1.175e+02 1.290e+02 1.503e+02 2.372e+02, threshold=2.579e+02, percent-clipped=0.0 2024-09-16 20:59:42,115 INFO [train.py:1198] (1/2) Epoch 5, batch 1300, loss[loss=0.3097, ctc_loss=0.24, cr_loss=0.4231, attn_decoder_loss=0.308, over 28260.00 frames. ], tot_loss[loss=0.2972, ctc_loss=0.2262, cr_loss=0.4354, attn_decoder_loss=0.2954, over 5781267.43 frames. 
], batch size: 111, lr: 2.19e-02, grad_scale: 8.0 2024-09-16 20:59:45,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=77600.0, ans=0.2 2024-09-16 20:59:47,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=77600.0, ans=0.0 2024-09-16 20:59:52,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.34 vs. limit=15.0 2024-09-16 21:00:07,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=77640.0, ans=0.125 2024-09-16 21:00:27,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=77720.0, ans=0.125 2024-09-16 21:00:27,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=77720.0, ans=0.0 2024-09-16 21:00:36,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=77720.0, ans=0.2 2024-09-16 21:00:36,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=77720.0, ans=0.125 2024-09-16 21:00:39,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=77720.0, ans=0.1 2024-09-16 21:00:53,332 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.24 vs. limit=15.0 2024-09-16 21:00:57,173 INFO [train.py:1198] (1/2) Epoch 5, batch 1350, loss[loss=0.3049, ctc_loss=0.2266, cr_loss=0.463, attn_decoder_loss=0.3033, over 29764.00 frames. ], tot_loss[loss=0.2962, ctc_loss=0.2247, cr_loss=0.4343, attn_decoder_loss=0.2945, over 5796458.46 frames. ], batch size: 81, lr: 2.19e-02, grad_scale: 4.0 2024-09-16 21:01:07,996 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:01:18,028 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.67 vs. limit=6.0 2024-09-16 21:01:24,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=77840.0, ans=0.0 2024-09-16 21:01:33,760 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.27 vs. limit=15.0 2024-09-16 21:01:50,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=77920.0, ans=0.0 2024-09-16 21:02:10,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=77960.0, ans=0.0 2024-09-16 21:02:12,849 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.348e+01 1.157e+02 1.281e+02 1.429e+02 2.166e+02, threshold=2.563e+02, percent-clipped=0.0 2024-09-16 21:02:14,435 INFO [train.py:1198] (1/2) Epoch 5, batch 1400, loss[loss=0.2703, ctc_loss=0.2072, cr_loss=0.424, attn_decoder_loss=0.2678, over 29593.00 frames. ], tot_loss[loss=0.2956, ctc_loss=0.2242, cr_loss=0.4337, attn_decoder_loss=0.2939, over 5807438.21 frames. 
], batch size: 69, lr: 2.19e-02, grad_scale: 8.0 2024-09-16 21:02:14,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=78000.0, ans=0.2 2024-09-16 21:02:34,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=78040.0, ans=0.09899494936611666 2024-09-16 21:02:58,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=78080.0, ans=0.2 2024-09-16 21:03:01,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=78120.0, ans=0.2 2024-09-16 21:03:03,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=78120.0, ans=0.125 2024-09-16 21:03:09,348 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:03:10,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=78120.0, ans=0.125 2024-09-16 21:03:26,136 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:03:31,877 INFO [train.py:1198] (1/2) Epoch 5, batch 1450, loss[loss=0.3091, ctc_loss=0.2328, cr_loss=0.47, attn_decoder_loss=0.3071, over 29442.00 frames. ], tot_loss[loss=0.2964, ctc_loss=0.2249, cr_loss=0.4358, attn_decoder_loss=0.2947, over 5804563.19 frames. ], batch size: 94, lr: 2.19e-02, grad_scale: 4.0 2024-09-16 21:03:34,247 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=16.43 vs. limit=15.0 2024-09-16 21:03:41,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=78200.0, ans=0.0 2024-09-16 21:03:42,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=78200.0, ans=0.09899494936611666 2024-09-16 21:03:51,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=78240.0, ans=0.125 2024-09-16 21:04:21,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=78320.0, ans=0.125 2024-09-16 21:04:47,756 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.240e+02 1.382e+02 1.635e+02 6.361e+02, threshold=2.763e+02, percent-clipped=6.0 2024-09-16 21:04:47,777 INFO [train.py:1198] (1/2) Epoch 5, batch 1500, loss[loss=0.3056, ctc_loss=0.2337, cr_loss=0.4392, attn_decoder_loss=0.3039, over 29633.00 frames. ], tot_loss[loss=0.297, ctc_loss=0.2253, cr_loss=0.4368, attn_decoder_loss=0.2952, over 5805142.05 frames. 
], batch size: 86, lr: 2.18e-02, grad_scale: 8.0 2024-09-16 21:04:48,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=78400.0, ans=0.125 2024-09-16 21:04:51,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=78400.0, ans=0.125 2024-09-16 21:05:35,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=78520.0, ans=0.125 2024-09-16 21:05:35,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=78520.0, ans=0.125 2024-09-16 21:05:36,453 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.00 vs. limit=15.0 2024-09-16 21:05:39,581 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.78 vs. limit=15.0 2024-09-16 21:05:41,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=78520.0, ans=0.125 2024-09-16 21:06:05,488 INFO [train.py:1198] (1/2) Epoch 5, batch 1550, loss[loss=0.317, ctc_loss=0.2461, cr_loss=0.4816, attn_decoder_loss=0.3142, over 29498.00 frames. ], tot_loss[loss=0.2967, ctc_loss=0.2252, cr_loss=0.4367, attn_decoder_loss=0.295, over 5780551.28 frames. ], batch size: 90, lr: 2.18e-02, grad_scale: 4.0 2024-09-16 21:06:11,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=78600.0, ans=0.0 2024-09-16 21:06:37,463 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:06:52,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=78720.0, ans=0.125 2024-09-16 21:07:00,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=78720.0, ans=0.2 2024-09-16 21:07:08,402 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.54 vs. limit=22.5 2024-09-16 21:07:16,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=78760.0, ans=0.025 2024-09-16 21:07:18,883 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=22.54 vs. limit=22.5 2024-09-16 21:07:22,409 INFO [train.py:1198] (1/2) Epoch 5, batch 1600, loss[loss=0.3047, ctc_loss=0.2264, cr_loss=0.4508, attn_decoder_loss=0.3034, over 29679.00 frames. ], tot_loss[loss=0.2967, ctc_loss=0.2254, cr_loss=0.4359, attn_decoder_loss=0.2949, over 5763835.71 frames. 
], batch size: 85, lr: 2.18e-02, grad_scale: 8.0 2024-09-16 21:07:23,867 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.848e+01 1.266e+02 1.474e+02 1.762e+02 4.006e+02, threshold=2.948e+02, percent-clipped=2.0 2024-09-16 21:07:30,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=78800.0, ans=0.0 2024-09-16 21:07:34,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=78800.0, ans=0.125 2024-09-16 21:07:38,147 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.66 vs. limit=15.0 2024-09-16 21:07:46,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=78840.0, ans=0.025 2024-09-16 21:07:47,271 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.00 vs. limit=22.5 2024-09-16 21:07:51,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=78880.0, ans=0.125 2024-09-16 21:07:51,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=78880.0, ans=0.0 2024-09-16 21:08:21,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=78960.0, ans=0.0 2024-09-16 21:08:37,889 INFO [train.py:1198] (1/2) Epoch 5, batch 1650, loss[loss=0.3272, ctc_loss=0.2533, cr_loss=0.4703, attn_decoder_loss=0.3249, over 29728.00 frames. ], tot_loss[loss=0.2966, ctc_loss=0.2258, cr_loss=0.4357, attn_decoder_loss=0.2948, over 5758359.28 frames. ], batch size: 89, lr: 2.18e-02, grad_scale: 4.0 2024-09-16 21:08:38,544 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0 2024-09-16 21:08:51,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=79040.0, ans=0.125 2024-09-16 21:08:59,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=79040.0, ans=0.035 2024-09-16 21:09:03,064 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.88 vs. limit=15.0 2024-09-16 21:09:21,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=79120.0, ans=0.1 2024-09-16 21:09:36,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=79120.0, ans=0.125 2024-09-16 21:09:55,746 INFO [train.py:1198] (1/2) Epoch 5, batch 1700, loss[loss=0.2633, ctc_loss=0.1992, cr_loss=0.3834, attn_decoder_loss=0.2619, over 29597.00 frames. ], tot_loss[loss=0.2961, ctc_loss=0.2247, cr_loss=0.4351, attn_decoder_loss=0.2943, over 5780138.55 frames. 
], batch size: 69, lr: 2.17e-02, grad_scale: 8.0 2024-09-16 21:10:00,286 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.635e+01 1.159e+02 1.263e+02 1.450e+02 2.662e+02, threshold=2.527e+02, percent-clipped=0.0 2024-09-16 21:10:11,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=79240.0, ans=0.1 2024-09-16 21:10:14,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=79240.0, ans=0.035 2024-09-16 21:10:14,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=79240.0, ans=0.125 2024-09-16 21:10:25,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=79240.0, ans=0.125 2024-09-16 21:10:44,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=79320.0, ans=0.0 2024-09-16 21:11:07,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=79360.0, ans=0.125 2024-09-16 21:11:12,986 INFO [train.py:1198] (1/2) Epoch 5, batch 1750, loss[loss=0.2665, ctc_loss=0.1991, cr_loss=0.398, attn_decoder_loss=0.2651, over 29354.00 frames. ], tot_loss[loss=0.2957, ctc_loss=0.2242, cr_loss=0.4349, attn_decoder_loss=0.294, over 5788561.03 frames. ], batch size: 67, lr: 2.17e-02, grad_scale: 4.0 2024-09-16 21:11:23,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=79400.0, ans=0.1 2024-09-16 21:11:42,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=79480.0, ans=0.1 2024-09-16 21:11:42,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=79480.0, ans=0.125 2024-09-16 21:11:48,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=79480.0, ans=0.0 2024-09-16 21:11:51,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=79480.0, ans=0.125 2024-09-16 21:11:51,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=79480.0, ans=0.125 2024-09-16 21:12:02,354 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.77 vs. limit=15.0 2024-09-16 21:12:05,361 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.88 vs. limit=22.5 2024-09-16 21:12:13,030 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.13 vs. limit=10.0 2024-09-16 21:12:16,108 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.01 vs. limit=15.0 2024-09-16 21:12:28,476 INFO [train.py:1198] (1/2) Epoch 5, batch 1800, loss[loss=0.2979, ctc_loss=0.22, cr_loss=0.4189, attn_decoder_loss=0.2973, over 29690.00 frames. ], tot_loss[loss=0.2957, ctc_loss=0.224, cr_loss=0.4348, attn_decoder_loss=0.294, over 5790852.43 frames. 
], batch size: 83, lr: 2.17e-02, grad_scale: 8.0 2024-09-16 21:12:34,617 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.637e+01 1.099e+02 1.224e+02 1.443e+02 2.616e+02, threshold=2.449e+02, percent-clipped=2.0 2024-09-16 21:12:53,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.99 vs. limit=15.0 2024-09-16 21:13:06,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=79680.0, ans=0.125 2024-09-16 21:13:26,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=79720.0, ans=0.0 2024-09-16 21:13:29,991 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=18.04 vs. limit=15.0 2024-09-16 21:13:43,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=79760.0, ans=10.0 2024-09-16 21:13:46,150 INFO [train.py:1198] (1/2) Epoch 5, batch 1850, loss[loss=0.2992, ctc_loss=0.2216, cr_loss=0.4125, attn_decoder_loss=0.2987, over 29631.00 frames. ], tot_loss[loss=0.2958, ctc_loss=0.2242, cr_loss=0.4355, attn_decoder_loss=0.2941, over 5794109.61 frames. ], batch size: 86, lr: 2.17e-02, grad_scale: 4.0 2024-09-16 21:14:42,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=79920.0, ans=0.2 2024-09-16 21:14:44,102 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.28 vs. limit=10.0 2024-09-16 21:14:54,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=79960.0, ans=0.0 2024-09-16 21:14:59,120 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.12 vs. limit=12.0 2024-09-16 21:15:10,169 INFO [train.py:1198] (1/2) Epoch 5, batch 1900, loss[loss=0.3013, ctc_loss=0.2228, cr_loss=0.4309, attn_decoder_loss=0.3005, over 29732.00 frames. ], tot_loss[loss=0.2958, ctc_loss=0.2238, cr_loss=0.4351, attn_decoder_loss=0.2941, over 5803764.62 frames. ], batch size: 89, lr: 2.16e-02, grad_scale: 8.0 2024-09-16 21:15:17,676 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.933e+01 1.145e+02 1.241e+02 1.387e+02 2.102e+02, threshold=2.481e+02, percent-clipped=0.0 2024-09-16 21:15:17,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=80000.0, ans=0.125 2024-09-16 21:15:27,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer_ff2.min_abs, batch_count=80040.0, ans=0.1 2024-09-16 21:15:39,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=80080.0, ans=0.125 2024-09-16 21:15:58,360 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=14.85 vs. 
limit=15.0 2024-09-16 21:16:03,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=80120.0, ans=0.0 2024-09-16 21:16:26,173 INFO [train.py:1198] (1/2) Epoch 5, batch 1950, loss[loss=0.2956, ctc_loss=0.2204, cr_loss=0.4205, attn_decoder_loss=0.2946, over 29453.00 frames. ], tot_loss[loss=0.2974, ctc_loss=0.2252, cr_loss=0.4375, attn_decoder_loss=0.2957, over 5818778.34 frames. ], batch size: 78, lr: 2.16e-02, grad_scale: 4.0 2024-09-16 21:16:31,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=80200.0, ans=0.125 2024-09-16 21:16:40,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=80240.0, ans=0.125 2024-09-16 21:16:56,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=80280.0, ans=0.1 2024-09-16 21:17:04,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=80280.0, ans=0.125 2024-09-16 21:17:16,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=80320.0, ans=0.125 2024-09-16 21:17:27,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=80360.0, ans=0.025 2024-09-16 21:17:28,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=80360.0, ans=0.0 2024-09-16 21:17:38,482 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.39 vs. limit=6.0 2024-09-16 21:17:40,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=80360.0, ans=0.0 2024-09-16 21:17:42,968 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.83 vs. limit=22.5 2024-09-16 21:17:43,615 INFO [train.py:1198] (1/2) Epoch 5, batch 2000, loss[loss=0.2621, ctc_loss=0.1984, cr_loss=0.4004, attn_decoder_loss=0.2603, over 29352.00 frames. ], tot_loss[loss=0.2981, ctc_loss=0.2263, cr_loss=0.438, attn_decoder_loss=0.2964, over 5797024.42 frames. 
], batch size: 67, lr: 2.16e-02, grad_scale: 8.0 2024-09-16 21:17:52,704 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.796e+01 1.236e+02 1.402e+02 1.608e+02 2.421e+02, threshold=2.804e+02, percent-clipped=0.0 2024-09-16 21:17:53,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=80400.0, ans=0.0 2024-09-16 21:18:29,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=80520.0, ans=0.0 2024-09-16 21:18:35,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=80520.0, ans=0.0 2024-09-16 21:18:40,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=80520.0, ans=0.1 2024-09-16 21:18:55,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=80560.0, ans=0.125 2024-09-16 21:18:58,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=80560.0, ans=0.125 2024-09-16 21:19:01,084 INFO [train.py:1198] (1/2) Epoch 5, batch 2050, loss[loss=0.2629, ctc_loss=0.1894, cr_loss=0.3951, attn_decoder_loss=0.2623, over 29406.00 frames. ], tot_loss[loss=0.2967, ctc_loss=0.225, cr_loss=0.4363, attn_decoder_loss=0.295, over 5789017.75 frames. ], batch size: 70, lr: 2.16e-02, grad_scale: 4.0 2024-09-16 21:19:04,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=80600.0, ans=0.0 2024-09-16 21:19:30,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=80680.0, ans=0.125 2024-09-16 21:19:43,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=80680.0, ans=0.0 2024-09-16 21:19:55,079 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.17 vs. limit=22.5 2024-09-16 21:19:59,433 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.42 vs. limit=15.0 2024-09-16 21:20:00,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=80760.0, ans=0.2 2024-09-16 21:20:07,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=80760.0, ans=0.0 2024-09-16 21:20:13,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=80760.0, ans=0.125 2024-09-16 21:20:15,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=80800.0, ans=0.125 2024-09-16 21:20:17,202 INFO [train.py:1198] (1/2) Epoch 5, batch 2100, loss[loss=0.2949, ctc_loss=0.2235, cr_loss=0.4564, attn_decoder_loss=0.2927, over 29758.00 frames. ], tot_loss[loss=0.2955, ctc_loss=0.2236, cr_loss=0.435, attn_decoder_loss=0.2938, over 5801450.23 frames. 
], batch size: 81, lr: 2.15e-02, grad_scale: 8.0 2024-09-16 21:20:22,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=80800.0, ans=0.1 2024-09-16 21:20:23,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=80800.0, ans=0.0 2024-09-16 21:20:27,408 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.761e+01 1.220e+02 1.373e+02 1.548e+02 8.609e+02, threshold=2.746e+02, percent-clipped=3.0 2024-09-16 21:20:55,688 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.12 vs. limit=12.0 2024-09-16 21:21:14,925 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.99 vs. limit=22.5 2024-09-16 21:21:31,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=80960.0, ans=0.125 2024-09-16 21:21:34,120 INFO [train.py:1198] (1/2) Epoch 5, batch 2150, loss[loss=0.2808, ctc_loss=0.2137, cr_loss=0.4592, attn_decoder_loss=0.278, over 29432.00 frames. ], tot_loss[loss=0.2943, ctc_loss=0.2219, cr_loss=0.4343, attn_decoder_loss=0.2927, over 5816314.16 frames. ], batch size: 78, lr: 2.15e-02, grad_scale: 4.0 2024-09-16 21:21:38,876 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:21:38,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=81000.0, ans=0.2 2024-09-16 21:21:54,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=81040.0, ans=0.2 2024-09-16 21:22:11,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=81080.0, ans=0.125 2024-09-16 21:22:18,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=81080.0, ans=0.2 2024-09-16 21:22:51,824 INFO [train.py:1198] (1/2) Epoch 5, batch 2200, loss[loss=0.2968, ctc_loss=0.2183, cr_loss=0.419, attn_decoder_loss=0.2962, over 29624.00 frames. ], tot_loss[loss=0.2948, ctc_loss=0.2227, cr_loss=0.4354, attn_decoder_loss=0.2931, over 5812410.71 frames. ], batch size: 86, lr: 2.15e-02, grad_scale: 8.0 2024-09-16 21:23:02,276 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.674e+01 1.183e+02 1.300e+02 1.517e+02 2.352e+02, threshold=2.600e+02, percent-clipped=0.0 2024-09-16 21:23:08,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=81240.0, ans=0.0 2024-09-16 21:23:09,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=81240.0, ans=0.125 2024-09-16 21:23:17,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=81240.0, ans=0.0 2024-09-16 21:23:21,093 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.99 vs. 
limit=15.0 2024-09-16 21:23:43,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=81320.0, ans=0.1 2024-09-16 21:23:50,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=81360.0, ans=0.04949747468305833 2024-09-16 21:24:07,113 INFO [train.py:1198] (1/2) Epoch 5, batch 2250, loss[loss=0.2997, ctc_loss=0.2249, cr_loss=0.4246, attn_decoder_loss=0.2986, over 29712.00 frames. ], tot_loss[loss=0.2946, ctc_loss=0.2225, cr_loss=0.4346, attn_decoder_loss=0.2929, over 5811740.83 frames. ], batch size: 82, lr: 2.15e-02, grad_scale: 4.0 2024-09-16 21:24:11,141 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=15.0 2024-09-16 21:24:15,568 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.50 vs. limit=22.5 2024-09-16 21:24:31,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=81440.0, ans=0.1 2024-09-16 21:24:32,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=81440.0, ans=0.2 2024-09-16 21:25:02,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=81520.0, ans=0.125 2024-09-16 21:25:18,738 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.30 vs. limit=15.0 2024-09-16 21:25:19,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=81560.0, ans=0.125 2024-09-16 21:25:23,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=81600.0, ans=0.125 2024-09-16 21:25:25,069 INFO [train.py:1198] (1/2) Epoch 5, batch 2300, loss[loss=0.2634, ctc_loss=0.189, cr_loss=0.3993, attn_decoder_loss=0.2628, over 29335.00 frames. ], tot_loss[loss=0.294, ctc_loss=0.2224, cr_loss=0.4339, attn_decoder_loss=0.2923, over 5799226.88 frames. ], batch size: 71, lr: 2.15e-02, grad_scale: 8.0 2024-09-16 21:25:38,307 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.849e+01 1.191e+02 1.337e+02 1.602e+02 2.823e+02, threshold=2.675e+02, percent-clipped=3.0 2024-09-16 21:25:46,302 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:26:42,309 INFO [train.py:1198] (1/2) Epoch 5, batch 2350, loss[loss=0.3109, ctc_loss=0.2432, cr_loss=0.4497, attn_decoder_loss=0.3084, over 29692.00 frames. ], tot_loss[loss=0.2944, ctc_loss=0.2228, cr_loss=0.4349, attn_decoder_loss=0.2927, over 5804218.41 frames. 
], batch size: 83, lr: 2.14e-02, grad_scale: 4.0 2024-09-16 21:26:55,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=81840.0, ans=0.0 2024-09-16 21:27:01,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=81840.0, ans=0.0 2024-09-16 21:27:06,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=81840.0, ans=0.025 2024-09-16 21:27:21,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=81880.0, ans=0.07 2024-09-16 21:27:44,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=81960.0, ans=0.125 2024-09-16 21:27:47,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=81960.0, ans=0.0 2024-09-16 21:27:52,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2024-09-16 21:27:58,015 INFO [train.py:1198] (1/2) Epoch 5, batch 2400, loss[loss=0.2844, ctc_loss=0.212, cr_loss=0.4188, attn_decoder_loss=0.2831, over 29539.00 frames. ], tot_loss[loss=0.2951, ctc_loss=0.2233, cr_loss=0.4352, attn_decoder_loss=0.2934, over 5807577.23 frames. ], batch size: 76, lr: 2.14e-02, grad_scale: 8.0 2024-09-16 21:28:13,104 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.326e+01 1.225e+02 1.360e+02 1.581e+02 2.424e+02, threshold=2.721e+02, percent-clipped=0.0 2024-09-16 21:28:13,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=82040.0, ans=0.035 2024-09-16 21:28:14,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=82040.0, ans=0.125 2024-09-16 21:28:33,733 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.29 vs. limit=15.0 2024-09-16 21:28:39,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=82080.0, ans=0.0 2024-09-16 21:29:07,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=82160.0, ans=15.0 2024-09-16 21:29:09,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=82160.0, ans=0.125 2024-09-16 21:29:11,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=82160.0, ans=0.0 2024-09-16 21:29:14,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=82200.0, ans=0.95 2024-09-16 21:29:15,781 INFO [train.py:1198] (1/2) Epoch 5, batch 2450, loss[loss=0.3016, ctc_loss=0.2262, cr_loss=0.4549, attn_decoder_loss=0.2998, over 29719.00 frames. ], tot_loss[loss=0.2962, ctc_loss=0.2242, cr_loss=0.4356, attn_decoder_loss=0.2945, over 5784884.97 frames. 
], batch size: 82, lr: 2.14e-02, grad_scale: 4.0 2024-09-16 21:29:17,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=82200.0, ans=0.0 2024-09-16 21:29:45,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=82240.0, ans=0.125 2024-09-16 21:29:49,099 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.15 vs. limit=15.0 2024-09-16 21:29:52,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=82280.0, ans=0.0 2024-09-16 21:30:33,938 INFO [train.py:1198] (1/2) Epoch 5, batch 2500, loss[loss=0.3037, ctc_loss=0.2261, cr_loss=0.4433, attn_decoder_loss=0.3025, over 29655.00 frames. ], tot_loss[loss=0.2958, ctc_loss=0.2238, cr_loss=0.436, attn_decoder_loss=0.2941, over 5795298.12 frames. ], batch size: 86, lr: 2.14e-02, grad_scale: 8.0 2024-09-16 21:30:50,578 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.347e+01 1.183e+02 1.324e+02 1.493e+02 3.213e+02, threshold=2.647e+02, percent-clipped=2.0 2024-09-16 21:30:59,238 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.04 vs. limit=22.5 2024-09-16 21:31:20,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=14.59 vs. limit=15.0 2024-09-16 21:31:49,664 INFO [train.py:1198] (1/2) Epoch 5, batch 2550, loss[loss=0.2695, ctc_loss=0.2029, cr_loss=0.3962, attn_decoder_loss=0.2681, over 29302.00 frames. ], tot_loss[loss=0.2957, ctc_loss=0.2235, cr_loss=0.436, attn_decoder_loss=0.294, over 5798909.23 frames. ], batch size: 67, lr: 2.13e-02, grad_scale: 4.0 2024-09-16 21:32:04,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=82640.0, ans=0.2 2024-09-16 21:32:33,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=82720.0, ans=0.1 2024-09-16 21:32:36,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=82720.0, ans=0.125 2024-09-16 21:32:44,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=82720.0, ans=0.0 2024-09-16 21:32:44,726 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0 2024-09-16 21:32:46,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=82720.0, ans=0.1 2024-09-16 21:33:04,970 INFO [train.py:1198] (1/2) Epoch 5, batch 2600, loss[loss=0.2849, ctc_loss=0.2042, cr_loss=0.4234, attn_decoder_loss=0.2845, over 29433.00 frames. ], tot_loss[loss=0.2961, ctc_loss=0.2238, cr_loss=0.4364, attn_decoder_loss=0.2944, over 5794567.13 frames. 
], batch size: 78, lr: 2.13e-02, grad_scale: 8.0 2024-09-16 21:33:22,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=82840.0, ans=0.2 2024-09-16 21:33:25,239 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.991e+01 1.177e+02 1.349e+02 1.549e+02 3.059e+02, threshold=2.698e+02, percent-clipped=1.0 2024-09-16 21:33:25,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=82840.0, ans=0.1 2024-09-16 21:33:31,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=82840.0, ans=0.1 2024-09-16 21:33:37,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=82880.0, ans=0.0 2024-09-16 21:33:47,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=82880.0, ans=0.125 2024-09-16 21:33:53,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=82920.0, ans=0.5 2024-09-16 21:34:03,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=82920.0, ans=0.1 2024-09-16 21:34:05,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=82920.0, ans=0.1 2024-09-16 21:34:14,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=82960.0, ans=0.1 2024-09-16 21:34:24,540 INFO [train.py:1198] (1/2) Epoch 5, batch 2650, loss[loss=0.314, ctc_loss=0.2459, cr_loss=0.4779, attn_decoder_loss=0.311, over 29250.00 frames. ], tot_loss[loss=0.2957, ctc_loss=0.2233, cr_loss=0.4363, attn_decoder_loss=0.2941, over 5801238.42 frames. ], batch size: 100, lr: 2.13e-02, grad_scale: 4.0 2024-09-16 21:34:24,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=83000.0, ans=0.125 2024-09-16 21:34:34,483 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.83 vs. limit=15.0 2024-09-16 21:35:10,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0 2024-09-16 21:35:33,359 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.28 vs. limit=22.5 2024-09-16 21:35:38,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=83200.0, ans=0.125 2024-09-16 21:35:40,217 INFO [train.py:1198] (1/2) Epoch 5, batch 2700, loss[loss=0.3109, ctc_loss=0.2317, cr_loss=0.4501, attn_decoder_loss=0.3098, over 29550.00 frames. ], tot_loss[loss=0.2964, ctc_loss=0.2238, cr_loss=0.4377, attn_decoder_loss=0.2947, over 5796712.50 frames. 
], batch size: 87, lr: 2.13e-02, grad_scale: 8.0 2024-09-16 21:35:40,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=83200.0, ans=0.0 2024-09-16 21:35:43,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=83200.0, ans=0.125 2024-09-16 21:35:55,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=83240.0, ans=0.2 2024-09-16 21:35:59,709 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.832e+01 1.218e+02 1.347e+02 1.527e+02 8.149e+02, threshold=2.695e+02, percent-clipped=3.0 2024-09-16 21:36:41,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=83360.0, ans=0.2 2024-09-16 21:36:45,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=83360.0, ans=0.0 2024-09-16 21:36:56,079 INFO [train.py:1198] (1/2) Epoch 5, batch 2750, loss[loss=0.2851, ctc_loss=0.2179, cr_loss=0.4061, attn_decoder_loss=0.2836, over 29544.00 frames. ], tot_loss[loss=0.2947, ctc_loss=0.2224, cr_loss=0.4353, attn_decoder_loss=0.2931, over 5795131.23 frames. ], batch size: 75, lr: 2.12e-02, grad_scale: 4.0 2024-09-16 21:37:24,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=83440.0, ans=0.035 2024-09-16 21:37:34,038 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.69 vs. limit=15.0 2024-09-16 21:37:48,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=83520.0, ans=0.125 2024-09-16 21:37:50,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=83520.0, ans=0.0 2024-09-16 21:37:53,545 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0 2024-09-16 21:37:57,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=83520.0, ans=0.125 2024-09-16 21:38:02,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=83560.0, ans=0.2 2024-09-16 21:38:03,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=83560.0, ans=0.0 2024-09-16 21:38:15,623 INFO [train.py:1198] (1/2) Epoch 5, batch 2800, loss[loss=0.3413, ctc_loss=0.3019, cr_loss=0.4591, attn_decoder_loss=0.3355, over 20413.00 frames. ], tot_loss[loss=0.2951, ctc_loss=0.2228, cr_loss=0.4354, attn_decoder_loss=0.2934, over 5776059.39 frames. ], batch size: 210, lr: 2.12e-02, grad_scale: 8.0 2024-09-16 21:38:16,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.34 vs. 
limit=15.0 2024-09-16 21:38:26,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=83600.0, ans=0.125 2024-09-16 21:38:30,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=83640.0, ans=0.0 2024-09-16 21:38:36,673 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.006e+02 1.137e+02 1.290e+02 1.487e+02 2.968e+02, threshold=2.580e+02, percent-clipped=1.0 2024-09-16 21:38:41,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=83640.0, ans=0.0 2024-09-16 21:38:46,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=83680.0, ans=0.125 2024-09-16 21:38:46,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=83680.0, ans=0.0 2024-09-16 21:38:49,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=83680.0, ans=0.09899494936611666 2024-09-16 21:38:53,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=83680.0, ans=0.125 2024-09-16 21:38:59,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=83720.0, ans=0.0 2024-09-16 21:39:02,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=83720.0, ans=0.0 2024-09-16 21:39:22,053 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:39:30,816 INFO [train.py:1198] (1/2) Epoch 5, batch 2850, loss[loss=0.2909, ctc_loss=0.2178, cr_loss=0.448, attn_decoder_loss=0.289, over 29506.00 frames. ], tot_loss[loss=0.2957, ctc_loss=0.2232, cr_loss=0.4355, attn_decoder_loss=0.2941, over 5762380.63 frames. ], batch size: 77, lr: 2.12e-02, grad_scale: 4.0 2024-09-16 21:39:37,909 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.79 vs. limit=15.0 2024-09-16 21:39:55,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=83840.0, ans=0.025 2024-09-16 21:40:20,191 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.57 vs. limit=6.0 2024-09-16 21:40:22,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=83920.0, ans=0.125 2024-09-16 21:40:22,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=83920.0, ans=0.5 2024-09-16 21:40:23,284 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.29 vs. limit=15.0 2024-09-16 21:40:40,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=83960.0, ans=0.1 2024-09-16 21:40:47,090 INFO [train.py:1198] (1/2) Epoch 5, batch 2900, loss[loss=0.2957, ctc_loss=0.2264, cr_loss=0.4551, attn_decoder_loss=0.2933, over 29441.00 frames. 
], tot_loss[loss=0.2969, ctc_loss=0.2242, cr_loss=0.4369, attn_decoder_loss=0.2952, over 5788279.91 frames. ], batch size: 79, lr: 2.12e-02, grad_scale: 8.0 2024-09-16 21:40:53,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=84000.0, ans=0.2 2024-09-16 21:41:12,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=84040.0, ans=0.025 2024-09-16 21:41:13,837 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.341e+01 1.106e+02 1.208e+02 1.366e+02 2.377e+02, threshold=2.415e+02, percent-clipped=0.0 2024-09-16 21:41:14,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=84040.0, ans=0.05 2024-09-16 21:41:20,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=84080.0, ans=0.125 2024-09-16 21:41:44,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=84120.0, ans=0.2 2024-09-16 21:41:47,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=84120.0, ans=0.0 2024-09-16 21:41:49,285 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.03 vs. limit=10.0 2024-09-16 21:41:50,726 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2024-09-16 21:41:57,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=84160.0, ans=0.95 2024-09-16 21:42:06,519 INFO [train.py:1198] (1/2) Epoch 5, batch 2950, loss[loss=0.2773, ctc_loss=0.1999, cr_loss=0.4101, attn_decoder_loss=0.2768, over 29519.00 frames. ], tot_loss[loss=0.2952, ctc_loss=0.2226, cr_loss=0.4345, attn_decoder_loss=0.2937, over 5782344.26 frames. ], batch size: 75, lr: 2.12e-02, grad_scale: 4.0 2024-09-16 21:42:11,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=84200.0, ans=0.1 2024-09-16 21:42:40,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=84280.0, ans=0.2 2024-09-16 21:42:45,093 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.79 vs. limit=15.0 2024-09-16 21:42:47,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=84280.0, ans=0.125 2024-09-16 21:42:51,592 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.17 vs. 
limit=22.5 2024-09-16 21:42:52,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=84320.0, ans=0.025 2024-09-16 21:42:59,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=84320.0, ans=0.125 2024-09-16 21:43:02,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=84320.0, ans=0.125 2024-09-16 21:43:22,231 INFO [train.py:1198] (1/2) Epoch 5, batch 3000, loss[loss=0.3006, ctc_loss=0.2313, cr_loss=0.4611, attn_decoder_loss=0.2981, over 29761.00 frames. ], tot_loss[loss=0.2948, ctc_loss=0.2221, cr_loss=0.4337, attn_decoder_loss=0.2932, over 5783036.95 frames. ], batch size: 81, lr: 2.11e-02, grad_scale: 8.0 2024-09-16 21:43:22,231 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 21:43:33,918 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.6347, 3.7815, 2.7621, 2.9047, 2.3242, 3.2492, 3.8772, 3.6092], device='cuda:1') 2024-09-16 21:43:40,543 INFO [train.py:1230] (1/2) Epoch 5, validation: loss=0.2221, ctc_loss=0.06863, cr_loss=4.342e-15, attn_decoder_loss=0.2392, over 944034.00 frames. 2024-09-16 21:43:40,544 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-16 21:43:49,162 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.32 vs. limit=10.0 2024-09-16 21:43:56,410 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. limit=6.0 2024-09-16 21:44:03,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=84440.0, ans=0.0 2024-09-16 21:44:04,653 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.812e+01 1.181e+02 1.340e+02 1.602e+02 4.120e+02, threshold=2.680e+02, percent-clipped=4.0 2024-09-16 21:44:09,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=84480.0, ans=0.0 2024-09-16 21:44:24,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=84520.0, ans=0.2 2024-09-16 21:44:35,867 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.40 vs. limit=12.0 2024-09-16 21:44:49,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=84560.0, ans=0.2 2024-09-16 21:45:00,282 INFO [train.py:1198] (1/2) Epoch 5, batch 3050, loss[loss=0.2966, ctc_loss=0.2274, cr_loss=0.4636, attn_decoder_loss=0.294, over 29519.00 frames. ], tot_loss[loss=0.2949, ctc_loss=0.2221, cr_loss=0.4343, attn_decoder_loss=0.2934, over 5777056.19 frames. ], batch size: 76, lr: 2.11e-02, grad_scale: 4.0 2024-09-16 21:45:05,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=84600.0, ans=0.2 2024-09-16 21:45:07,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.80 vs. 
limit=12.0 2024-09-16 21:45:20,790 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.14 vs. limit=15.0 2024-09-16 21:45:23,488 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.80 vs. limit=10.0 2024-09-16 21:45:26,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=84640.0, ans=0.125 2024-09-16 21:45:35,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=84680.0, ans=0.125 2024-09-16 21:45:45,189 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.36 vs. limit=22.5 2024-09-16 21:45:47,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=84720.0, ans=0.125 2024-09-16 21:45:56,895 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.53 vs. limit=15.0 2024-09-16 21:46:01,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=84760.0, ans=0.0 2024-09-16 21:46:02,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=84760.0, ans=0.0 2024-09-16 21:46:11,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=84760.0, ans=0.05 2024-09-16 21:46:16,180 INFO [train.py:1198] (1/2) Epoch 5, batch 3100, loss[loss=0.2985, ctc_loss=0.2212, cr_loss=0.4211, attn_decoder_loss=0.2977, over 29255.00 frames. ], tot_loss[loss=0.2942, ctc_loss=0.2211, cr_loss=0.4332, attn_decoder_loss=0.2927, over 5777239.34 frames. ], batch size: 100, lr: 2.11e-02, grad_scale: 8.0 2024-09-16 21:46:35,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=84840.0, ans=0.1 2024-09-16 21:46:41,655 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.874e+01 1.199e+02 1.306e+02 1.594e+02 3.534e+02, threshold=2.612e+02, percent-clipped=1.0 2024-09-16 21:46:44,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=84880.0, ans=0.125 2024-09-16 21:46:48,940 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. limit=6.0 2024-09-16 21:47:07,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=84920.0, ans=0.125 2024-09-16 21:47:31,528 INFO [train.py:1198] (1/2) Epoch 5, batch 3150, loss[loss=0.3178, ctc_loss=0.2435, cr_loss=0.4778, attn_decoder_loss=0.3155, over 28841.00 frames. ], tot_loss[loss=0.294, ctc_loss=0.2206, cr_loss=0.4336, attn_decoder_loss=0.2925, over 5783936.66 frames. ], batch size: 104, lr: 2.11e-02, grad_scale: 4.0 2024-09-16 21:47:34,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.57 vs. 
limit=15.0 2024-09-16 21:47:34,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=85000.0, ans=0.0 2024-09-16 21:47:51,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=85040.0, ans=0.0 2024-09-16 21:47:57,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=85040.0, ans=0.2 2024-09-16 21:48:13,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=85080.0, ans=0.0 2024-09-16 21:48:13,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=85080.0, ans=0.2 2024-09-16 21:48:18,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=85120.0, ans=0.0 2024-09-16 21:48:24,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=85120.0, ans=0.0 2024-09-16 21:48:28,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=85120.0, ans=0.125 2024-09-16 21:48:45,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=85160.0, ans=0.95 2024-09-16 21:48:50,891 INFO [train.py:1198] (1/2) Epoch 5, batch 3200, loss[loss=0.277, ctc_loss=0.1958, cr_loss=0.4129, attn_decoder_loss=0.2768, over 29782.00 frames. ], tot_loss[loss=0.2931, ctc_loss=0.2196, cr_loss=0.4329, attn_decoder_loss=0.2916, over 5794225.78 frames. ], batch size: 80, lr: 2.10e-02, grad_scale: 8.0 2024-09-16 21:49:09,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=85240.0, ans=22.5 2024-09-16 21:49:18,344 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.158e+01 1.087e+02 1.227e+02 1.343e+02 2.511e+02, threshold=2.453e+02, percent-clipped=0.0 2024-09-16 21:49:18,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=85240.0, ans=0.125 2024-09-16 21:49:20,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=85280.0, ans=0.0 2024-09-16 21:49:27,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=85280.0, ans=0.2 2024-09-16 21:49:53,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=85360.0, ans=0.125 2024-09-16 21:50:07,123 INFO [train.py:1198] (1/2) Epoch 5, batch 3250, loss[loss=0.3012, ctc_loss=0.2351, cr_loss=0.4185, attn_decoder_loss=0.2993, over 29700.00 frames. ], tot_loss[loss=0.2937, ctc_loss=0.2199, cr_loss=0.434, attn_decoder_loss=0.2922, over 5800014.28 frames. ], batch size: 84, lr: 2.10e-02, grad_scale: 4.0 2024-09-16 21:51:22,828 INFO [train.py:1198] (1/2) Epoch 5, batch 3300, loss[loss=0.3163, ctc_loss=0.2485, cr_loss=0.4695, attn_decoder_loss=0.3133, over 28276.00 frames. ], tot_loss[loss=0.2925, ctc_loss=0.2193, cr_loss=0.4328, attn_decoder_loss=0.2911, over 5798095.83 frames. 
], batch size: 111, lr: 2.10e-02, grad_scale: 8.0 2024-09-16 21:51:33,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=85600.0, ans=0.04949747468305833 2024-09-16 21:51:36,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=85640.0, ans=0.125 2024-09-16 21:51:51,607 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.589e+01 1.170e+02 1.337e+02 1.496e+02 4.068e+02, threshold=2.673e+02, percent-clipped=4.0 2024-09-16 21:51:54,469 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.19 vs. limit=8.0 2024-09-16 21:52:17,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=85720.0, ans=0.125 2024-09-16 21:52:40,403 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.49 vs. limit=15.0 2024-09-16 21:52:42,494 INFO [train.py:1198] (1/2) Epoch 5, batch 3350, loss[loss=0.3172, ctc_loss=0.2472, cr_loss=0.476, attn_decoder_loss=0.3144, over 28768.00 frames. ], tot_loss[loss=0.2938, ctc_loss=0.2209, cr_loss=0.4345, attn_decoder_loss=0.2923, over 5774814.93 frames. ], batch size: 104, lr: 2.10e-02, grad_scale: 4.0 2024-09-16 21:52:43,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.08 vs. limit=15.0 2024-09-16 21:52:56,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=85840.0, ans=0.125 2024-09-16 21:52:56,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=85840.0, ans=0.125 2024-09-16 21:53:01,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=85840.0, ans=0.1 2024-09-16 21:53:06,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=85840.0, ans=0.125 2024-09-16 21:53:13,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=85880.0, ans=0.0 2024-09-16 21:53:30,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=85920.0, ans=0.0 2024-09-16 21:53:35,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=85920.0, ans=0.05 2024-09-16 21:53:50,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=85960.0, ans=0.125 2024-09-16 21:53:58,049 INFO [train.py:1198] (1/2) Epoch 5, batch 3400, loss[loss=0.2598, ctc_loss=0.191, cr_loss=0.3909, attn_decoder_loss=0.2588, over 29350.00 frames. ], tot_loss[loss=0.2936, ctc_loss=0.2209, cr_loss=0.4331, attn_decoder_loss=0.292, over 5767598.84 frames. 
], batch size: 67, lr: 2.10e-02, grad_scale: 4.0 2024-09-16 21:53:58,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=86000.0, ans=0.125 2024-09-16 21:54:13,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=86040.0, ans=0.2 2024-09-16 21:54:16,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=86040.0, ans=0.125 2024-09-16 21:54:28,081 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.899e+01 1.163e+02 1.316e+02 1.513e+02 4.040e+02, threshold=2.631e+02, percent-clipped=2.0 2024-09-16 21:54:43,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=86120.0, ans=0.125 2024-09-16 21:54:50,493 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.84 vs. limit=15.0 2024-09-16 21:54:55,568 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:55:04,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=86160.0, ans=0.125 2024-09-16 21:55:13,312 INFO [train.py:1198] (1/2) Epoch 5, batch 3450, loss[loss=0.3123, ctc_loss=0.2416, cr_loss=0.4461, attn_decoder_loss=0.3102, over 28215.00 frames. ], tot_loss[loss=0.2944, ctc_loss=0.2213, cr_loss=0.4348, attn_decoder_loss=0.2929, over 5775605.88 frames. ], batch size: 111, lr: 2.09e-02, grad_scale: 4.0 2024-09-16 21:55:13,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=86200.0, ans=0.09899494936611666 2024-09-16 21:55:25,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=86200.0, ans=0.0 2024-09-16 21:55:28,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=86240.0, ans=0.025 2024-09-16 21:55:40,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=86240.0, ans=0.0 2024-09-16 21:55:45,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=86280.0, ans=0.125 2024-09-16 21:56:00,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=86320.0, ans=0.0 2024-09-16 21:56:10,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=86320.0, ans=0.1 2024-09-16 21:56:19,059 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:56:28,591 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.39 vs. limit=22.5 2024-09-16 21:56:33,259 INFO [train.py:1198] (1/2) Epoch 5, batch 3500, loss[loss=0.2594, ctc_loss=0.1913, cr_loss=0.3963, attn_decoder_loss=0.2582, over 29318.00 frames. ], tot_loss[loss=0.2938, ctc_loss=0.2208, cr_loss=0.4344, attn_decoder_loss=0.2922, over 5776725.97 frames. 
], batch size: 71, lr: 2.09e-02, grad_scale: 8.0 2024-09-16 21:56:37,116 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.61 vs. limit=12.0 2024-09-16 21:56:42,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=86400.0, ans=0.125 2024-09-16 21:56:48,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=86440.0, ans=0.1 2024-09-16 21:56:51,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=86440.0, ans=0.2 2024-09-16 21:57:04,583 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.011e+02 1.247e+02 1.357e+02 1.561e+02 2.944e+02, threshold=2.714e+02, percent-clipped=1.0 2024-09-16 21:57:21,433 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:57:27,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=86520.0, ans=0.07 2024-09-16 21:57:30,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=86520.0, ans=0.035 2024-09-16 21:57:40,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=86560.0, ans=0.125 2024-09-16 21:57:47,998 INFO [train.py:1198] (1/2) Epoch 5, batch 3550, loss[loss=0.2987, ctc_loss=0.2188, cr_loss=0.4214, attn_decoder_loss=0.2982, over 29680.00 frames. ], tot_loss[loss=0.294, ctc_loss=0.221, cr_loss=0.4347, attn_decoder_loss=0.2924, over 5782768.49 frames. ], batch size: 89, lr: 2.09e-02, grad_scale: 4.0 2024-09-16 21:57:48,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=86600.0, ans=0.2 2024-09-16 21:58:04,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=86640.0, ans=0.2 2024-09-16 21:58:13,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=86640.0, ans=0.125 2024-09-16 21:58:23,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=86680.0, ans=0.1 2024-09-16 21:58:48,363 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.70 vs. limit=15.0 2024-09-16 21:59:02,138 INFO [train.py:1198] (1/2) Epoch 5, batch 3600, loss[loss=0.2845, ctc_loss=0.2073, cr_loss=0.3966, attn_decoder_loss=0.2842, over 29498.00 frames. ], tot_loss[loss=0.2937, ctc_loss=0.2204, cr_loss=0.4347, attn_decoder_loss=0.2922, over 5792168.95 frames. 
], batch size: 77, lr: 2.09e-02, grad_scale: 8.0 2024-09-16 21:59:03,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=86800.0, ans=0.2 2024-09-16 21:59:14,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=86800.0, ans=0.125 2024-09-16 21:59:20,408 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:59:20,897 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.25 vs. limit=6.0 2024-09-16 21:59:22,291 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.65 vs. limit=6.0 2024-09-16 21:59:27,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=86840.0, ans=0.0 2024-09-16 21:59:34,635 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.246e+01 1.105e+02 1.213e+02 1.386e+02 4.333e+02, threshold=2.426e+02, percent-clipped=4.0 2024-09-16 21:59:39,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=86880.0, ans=0.0 2024-09-16 21:59:55,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=86920.0, ans=0.125 2024-09-16 22:00:10,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=86960.0, ans=0.125 2024-09-16 22:00:16,127 INFO [train.py:1198] (1/2) Epoch 5, batch 3650, loss[loss=0.3187, ctc_loss=0.2546, cr_loss=0.4699, attn_decoder_loss=0.3154, over 29491.00 frames. ], tot_loss[loss=0.2927, ctc_loss=0.2194, cr_loss=0.4327, attn_decoder_loss=0.2913, over 5794262.29 frames. 
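The optim.py:487 warnings are easier to read once the five numbers are labeled: they are the minimum, 25th percentile, median, 75th percentile, and maximum of recently observed gradient norms, and in every record above the threshold equals Clipping_scale times the median (for example 2.0 * 1.213e+02 = 2.426e+02 in the warning just logged). A sketch of how such a report could be computed:

import torch

def clipping_report(recent_grad_norms: torch.Tensor,
                    clipping_scale: float = 2.0):
    # recent_grad_norms: 1-D tensor of gradient norms from recent steps.
    q = torch.quantile(
        recent_grad_norms,
        torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # scale times the median
    percent_clipped = 100.0 * (recent_grad_norms > threshold).float().mean()
    return q, threshold, percent_clipped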
], batch size: 90, lr: 2.08e-02, grad_scale: 4.0 2024-09-16 22:00:20,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=87000.0, ans=0.0 2024-09-16 22:00:25,252 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:00:29,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=87040.0, ans=0.0 2024-09-16 22:00:32,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=87040.0, ans=0.125 2024-09-16 22:00:35,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=87040.0, ans=0.1 2024-09-16 22:00:41,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=87040.0, ans=0.0 2024-09-16 22:00:44,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=87080.0, ans=0.125 2024-09-16 22:00:58,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=87080.0, ans=0.0 2024-09-16 22:01:13,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=87120.0, ans=0.0 2024-09-16 22:01:22,526 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.68 vs. limit=15.0 2024-09-16 22:01:31,076 INFO [train.py:1198] (1/2) Epoch 5, batch 3700, loss[loss=0.3102, ctc_loss=0.2431, cr_loss=0.4677, attn_decoder_loss=0.3072, over 29714.00 frames. ], tot_loss[loss=0.2926, ctc_loss=0.2189, cr_loss=0.4326, attn_decoder_loss=0.2912, over 5804513.19 frames. ], batch size: 84, lr: 2.08e-02, grad_scale: 8.0 2024-09-16 22:01:40,877 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.70 vs. limit=22.5 2024-09-16 22:02:02,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=87280.0, ans=0.125 2024-09-16 22:02:05,209 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.891e+01 1.136e+02 1.234e+02 1.353e+02 4.194e+02, threshold=2.467e+02, percent-clipped=4.0 2024-09-16 22:02:08,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=87280.0, ans=0.125 2024-09-16 22:02:08,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=15.46 vs. limit=15.0 2024-09-16 22:02:25,254 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.24 vs. limit=15.0 2024-09-16 22:02:32,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=87360.0, ans=0.0 2024-09-16 22:02:47,086 INFO [train.py:1198] (1/2) Epoch 5, batch 3750, loss[loss=0.2518, ctc_loss=0.1792, cr_loss=0.3556, attn_decoder_loss=0.252, over 29358.00 frames. ], tot_loss[loss=0.2926, ctc_loss=0.2192, cr_loss=0.4327, attn_decoder_loss=0.2911, over 5807861.32 frames. 
], batch size: 67, lr: 2.08e-02, grad_scale: 4.0 2024-09-16 22:02:47,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=87400.0, ans=0.125 2024-09-16 22:02:57,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=87400.0, ans=0.125 2024-09-16 22:03:02,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=87440.0, ans=0.125 2024-09-16 22:03:24,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=87480.0, ans=0.0 2024-09-16 22:03:41,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.58 vs. limit=10.0 2024-09-16 22:03:42,939 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.15 vs. limit=22.5 2024-09-16 22:04:01,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=87600.0, ans=0.125 2024-09-16 22:04:02,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2024-09-16 22:04:02,943 INFO [train.py:1198] (1/2) Epoch 5, batch 3800, loss[loss=0.3144, ctc_loss=0.2359, cr_loss=0.4659, attn_decoder_loss=0.3128, over 29642.00 frames. ], tot_loss[loss=0.2924, ctc_loss=0.2192, cr_loss=0.4323, attn_decoder_loss=0.2909, over 5798241.71 frames. ], batch size: 86, lr: 2.08e-02, grad_scale: 4.0 2024-09-16 22:04:12,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=87600.0, ans=0.1 2024-09-16 22:04:28,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=87640.0, ans=0.5 2024-09-16 22:04:33,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=87680.0, ans=0.125 2024-09-16 22:04:37,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=87680.0, ans=0.1 2024-09-16 22:04:38,691 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.009e+02 1.217e+02 1.354e+02 1.572e+02 4.220e+02, threshold=2.708e+02, percent-clipped=3.0 2024-09-16 22:04:53,609 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:05:11,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=87760.0, ans=0.125 2024-09-16 22:05:13,507 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.93 vs. limit=22.5 2024-09-16 22:05:17,042 INFO [train.py:1198] (1/2) Epoch 5, batch 3850, loss[loss=0.3113, ctc_loss=0.2328, cr_loss=0.4569, attn_decoder_loss=0.3099, over 29246.00 frames. ], tot_loss[loss=0.2919, ctc_loss=0.2183, cr_loss=0.4316, attn_decoder_loss=0.2905, over 5812945.15 frames. 
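The scaling.py:1024 Whitening records compare a per-module statistic against a limit, and the constraint only bites once the metric exceeds the limit, which is why most records read like "metric=9.70 vs. limit=15.0". One way to obtain a metric with the logged behavior (1.0 for perfectly whitened features, growing as the channel covariance becomes anisotropic) is the ratio of the second moment of the covariance eigenvalues to the squared first moment. This is a hedged reconstruction of the idea, not the exact scaling.py code:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels). Returns 1.0 when the channel
    # covariance is proportional to the identity, larger otherwise.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)  # real eigenvalues, ascending
    return (eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)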
], batch size: 100, lr: 2.08e-02, grad_scale: 4.0 2024-09-16 22:05:17,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=87800.0, ans=0.025 2024-09-16 22:05:38,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=87840.0, ans=0.125 2024-09-16 22:05:49,029 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.00 vs. limit=22.5 2024-09-16 22:06:07,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=87920.0, ans=0.125 2024-09-16 22:06:31,870 INFO [train.py:1198] (1/2) Epoch 5, batch 3900, loss[loss=0.303, ctc_loss=0.2125, cr_loss=0.4363, attn_decoder_loss=0.3033, over 29643.00 frames. ], tot_loss[loss=0.2927, ctc_loss=0.2186, cr_loss=0.4327, attn_decoder_loss=0.2913, over 5816267.64 frames. ], batch size: 86, lr: 2.07e-02, grad_scale: 8.0 2024-09-16 22:06:33,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=88000.0, ans=0.0 2024-09-16 22:06:33,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=88000.0, ans=0.125 2024-09-16 22:06:49,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=88040.0, ans=0.2 2024-09-16 22:06:52,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=88040.0, ans=0.0 2024-09-16 22:07:03,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=88080.0, ans=0.0 2024-09-16 22:07:08,814 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.846e+01 1.141e+02 1.281e+02 1.435e+02 2.843e+02, threshold=2.562e+02, percent-clipped=1.0 2024-09-16 22:07:19,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=88120.0, ans=0.2 2024-09-16 22:07:30,094 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.12 vs. limit=22.5 2024-09-16 22:07:31,660 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.43 vs. limit=15.0 2024-09-16 22:07:35,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=88160.0, ans=0.125 2024-09-16 22:07:47,174 INFO [train.py:1198] (1/2) Epoch 5, batch 3950, loss[loss=0.3013, ctc_loss=0.2224, cr_loss=0.4371, attn_decoder_loss=0.3003, over 29511.00 frames. ], tot_loss[loss=0.2923, ctc_loss=0.2176, cr_loss=0.4325, attn_decoder_loss=0.291, over 5835706.89 frames. ], batch size: 97, lr: 2.07e-02, grad_scale: 4.0 2024-09-16 22:07:56,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=88200.0, ans=0.125 2024-09-16 22:08:06,690 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:08:12,731 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.34 vs. 
limit=15.0 2024-09-16 22:08:25,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=88280.0, ans=0.125 2024-09-16 22:08:28,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=88280.0, ans=0.0 2024-09-16 22:08:29,113 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.77 vs. limit=15.0 2024-09-16 22:08:40,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=88320.0, ans=0.025 2024-09-16 22:08:47,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=88360.0, ans=0.2 2024-09-16 22:09:02,280 INFO [train.py:1198] (1/2) Epoch 5, batch 4000, loss[loss=0.2737, ctc_loss=0.1969, cr_loss=0.4087, attn_decoder_loss=0.2732, over 29504.00 frames. ], tot_loss[loss=0.2928, ctc_loss=0.2186, cr_loss=0.4329, attn_decoder_loss=0.2914, over 5812322.81 frames. ], batch size: 74, lr: 2.07e-02, grad_scale: 8.0 2024-09-16 22:09:22,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=88440.0, ans=0.125 2024-09-16 22:09:24,399 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:09:40,467 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.385e+01 1.172e+02 1.271e+02 1.397e+02 4.120e+02, threshold=2.542e+02, percent-clipped=3.0 2024-09-16 22:09:50,259 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.51 vs. limit=6.0 2024-09-16 22:10:04,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=88560.0, ans=0.0 2024-09-16 22:10:16,072 INFO [train.py:1198] (1/2) Epoch 5, batch 4050, loss[loss=0.3427, ctc_loss=0.3141, cr_loss=0.4686, attn_decoder_loss=0.3355, over 20403.00 frames. ], tot_loss[loss=0.2924, ctc_loss=0.2185, cr_loss=0.4324, attn_decoder_loss=0.291, over 5796004.76 frames. ], batch size: 209, lr: 2.07e-02, grad_scale: 4.0 2024-09-16 22:10:16,735 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.88 vs. 
limit=6.0 2024-09-16 22:10:26,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=88600.0, ans=0.125 2024-09-16 22:10:49,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=88680.0, ans=0.125 2024-09-16 22:11:09,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=88720.0, ans=0.125 2024-09-16 22:11:17,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=88760.0, ans=0.1 2024-09-16 22:11:19,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=88760.0, ans=0.04949747468305833 2024-09-16 22:11:30,103 INFO [train.py:1198] (1/2) Epoch 5, batch 4100, loss[loss=0.3125, ctc_loss=0.2467, cr_loss=0.4484, attn_decoder_loss=0.3098, over 29528.00 frames. ], tot_loss[loss=0.2928, ctc_loss=0.219, cr_loss=0.4334, attn_decoder_loss=0.2914, over 5791712.97 frames. ], batch size: 90, lr: 2.07e-02, grad_scale: 8.0 2024-09-16 22:11:30,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=88800.0, ans=0.125 2024-09-16 22:11:41,162 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=14.43 vs. limit=15.0 2024-09-16 22:12:11,095 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.506e+01 1.179e+02 1.301e+02 1.533e+02 3.400e+02, threshold=2.603e+02, percent-clipped=2.0 2024-09-16 22:12:20,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=88920.0, ans=0.0 2024-09-16 22:12:21,322 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.80 vs. limit=15.0 2024-09-16 22:12:29,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=88960.0, ans=0.0 2024-09-16 22:12:36,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=88960.0, ans=0.2 2024-09-16 22:12:45,302 INFO [train.py:1198] (1/2) Epoch 5, batch 4150, loss[loss=0.2871, ctc_loss=0.2042, cr_loss=0.4154, attn_decoder_loss=0.2871, over 29518.00 frames. ], tot_loss[loss=0.2922, ctc_loss=0.2182, cr_loss=0.4326, attn_decoder_loss=0.2908, over 5797954.37 frames. ], batch size: 77, lr: 2.06e-02, grad_scale: 4.0 2024-09-16 22:13:04,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=89040.0, ans=0.125 2024-09-16 22:13:20,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=89080.0, ans=0.0 2024-09-16 22:13:30,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=89120.0, ans=0.125 2024-09-16 22:13:40,190 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.81 vs. 
limit=10.0 2024-09-16 22:13:59,802 INFO [train.py:1198] (1/2) Epoch 5, batch 4200, loss[loss=0.3172, ctc_loss=0.2453, cr_loss=0.4635, attn_decoder_loss=0.3149, over 29537.00 frames. ], tot_loss[loss=0.2929, ctc_loss=0.2188, cr_loss=0.4331, attn_decoder_loss=0.2915, over 5798692.67 frames. ], batch size: 90, lr: 2.06e-02, grad_scale: 8.0 2024-09-16 22:14:00,685 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.14 vs. limit=22.5 2024-09-16 22:14:02,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.27 vs. limit=15.0 2024-09-16 22:14:03,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=89200.0, ans=0.125 2024-09-16 22:14:10,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=89200.0, ans=0.125 2024-09-16 22:14:17,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=89240.0, ans=0.125 2024-09-16 22:14:22,716 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.98 vs. limit=15.0 2024-09-16 22:14:23,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=89240.0, ans=0.125 2024-09-16 22:14:41,004 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.772e+01 1.118e+02 1.246e+02 1.404e+02 2.463e+02, threshold=2.492e+02, percent-clipped=0.0 2024-09-16 22:14:41,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=89280.0, ans=0.0 2024-09-16 22:14:42,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=89320.0, ans=0.1 2024-09-16 22:14:46,119 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.74 vs. limit=15.0 2024-09-16 22:15:12,898 INFO [train.py:1198] (1/2) Epoch 5, batch 4250, loss[loss=0.2734, ctc_loss=0.1983, cr_loss=0.3865, attn_decoder_loss=0.2731, over 29506.00 frames. ], tot_loss[loss=0.2933, ctc_loss=0.219, cr_loss=0.4327, attn_decoder_loss=0.2919, over 5805055.94 frames. ], batch size: 74, lr: 2.06e-02, grad_scale: 4.0 2024-09-16 22:15:26,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=89440.0, ans=0.125 2024-09-16 22:15:49,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=89480.0, ans=0.125 2024-09-16 22:15:58,603 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.31 vs. limit=12.0 2024-09-16 22:16:27,380 INFO [train.py:1198] (1/2) Epoch 5, batch 4300, loss[loss=0.3038, ctc_loss=0.22, cr_loss=0.4442, attn_decoder_loss=0.3033, over 29536.00 frames. ], tot_loss[loss=0.2935, ctc_loss=0.2191, cr_loss=0.4335, attn_decoder_loss=0.2921, over 5795323.37 frames. 
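The records are regular enough that schedules like the ones above can be pulled out and plotted with a couple of expressions. A small helper, assuming exactly the record layout shown in this log:

import re
from typing import Iterator, Tuple

SCHED = re.compile(
    r"ScheduledFloat: name=(?P<name>[^,]+), "
    r"batch_count=(?P<batch>[\d.]+), ans=(?P<ans>[-+\d.eE]+)")

def scheduled_values(log_text: str, name: str) -> Iterator[Tuple[float, float]]:
    # Yields (batch_count, ans) pairs for one scheduled hyperparameter.
    for m in SCHED.finditer(log_text):
        if m.group("name") == name:
            yield float(m.group("batch")), float(m.group("ans"))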
], batch size: 87, lr: 2.06e-02, grad_scale: 8.0 2024-09-16 22:16:47,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=89640.0, ans=0.07 2024-09-16 22:16:53,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=89640.0, ans=0.2 2024-09-16 22:16:55,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=89640.0, ans=0.07 2024-09-16 22:17:05,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=89680.0, ans=0.2 2024-09-16 22:17:11,160 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.609e+01 1.163e+02 1.276e+02 1.524e+02 3.260e+02, threshold=2.552e+02, percent-clipped=3.0 2024-09-16 22:17:33,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=89760.0, ans=0.125 2024-09-16 22:17:42,346 INFO [train.py:1198] (1/2) Epoch 5, batch 4350, loss[loss=0.3089, ctc_loss=0.2372, cr_loss=0.4265, attn_decoder_loss=0.3074, over 29498.00 frames. ], tot_loss[loss=0.2971, ctc_loss=0.2222, cr_loss=0.4388, attn_decoder_loss=0.2957, over 5798219.70 frames. ], batch size: 97, lr: 2.06e-02, grad_scale: 4.0 2024-09-16 22:17:43,342 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.15 vs. limit=15.0 2024-09-16 22:17:44,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=89800.0, ans=0.025 2024-09-16 22:18:04,175 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.02 vs. limit=15.0 2024-09-16 22:18:13,630 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:18:16,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=89880.0, ans=0.125 2024-09-16 22:18:19,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=89880.0, ans=0.0 2024-09-16 22:18:23,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=89880.0, ans=0.0 2024-09-16 22:18:42,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=89960.0, ans=0.125 2024-09-16 22:18:52,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=89960.0, ans=0.0 2024-09-16 22:18:56,339 INFO [train.py:1198] (1/2) Epoch 5, batch 4400, loss[loss=0.3149, ctc_loss=0.2476, cr_loss=0.4556, attn_decoder_loss=0.3122, over 27160.00 frames. ], tot_loss[loss=0.2997, ctc_loss=0.2248, cr_loss=0.4416, attn_decoder_loss=0.2982, over 5765677.09 frames. 
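The grad_scale field in the batch records keeps flipping between values like 4.0 and 8.0; that is the dynamic loss scale of fp16 mixed-precision training being halved after steps whose gradients overflow and grown back after a run of clean steps. A generic PyTorch sketch of that mechanism (not this project's actual training loop; model, batch, and optimizer are placeholders):

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=8.0)

def train_step(model, batch, optimizer):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # skipped internally if grads hit inf/nan
    scaler.update()         # halves the scale on overflow, else slowly grows it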
], batch size: 124, lr: 2.05e-02, grad_scale: 8.0 2024-09-16 22:18:59,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=90000.0, ans=0.2 2024-09-16 22:19:01,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=90000.0, ans=0.125 2024-09-16 22:19:05,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=90000.0, ans=0.0 2024-09-16 22:19:06,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=90000.0, ans=0.2 2024-09-16 22:19:40,700 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.790e+01 1.097e+02 1.213e+02 1.417e+02 2.444e+02, threshold=2.426e+02, percent-clipped=0.0 2024-09-16 22:20:10,746 INFO [train.py:1198] (1/2) Epoch 5, batch 4450, loss[loss=0.3188, ctc_loss=0.2662, cr_loss=0.4269, attn_decoder_loss=0.3151, over 20304.00 frames. ], tot_loss[loss=0.3033, ctc_loss=0.2311, cr_loss=0.4441, attn_decoder_loss=0.3015, over 5574302.14 frames. ], batch size: 210, lr: 2.05e-02, grad_scale: 4.0 2024-09-16 22:20:13,184 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.62 vs. limit=15.0 2024-09-16 22:20:17,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=90200.0, ans=0.0 2024-09-16 22:20:23,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=90200.0, ans=0.2 2024-09-16 22:20:26,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=90240.0, ans=0.2 2024-09-16 22:20:37,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=90240.0, ans=0.07 2024-09-16 22:20:48,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=90280.0, ans=0.0 2024-09-16 22:21:13,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=90360.0, ans=0.125 2024-09-16 22:21:26,516 INFO [train.py:1198] (1/2) Epoch 5, batch 4500, loss[loss=0.3203, ctc_loss=0.2616, cr_loss=0.4592, attn_decoder_loss=0.3166, over 20925.00 frames. ], tot_loss[loss=0.3078, ctc_loss=0.2402, cr_loss=0.4459, attn_decoder_loss=0.3054, over 5228282.10 frames. ], batch size: 211, lr: 2.05e-02, grad_scale: 8.0 2024-09-16 22:21:39,403 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.67 vs. limit=15.0 2024-09-16 22:22:53,646 INFO [train.py:1198] (1/2) Epoch 6, batch 0, loss[loss=0.3095, ctc_loss=0.1874, cr_loss=0.4096, attn_decoder_loss=0.314, over 29625.00 frames. ], tot_loss[loss=0.3095, ctc_loss=0.1874, cr_loss=0.4096, attn_decoder_loss=0.314, over 29625.00 frames. 
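A useful cross-check on the tot_loss bookkeeping sits right at the epoch boundary above: at Epoch 6, batch 0, tot_loss equals the batch loss over exactly that batch's 29625 frames, after which the "over N frames" count climbs (1.27M at batch 50, 2.25M at batch 100) and then saturates near 5.8M rather than growing without bound. That is consistent with a decaying, frame-weighted running average along these lines; the decay constant below is an assumption, not something recoverable from the log:

class FrameWeightedAverage:
    """Hypothetical reconstruction of the tot_loss / frame counters."""

    def __init__(self, decay: float = 0.999):  # assumed decay constant
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def value(self) -> float:
        # Equals the first batch's loss at batch 0, as in the log above.
        return self.loss_sum / max(self.frames, 1.0)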
], batch size: 73, lr: 1.91e-02, grad_scale: 4.0 2024-09-16 22:22:53,646 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 22:22:59,477 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.2.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([4.5805, 4.9230, 5.1142, 5.1016], device='cuda:1') 2024-09-16 22:23:11,940 INFO [train.py:1230] (1/2) Epoch 6, validation: loss=0.2379, ctc_loss=0.06988, cr_loss=4.72e-15, attn_decoder_loss=0.2566, over 944034.00 frames. 2024-09-16 22:23:11,941 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-16 22:23:13,422 WARNING [optim.py:503] (1/2) Scaling gradients by 0.0589279979467392, model_norm_threshold=242.58145141601562 2024-09-16 22:23:13,624 WARNING [optim.py:575] (1/2) Parameter dominating tot_sumsq module.attention_decoder.decoder.layers.1.self_attn.linear_k.weight with proportion 0.26, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.469e+06, grad_sumsq=5.019e+05, orig_rms_sq=8.904e+00 2024-09-16 22:23:21,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=90500.0, ans=0.125 2024-09-16 22:23:22,770 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.027e+02 1.192e+02 1.351e+02 1.731e+02 4.117e+03, threshold=2.703e+02, percent-clipped=9.0 2024-09-16 22:23:35,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=90540.0, ans=0.2 2024-09-16 22:23:40,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=90540.0, ans=0.2 2024-09-16 22:23:49,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=90580.0, ans=0.0 2024-09-16 22:23:52,529 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.25 vs. limit=22.5 2024-09-16 22:23:59,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=90620.0, ans=0.04949747468305833 2024-09-16 22:24:04,624 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=12.0 2024-09-16 22:24:17,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=90660.0, ans=0.2 2024-09-16 22:24:22,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=90660.0, ans=0.125 2024-09-16 22:24:28,167 INFO [train.py:1198] (1/2) Epoch 6, batch 50, loss[loss=0.257, ctc_loss=0.1846, cr_loss=0.396, attn_decoder_loss=0.2563, over 29400.00 frames. ], tot_loss[loss=0.2971, ctc_loss=0.224, cr_loss=0.4365, attn_decoder_loss=0.2955, over 1268462.96 frames. ], batch size: 70, lr: 1.91e-02, grad_scale: 4.0 2024-09-16 22:24:28,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=90700.0, ans=0.04949747468305833 2024-09-16 22:24:43,984 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.29 vs. 
limit=15.0 2024-09-16 22:24:51,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=90740.0, ans=0.125 2024-09-16 22:25:04,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.18 vs. limit=15.0 2024-09-16 22:25:05,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=90780.0, ans=0.1 2024-09-16 22:25:23,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=90820.0, ans=0.125 2024-09-16 22:25:25,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=90820.0, ans=0.125 2024-09-16 22:25:35,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=90860.0, ans=0.2 2024-09-16 22:25:41,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=90860.0, ans=0.125 2024-09-16 22:25:45,645 INFO [train.py:1198] (1/2) Epoch 6, batch 100, loss[loss=0.2808, ctc_loss=0.2038, cr_loss=0.4138, attn_decoder_loss=0.2801, over 29525.00 frames. ], tot_loss[loss=0.2967, ctc_loss=0.2226, cr_loss=0.4369, attn_decoder_loss=0.2952, over 2251598.36 frames. ], batch size: 76, lr: 1.91e-02, grad_scale: 8.0 2024-09-16 22:25:57,471 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.583e+01 1.187e+02 1.367e+02 1.634e+02 6.216e+02, threshold=2.735e+02, percent-clipped=2.0 2024-09-16 22:26:02,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=90940.0, ans=0.0 2024-09-16 22:26:51,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=91060.0, ans=0.125 2024-09-16 22:27:01,996 INFO [train.py:1198] (1/2) Epoch 6, batch 150, loss[loss=0.2535, ctc_loss=0.1789, cr_loss=0.4074, attn_decoder_loss=0.2527, over 29455.00 frames. ], tot_loss[loss=0.2929, ctc_loss=0.218, cr_loss=0.4337, attn_decoder_loss=0.2916, over 3046725.67 frames. ], batch size: 70, lr: 1.91e-02, grad_scale: 4.0 2024-09-16 22:27:11,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=91100.0, ans=0.0 2024-09-16 22:27:25,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=91140.0, ans=0.2 2024-09-16 22:27:47,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=91220.0, ans=0.0 2024-09-16 22:27:57,476 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.06 vs. limit=12.0 2024-09-16 22:28:06,030 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:28:17,516 INFO [train.py:1198] (1/2) Epoch 6, batch 200, loss[loss=0.3014, ctc_loss=0.2304, cr_loss=0.4259, attn_decoder_loss=0.2998, over 27334.00 frames. ], tot_loss[loss=0.2917, ctc_loss=0.2168, cr_loss=0.4324, attn_decoder_loss=0.2905, over 3657647.70 frames. 
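The optim.py:503/575 warnings at the start of Epoch 6 describe a harsher intervention than quantile clipping: when a single step's total gradient norm blows past model_norm_threshold, all gradients are rescaled so the applied norm equals the threshold, and the parameter contributing most to the squared norm is named. The numbers are self-consistent: 242.58 / 0.0589 is about 4.1e+03, the maximum shown in the very next quartile record. Note also that the validation pass logged at the same point reports cr_loss=4.72e-15; the consistency-regularization term is effectively zero in evaluation, as expected when the paired augmented views used in training are absent. A sketch of the rescale:

def rescue_scale(total_grad_norm: float,
                 model_norm_threshold: float) -> float:
    # Factor applied to all gradients when one step's gradient norm
    # exceeds the threshold; 1.0 means no intervention.
    if total_grad_norm <= model_norm_threshold:
        return 1.0
    return model_norm_threshold / total_grad_norm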
], batch size: 124, lr: 1.90e-02, grad_scale: 8.0 2024-09-16 22:28:29,577 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.192e+01 1.064e+02 1.171e+02 1.354e+02 3.116e+02, threshold=2.342e+02, percent-clipped=1.0 2024-09-16 22:28:41,023 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.97 vs. limit=15.0 2024-09-16 22:28:49,350 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.99 vs. limit=15.0 2024-09-16 22:28:57,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=91380.0, ans=0.125 2024-09-16 22:29:15,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=91420.0, ans=0.0 2024-09-16 22:29:29,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=91460.0, ans=0.125 2024-09-16 22:29:31,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.49 vs. limit=15.0 2024-09-16 22:29:35,030 INFO [train.py:1198] (1/2) Epoch 6, batch 250, loss[loss=0.3043, ctc_loss=0.2224, cr_loss=0.441, attn_decoder_loss=0.3036, over 29253.00 frames. ], tot_loss[loss=0.2912, ctc_loss=0.2159, cr_loss=0.4324, attn_decoder_loss=0.2899, over 4139965.73 frames. ], batch size: 100, lr: 1.90e-02, grad_scale: 4.0 2024-09-16 22:29:59,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=91540.0, ans=10.0 2024-09-16 22:30:17,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=91580.0, ans=0.125 2024-09-16 22:30:52,396 INFO [train.py:1198] (1/2) Epoch 6, batch 300, loss[loss=0.2983, ctc_loss=0.2092, cr_loss=0.4222, attn_decoder_loss=0.2988, over 29531.00 frames. ], tot_loss[loss=0.29, ctc_loss=0.2139, cr_loss=0.4304, attn_decoder_loss=0.2889, over 4508737.31 frames. ], batch size: 92, lr: 1.90e-02, grad_scale: 8.0 2024-09-16 22:31:07,397 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.038e+01 1.116e+02 1.244e+02 1.492e+02 2.099e+02, threshold=2.488e+02, percent-clipped=0.0 2024-09-16 22:31:10,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=91740.0, ans=0.0 2024-09-16 22:31:21,962 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.47 vs. limit=15.0 2024-09-16 22:31:25,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=91780.0, ans=0.125 2024-09-16 22:31:29,369 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.37 vs. limit=10.0 2024-09-16 22:31:33,966 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.40 vs. limit=15.0 2024-09-16 22:31:53,557 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.96 vs. 
limit=6.0 2024-09-16 22:32:05,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=91860.0, ans=0.2 2024-09-16 22:32:07,881 INFO [train.py:1198] (1/2) Epoch 6, batch 350, loss[loss=0.2583, ctc_loss=0.1922, cr_loss=0.3723, attn_decoder_loss=0.2574, over 29308.00 frames. ], tot_loss[loss=0.2901, ctc_loss=0.2138, cr_loss=0.4303, attn_decoder_loss=0.289, over 4794585.18 frames. ], batch size: 71, lr: 1.90e-02, grad_scale: 4.0 2024-09-16 22:32:14,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=91900.0, ans=0.125 2024-09-16 22:32:16,316 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.10 vs. limit=15.0 2024-09-16 22:32:38,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=91980.0, ans=0.125 2024-09-16 22:32:47,064 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=8.20 vs. limit=15.0 2024-09-16 22:33:12,086 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.23 vs. limit=10.0 2024-09-16 22:33:23,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=92060.0, ans=0.1 2024-09-16 22:33:25,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=92100.0, ans=0.0 2024-09-16 22:33:26,401 INFO [train.py:1198] (1/2) Epoch 6, batch 400, loss[loss=0.3063, ctc_loss=0.2266, cr_loss=0.48, attn_decoder_loss=0.3045, over 29688.00 frames. ], tot_loss[loss=0.2895, ctc_loss=0.2133, cr_loss=0.4301, attn_decoder_loss=0.2884, over 5024661.27 frames. ], batch size: 82, lr: 1.90e-02, grad_scale: 8.0 2024-09-16 22:33:28,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=92100.0, ans=0.125 2024-09-16 22:33:37,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=92100.0, ans=0.0 2024-09-16 22:33:43,127 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.703e+01 1.115e+02 1.264e+02 1.415e+02 3.594e+02, threshold=2.527e+02, percent-clipped=2.0 2024-09-16 22:33:43,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=92140.0, ans=0.125 2024-09-16 22:33:49,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=92140.0, ans=0.125 2024-09-16 22:33:51,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=92140.0, ans=0.2 2024-09-16 22:34:45,087 INFO [train.py:1198] (1/2) Epoch 6, batch 450, loss[loss=0.3031, ctc_loss=0.224, cr_loss=0.4659, attn_decoder_loss=0.3015, over 29710.00 frames. ], tot_loss[loss=0.2894, ctc_loss=0.2131, cr_loss=0.4299, attn_decoder_loss=0.2884, over 5184832.09 frames. 
], batch size: 83, lr: 1.89e-02, grad_scale: 4.0 2024-09-16 22:34:49,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=92300.0, ans=0.125 2024-09-16 22:35:19,593 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.03 vs. limit=22.5 2024-09-16 22:35:32,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=92420.0, ans=0.2 2024-09-16 22:35:47,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=92460.0, ans=0.125 2024-09-16 22:35:55,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=92460.0, ans=0.0 2024-09-16 22:36:01,411 INFO [train.py:1198] (1/2) Epoch 6, batch 500, loss[loss=0.3177, ctc_loss=0.2373, cr_loss=0.4646, attn_decoder_loss=0.3163, over 29462.00 frames. ], tot_loss[loss=0.2887, ctc_loss=0.2122, cr_loss=0.4295, attn_decoder_loss=0.2876, over 5328401.61 frames. ], batch size: 94, lr: 1.89e-02, grad_scale: 8.0 2024-09-16 22:36:13,043 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.35 vs. limit=15.0 2024-09-16 22:36:18,343 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.047e+01 1.094e+02 1.193e+02 1.318e+02 2.724e+02, threshold=2.387e+02, percent-clipped=2.0 2024-09-16 22:36:43,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=92580.0, ans=0.0 2024-09-16 22:36:44,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=92580.0, ans=0.0 2024-09-16 22:37:20,299 INFO [train.py:1198] (1/2) Epoch 6, batch 550, loss[loss=0.2975, ctc_loss=0.2255, cr_loss=0.4341, attn_decoder_loss=0.2958, over 28798.00 frames. ], tot_loss[loss=0.2886, ctc_loss=0.2122, cr_loss=0.4292, attn_decoder_loss=0.2876, over 5421188.43 frames. ], batch size: 104, lr: 1.89e-02, grad_scale: 4.0 2024-09-16 22:37:23,002 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=22.5 2024-09-16 22:37:32,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=92700.0, ans=0.125 2024-09-16 22:38:13,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=92820.0, ans=0.0 2024-09-16 22:38:13,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=92820.0, ans=0.1 2024-09-16 22:38:16,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=92820.0, ans=0.0 2024-09-16 22:38:18,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=92820.0, ans=0.125 2024-09-16 22:38:39,397 INFO [train.py:1198] (1/2) Epoch 6, batch 600, loss[loss=0.3176, ctc_loss=0.232, cr_loss=0.4621, attn_decoder_loss=0.3168, over 29251.00 frames. ], tot_loss[loss=0.2885, ctc_loss=0.2116, cr_loss=0.4292, attn_decoder_loss=0.2875, over 5510122.98 frames. 
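The balancer records above (balancer1.prob, balancer2.min_abs, balancer_ff3.min_abs, ...) come from activation balancers: modules that watch per-channel statistics, roughly the fraction of positive values and the mean magnitude, and nudge gradients when a channel drifts outside its allowed range; prob is the probability of applying the correction on a given step and is itself a ScheduledFloat. A hedged sketch of the statistics being constrained (the range defaults are placeholders, and the real module modifies gradients rather than adding a loss):

import torch

def balancer_violation(x: torch.Tensor,
                       min_positive: float = 0.05,
                       max_positive: float = 0.95,
                       min_abs: float = 0.2,
                       max_abs: float = 100.0) -> torch.Tensor:
    # x: (num_frames, num_channels). Measures how far each channel's
    # statistics fall outside the configured ranges.
    pos_frac = (x > 0).float().mean(dim=0)
    mean_abs = x.abs().mean(dim=0)
    return ((min_positive - pos_frac).clamp(min=0.0)
            + (pos_frac - max_positive).clamp(min=0.0)
            + (min_abs - mean_abs).clamp(min=0.0)
            + (mean_abs - max_abs).clamp(min=0.0)).sum()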
], batch size: 100, lr: 1.89e-02, grad_scale: 8.0 2024-09-16 22:38:42,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=92900.0, ans=0.125 2024-09-16 22:38:47,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=92900.0, ans=0.125 2024-09-16 22:38:50,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=92900.0, ans=0.1 2024-09-16 22:38:59,012 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.669e+01 1.124e+02 1.276e+02 1.446e+02 7.170e+02, threshold=2.552e+02, percent-clipped=2.0 2024-09-16 22:39:01,622 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.23 vs. limit=22.5 2024-09-16 22:39:12,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=92980.0, ans=0.04949747468305833 2024-09-16 22:39:13,303 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.20 vs. limit=15.0 2024-09-16 22:39:33,235 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.35 vs. limit=15.0 2024-09-16 22:39:37,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=93020.0, ans=0.0 2024-09-16 22:39:54,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=93100.0, ans=0.0 2024-09-16 22:39:55,417 INFO [train.py:1198] (1/2) Epoch 6, batch 650, loss[loss=0.2776, ctc_loss=0.2002, cr_loss=0.3967, attn_decoder_loss=0.2773, over 29763.00 frames. ], tot_loss[loss=0.2874, ctc_loss=0.2101, cr_loss=0.4278, attn_decoder_loss=0.2865, over 5587697.26 frames. ], batch size: 81, lr: 1.89e-02, grad_scale: 4.0 2024-09-16 22:40:15,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=93140.0, ans=0.125 2024-09-16 22:40:23,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=93140.0, ans=0.125 2024-09-16 22:40:32,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=93180.0, ans=0.125 2024-09-16 22:40:38,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=93180.0, ans=0.125 2024-09-16 22:41:06,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=93260.0, ans=0.125 2024-09-16 22:41:13,972 INFO [train.py:1198] (1/2) Epoch 6, batch 700, loss[loss=0.2821, ctc_loss=0.2121, cr_loss=0.4282, attn_decoder_loss=0.2804, over 29538.00 frames. ], tot_loss[loss=0.2878, ctc_loss=0.2106, cr_loss=0.4279, attn_decoder_loss=0.2869, over 5638739.61 frames. 
], batch size: 76, lr: 1.89e-02, grad_scale: 8.0 2024-09-16 22:41:14,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=93300.0, ans=0.1 2024-09-16 22:41:29,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=93340.0, ans=0.2 2024-09-16 22:41:35,159 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.774e+01 1.081e+02 1.183e+02 1.296e+02 3.770e+02, threshold=2.365e+02, percent-clipped=2.0 2024-09-16 22:41:39,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=93340.0, ans=15.0 2024-09-16 22:41:58,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=93420.0, ans=0.125 2024-09-16 22:42:06,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=93420.0, ans=0.125 2024-09-16 22:42:27,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=93460.0, ans=0.2 2024-09-16 22:42:32,785 INFO [train.py:1198] (1/2) Epoch 6, batch 750, loss[loss=0.2917, ctc_loss=0.2086, cr_loss=0.4416, attn_decoder_loss=0.2911, over 29728.00 frames. ], tot_loss[loss=0.2878, ctc_loss=0.211, cr_loss=0.4287, attn_decoder_loss=0.2868, over 5674889.85 frames. ], batch size: 82, lr: 1.88e-02, grad_scale: 4.0 2024-09-16 22:42:33,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=93500.0, ans=0.0 2024-09-16 22:42:33,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=93500.0, ans=0.0 2024-09-16 22:42:33,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=93500.0, ans=0.2 2024-09-16 22:42:36,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=93500.0, ans=0.125 2024-09-16 22:42:50,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=93540.0, ans=0.125 2024-09-16 22:43:32,252 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.93 vs. limit=15.0 2024-09-16 22:43:34,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=93660.0, ans=0.125 2024-09-16 22:43:49,718 INFO [train.py:1198] (1/2) Epoch 6, batch 800, loss[loss=0.2682, ctc_loss=0.1982, cr_loss=0.407, attn_decoder_loss=0.2669, over 29593.00 frames. ], tot_loss[loss=0.2879, ctc_loss=0.2111, cr_loss=0.4288, attn_decoder_loss=0.2869, over 5705647.08 frames. 
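The bypass.scale_min and bypass_mid.scale_min records govern Zipformer-style bypass connections, where a layer's output is a learned per-channel interpolation between the layer's input and output, and the learned scale is clamped from below by a scheduled scale_min (0.2 in the records above). A minimal sketch under that assumption:

import torch

def bypass(x_in: torch.Tensor,
           x_out: torch.Tensor,
           scale: torch.Tensor,
           scale_min: float = 0.2) -> torch.Tensor:
    # scale: learned per-channel weights; scale_min is the scheduled
    # lower clamp logged above as bypass.scale_min.
    s = scale.clamp(min=scale_min, max=1.0)
    return x_in + s * (x_out - x_in)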
], batch size: 73, lr: 1.88e-02, grad_scale: 8.0
2024-09-16 22:43:53,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=93700.0, ans=0.2
2024-09-16 22:44:11,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=93740.0, ans=0.1
2024-09-16 22:44:12,711 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.997e+01 1.068e+02 1.156e+02 1.307e+02 3.410e+02, threshold=2.312e+02, percent-clipped=1.0
2024-09-16 22:44:20,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=93780.0, ans=0.1
2024-09-16 22:44:24,386 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.75 vs. limit=15.0
2024-09-16 22:44:40,882 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.95 vs. limit=15.0
2024-09-16 22:44:46,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=93820.0, ans=0.0
2024-09-16 22:44:55,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=93860.0, ans=0.0
2024-09-16 22:45:08,086 INFO [train.py:1198] (1/2) Epoch 6, batch 850, loss[loss=0.3109, ctc_loss=0.2378, cr_loss=0.4582, attn_decoder_loss=0.3088, over 29695.00 frames. ], tot_loss[loss=0.2874, ctc_loss=0.2103, cr_loss=0.4282, attn_decoder_loss=0.2864, over 5735377.66 frames. ], batch size: 89, lr: 1.88e-02, grad_scale: 4.0
2024-09-16 22:45:34,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=93940.0, ans=0.0
2024-09-16 22:45:49,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=93980.0, ans=0.125
2024-09-16 22:45:53,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=94020.0, ans=0.2
2024-09-16 22:45:57,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=94020.0, ans=0.035
2024-09-16 22:45:59,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=94020.0, ans=0.0
2024-09-16 22:46:08,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=94060.0, ans=0.0
2024-09-16 22:46:16,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=94060.0, ans=0.125
2024-09-16 22:46:19,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=94060.0, ans=0.1
2024-09-16 22:46:24,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=94060.0, ans=0.0
2024-09-16 22:46:27,097 INFO [train.py:1198] (1/2) Epoch 6, batch 900, loss[loss=0.253, ctc_loss=0.176, cr_loss=0.3723, attn_decoder_loss=0.2533, over 29623.00 frames. ], tot_loss[loss=0.2876, ctc_loss=0.2105, cr_loss=0.4281, attn_decoder_loss=0.2867, over 5741270.94 frames. ], batch size: 73, lr: 1.88e-02, grad_scale: 8.0
2024-09-16 22:46:31,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=94100.0, ans=0.1
2024-09-16 22:46:33,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=94100.0, ans=0.0
2024-09-16 22:46:34,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=94100.0, ans=0.1
2024-09-16 22:46:49,681 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.705e+01 1.096e+02 1.207e+02 1.371e+02 3.827e+02, threshold=2.414e+02, percent-clipped=1.0
2024-09-16 22:46:53,470 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.77 vs. limit=15.0
2024-09-16 22:47:02,650 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.30 vs. limit=10.0
2024-09-16 22:47:22,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=94220.0, ans=0.0
2024-09-16 22:47:42,990 INFO [train.py:1198] (1/2) Epoch 6, batch 950, loss[loss=0.2772, ctc_loss=0.2026, cr_loss=0.431, attn_decoder_loss=0.2759, over 29495.00 frames. ], tot_loss[loss=0.2879, ctc_loss=0.2108, cr_loss=0.4283, attn_decoder_loss=0.287, over 5742576.34 frames. ], batch size: 74, lr: 1.88e-02, grad_scale: 4.0
2024-09-16 22:47:53,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=94300.0, ans=0.125
2024-09-16 22:48:04,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=94340.0, ans=0.09899494936611666
2024-09-16 22:48:34,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=94420.0, ans=0.1
2024-09-16 22:49:01,845 INFO [train.py:1198] (1/2) Epoch 6, batch 1000, loss[loss=0.2836, ctc_loss=0.202, cr_loss=0.4404, attn_decoder_loss=0.2828, over 29531.00 frames. ], tot_loss[loss=0.2892, ctc_loss=0.2123, cr_loss=0.4301, attn_decoder_loss=0.2881, over 5736387.07 frames. ], batch size: 77, lr: 1.87e-02, grad_scale: 8.0
2024-09-16 22:49:10,596 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.72 vs. limit=15.0
2024-09-16 22:49:12,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=94500.0, ans=0.0
2024-09-16 22:49:17,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=94540.0, ans=0.1
2024-09-16 22:49:26,355 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.001e+01 1.144e+02 1.278e+02 1.441e+02 2.268e+02, threshold=2.556e+02, percent-clipped=0.0
2024-09-16 22:49:51,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=94620.0, ans=0.125
2024-09-16 22:50:20,058 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=22.5
2024-09-16 22:50:20,455 INFO [train.py:1198] (1/2) Epoch 6, batch 1050, loss[loss=0.3035, ctc_loss=0.2212, cr_loss=0.4716, attn_decoder_loss=0.3022, over 29684.00 frames. ], tot_loss[loss=0.2883, ctc_loss=0.2113, cr_loss=0.4291, attn_decoder_loss=0.2874, over 5745978.52 frames. ], batch size: 85, lr: 1.87e-02, grad_scale: 4.0
2024-09-16 22:50:20,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=94700.0, ans=0.0
2024-09-16 22:50:26,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=94700.0, ans=0.0
2024-09-16 22:50:26,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=94700.0, ans=0.1
2024-09-16 22:50:34,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=94740.0, ans=0.09899494936611666
2024-09-16 22:50:42,625 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.16 vs. limit=15.0
2024-09-16 22:51:08,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=94820.0, ans=0.125
2024-09-16 22:51:09,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=94820.0, ans=0.1
2024-09-16 22:51:30,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=94860.0, ans=0.2
2024-09-16 22:51:36,709 INFO [train.py:1198] (1/2) Epoch 6, batch 1100, loss[loss=0.2799, ctc_loss=0.1993, cr_loss=0.4294, attn_decoder_loss=0.2793, over 29457.00 frames. ], tot_loss[loss=0.2881, ctc_loss=0.2111, cr_loss=0.4285, attn_decoder_loss=0.2871, over 5758876.69 frames. ], batch size: 78, lr: 1.87e-02, grad_scale: 8.0
2024-09-16 22:51:39,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=94900.0, ans=0.0
2024-09-16 22:51:42,114 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.30 vs. limit=15.0
2024-09-16 22:52:03,769 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.796e+01 1.080e+02 1.185e+02 1.359e+02 3.091e+02, threshold=2.369e+02, percent-clipped=1.0
2024-09-16 22:52:25,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=95020.0, ans=0.0
2024-09-16 22:52:33,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=95020.0, ans=0.125
2024-09-16 22:52:42,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=95060.0, ans=0.1
2024-09-16 22:52:49,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=95060.0, ans=0.125
2024-09-16 22:52:54,944 INFO [train.py:1198] (1/2) Epoch 6, batch 1150, loss[loss=0.2812, ctc_loss=0.2045, cr_loss=0.4462, attn_decoder_loss=0.2798, over 29455.00 frames. ], tot_loss[loss=0.2876, ctc_loss=0.2108, cr_loss=0.4284, attn_decoder_loss=0.2867, over 5755188.43 frames. ], batch size: 78, lr: 1.87e-02, grad_scale: 4.0
2024-09-16 22:52:58,415 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 22:52:59,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=95100.0, ans=0.5
2024-09-16 22:53:12,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=95140.0, ans=0.2
2024-09-16 22:53:45,084 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=1.99 vs. limit=15.0
2024-09-16 22:53:58,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=95260.0, ans=0.125
2024-09-16 22:54:01,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=95260.0, ans=0.0
2024-09-16 22:54:01,851 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.05 vs. limit=15.0
2024-09-16 22:54:03,133 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.26 vs. limit=15.0
2024-09-16 22:54:14,543 INFO [train.py:1198] (1/2) Epoch 6, batch 1200, loss[loss=0.2909, ctc_loss=0.2038, cr_loss=0.4193, attn_decoder_loss=0.2913, over 29676.00 frames. ], tot_loss[loss=0.2888, ctc_loss=0.212, cr_loss=0.4295, attn_decoder_loss=0.2878, over 5746406.53 frames. ], batch size: 85, lr: 1.87e-02, grad_scale: 8.0
2024-09-16 22:54:26,022 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.22 vs. limit=15.0
2024-09-16 22:54:40,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=95340.0, ans=0.125
2024-09-16 22:54:43,659 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.331e+01 1.110e+02 1.224e+02 1.490e+02 4.215e+02, threshold=2.447e+02, percent-clipped=3.0
2024-09-16 22:55:13,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=95420.0, ans=0.0
2024-09-16 22:55:18,351 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.45 vs. limit=22.5
2024-09-16 22:55:31,298 INFO [train.py:1198] (1/2) Epoch 6, batch 1250, loss[loss=0.3084, ctc_loss=0.2333, cr_loss=0.4828, attn_decoder_loss=0.306, over 29524.00 frames. ], tot_loss[loss=0.289, ctc_loss=0.2118, cr_loss=0.4305, attn_decoder_loss=0.288, over 5774384.89 frames. ], batch size: 92, lr: 1.87e-02, grad_scale: 4.0
2024-09-16 22:55:31,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=95500.0, ans=0.125
2024-09-16 22:55:34,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=95500.0, ans=0.025
2024-09-16 22:56:47,922 INFO [train.py:1198] (1/2) Epoch 6, batch 1300, loss[loss=0.2961, ctc_loss=0.218, cr_loss=0.4396, attn_decoder_loss=0.295, over 28300.00 frames. ], tot_loss[loss=0.288, ctc_loss=0.2105, cr_loss=0.429, attn_decoder_loss=0.2871, over 5779417.99 frames. ], batch size: 111, lr: 1.86e-02, grad_scale: 8.0
2024-09-16 22:57:20,352 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.424e+01 1.067e+02 1.141e+02 1.259e+02 1.965e+02, threshold=2.283e+02, percent-clipped=0.0
2024-09-16 22:57:37,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=95820.0, ans=0.125
2024-09-16 22:57:43,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=95820.0, ans=0.125
2024-09-16 22:58:09,244 INFO [train.py:1198] (1/2) Epoch 6, batch 1350, loss[loss=0.2966, ctc_loss=0.2196, cr_loss=0.4362, attn_decoder_loss=0.2954, over 29776.00 frames. ], tot_loss[loss=0.2872, ctc_loss=0.2093, cr_loss=0.4286, attn_decoder_loss=0.2863, over 5797206.34 frames. ], batch size: 81, lr: 1.86e-02, grad_scale: 4.0
2024-09-16 22:58:12,712 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 22:58:13,427 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.69 vs. limit=15.0
2024-09-16 22:59:07,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=96020.0, ans=0.0
2024-09-16 22:59:16,723 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.81 vs. limit=10.0
2024-09-16 22:59:16,876 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.35 vs. limit=12.0
2024-09-16 22:59:27,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=96060.0, ans=0.125
2024-09-16 22:59:32,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=96100.0, ans=22.5
2024-09-16 22:59:33,055 INFO [train.py:1198] (1/2) Epoch 6, batch 1400, loss[loss=0.2621, ctc_loss=0.1842, cr_loss=0.402, attn_decoder_loss=0.2618, over 29584.00 frames. ], tot_loss[loss=0.2873, ctc_loss=0.2095, cr_loss=0.4288, attn_decoder_loss=0.2864, over 5808040.67 frames. ], batch size: 69, lr: 1.86e-02, grad_scale: 8.0
2024-09-16 22:59:36,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=96100.0, ans=0.125
2024-09-16 22:59:39,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=96100.0, ans=0.0
2024-09-16 22:59:51,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=96140.0, ans=0.125
2024-09-16 23:00:00,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=96140.0, ans=0.125
2024-09-16 23:00:04,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=96180.0, ans=0.125
2024-09-16 23:00:05,185 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.329e+01 1.115e+02 1.239e+02 1.357e+02 3.096e+02, threshold=2.478e+02, percent-clipped=1.0
2024-09-16 23:00:05,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=96180.0, ans=0.0
2024-09-16 23:00:22,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=96220.0, ans=0.125
2024-09-16 23:00:49,733 INFO [train.py:1198] (1/2) Epoch 6, batch 1450, loss[loss=0.3161, ctc_loss=0.2361, cr_loss=0.4401, attn_decoder_loss=0.3152, over 29470.00 frames. ], tot_loss[loss=0.2883, ctc_loss=0.2105, cr_loss=0.4297, attn_decoder_loss=0.2874, over 5805285.20 frames. ], batch size: 94, lr: 1.86e-02, grad_scale: 4.0
2024-09-16 23:00:50,051 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 23:00:53,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=96300.0, ans=0.0
2024-09-16 23:00:57,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=96300.0, ans=0.0
2024-09-16 23:00:59,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=96300.0, ans=0.1
2024-09-16 23:01:06,083 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-16 23:01:27,869 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0
2024-09-16 23:01:42,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=96420.0, ans=0.1
2024-09-16 23:01:42,768 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.42 vs. limit=15.0
2024-09-16 23:01:49,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=96420.0, ans=0.125
2024-09-16 23:02:02,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=96460.0, ans=0.125
2024-09-16 23:02:10,310 INFO [train.py:1198] (1/2) Epoch 6, batch 1500, loss[loss=0.3054, ctc_loss=0.2283, cr_loss=0.4394, attn_decoder_loss=0.3042, over 29625.00 frames. ], tot_loss[loss=0.2881, ctc_loss=0.2101, cr_loss=0.4295, attn_decoder_loss=0.2872, over 5805155.18 frames. ], batch size: 86, lr: 1.86e-02, grad_scale: 8.0
2024-09-16 23:02:15,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=96500.0, ans=0.125
2024-09-16 23:02:28,441 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.93 vs. limit=12.0
2024-09-16 23:02:44,680 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.157e+01 1.117e+02 1.199e+02 1.410e+02 2.285e+02, threshold=2.399e+02, percent-clipped=0.0
2024-09-16 23:03:13,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=96660.0, ans=0.025
2024-09-16 23:03:22,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=96660.0, ans=0.125
2024-09-16 23:03:28,271 INFO [train.py:1198] (1/2) Epoch 6, batch 1550, loss[loss=0.3109, ctc_loss=0.2277, cr_loss=0.4585, attn_decoder_loss=0.31, over 29490.00 frames. ], tot_loss[loss=0.2885, ctc_loss=0.2109, cr_loss=0.4298, attn_decoder_loss=0.2875, over 5780556.12 frames. ], batch size: 90, lr: 1.85e-02, grad_scale: 4.0
2024-09-16 23:03:28,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=96700.0, ans=0.0
2024-09-16 23:03:34,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=96700.0, ans=0.125
2024-09-16 23:04:17,048 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.38 vs. limit=12.0
2024-09-16 23:04:23,537 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.46 vs. limit=12.0
2024-09-16 23:04:27,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=96820.0, ans=0.125
2024-09-16 23:04:42,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=96860.0, ans=0.125
2024-09-16 23:04:45,349 INFO [train.py:1198] (1/2) Epoch 6, batch 1600, loss[loss=0.3053, ctc_loss=0.226, cr_loss=0.4552, attn_decoder_loss=0.304, over 29670.00 frames. ], tot_loss[loss=0.2883, ctc_loss=0.2111, cr_loss=0.4294, attn_decoder_loss=0.2874, over 5763618.93 frames. ], batch size: 85, lr: 1.85e-02, grad_scale: 8.0
2024-09-16 23:05:03,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=96940.0, ans=0.125
2024-09-16 23:05:03,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=96940.0, ans=0.1
2024-09-16 23:05:13,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=96940.0, ans=0.125
2024-09-16 23:05:22,799 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.461e+01 1.097e+02 1.251e+02 1.445e+02 2.140e+02, threshold=2.501e+02, percent-clipped=0.0
2024-09-16 23:05:23,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.21 vs. limit=22.5
2024-09-16 23:05:26,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=96980.0, ans=0.0
2024-09-16 23:05:36,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=97020.0, ans=0.125
2024-09-16 23:05:44,542 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.85 vs. limit=15.0
2024-09-16 23:06:02,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=97060.0, ans=0.0
2024-09-16 23:06:02,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=97060.0, ans=0.125
2024-09-16 23:06:06,648 INFO [train.py:1198] (1/2) Epoch 6, batch 1650, loss[loss=0.2932, ctc_loss=0.2073, cr_loss=0.4235, attn_decoder_loss=0.2933, over 29668.00 frames. ], tot_loss[loss=0.2883, ctc_loss=0.2112, cr_loss=0.4292, attn_decoder_loss=0.2873, over 5759579.81 frames. ], batch size: 89, lr: 1.85e-02, grad_scale: 4.0
2024-09-16 23:06:06,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=97100.0, ans=0.035
2024-09-16 23:06:49,280 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.54 vs. limit=15.0
2024-09-16 23:06:57,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=97220.0, ans=0.0
2024-09-16 23:07:23,293 INFO [train.py:1198] (1/2) Epoch 6, batch 1700, loss[loss=0.2429, ctc_loss=0.1641, cr_loss=0.3598, attn_decoder_loss=0.2437, over 29588.00 frames. ], tot_loss[loss=0.2871, ctc_loss=0.2093, cr_loss=0.4273, attn_decoder_loss=0.2863, over 5781548.56 frames. ], batch size: 69, lr: 1.85e-02, grad_scale: 8.0
2024-09-16 23:07:31,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=97300.0, ans=0.125
2024-09-16 23:07:45,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=97340.0, ans=0.125
2024-09-16 23:07:51,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=97340.0, ans=0.2
2024-09-16 23:08:00,040 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.175e+01 1.040e+02 1.164e+02 1.267e+02 1.903e+02, threshold=2.329e+02, percent-clipped=0.0
2024-09-16 23:08:12,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=97420.0, ans=0.0
2024-09-16 23:08:39,835 INFO [train.py:1198] (1/2) Epoch 6, batch 1750, loss[loss=0.2638, ctc_loss=0.1953, cr_loss=0.4301, attn_decoder_loss=0.2618, over 29344.00 frames. ], tot_loss[loss=0.2868, ctc_loss=0.2089, cr_loss=0.4274, attn_decoder_loss=0.2859, over 5788915.21 frames. ], batch size: 67, lr: 1.85e-02, grad_scale: 4.0
2024-09-16 23:08:44,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=97500.0, ans=0.0
2024-09-16 23:08:52,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=97500.0, ans=0.0
2024-09-16 23:09:27,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=97620.0, ans=0.05
2024-09-16 23:09:43,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=97620.0, ans=0.0
2024-09-16 23:09:59,324 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.27 vs. limit=15.0
2024-09-16 23:10:00,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=97700.0, ans=0.125
2024-09-16 23:10:01,610 INFO [train.py:1198] (1/2) Epoch 6, batch 1800, loss[loss=0.2998, ctc_loss=0.2138, cr_loss=0.4424, attn_decoder_loss=0.2995, over 29685.00 frames. ], tot_loss[loss=0.2871, ctc_loss=0.2089, cr_loss=0.4273, attn_decoder_loss=0.2862, over 5791864.75 frames. ], batch size: 83, lr: 1.85e-02, grad_scale: 8.0
2024-09-16 23:10:09,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=97700.0, ans=0.125
2024-09-16 23:10:18,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=97740.0, ans=0.0
2024-09-16 23:10:29,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=97740.0, ans=0.125
2024-09-16 23:10:39,663 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.002e+01 1.085e+02 1.174e+02 1.306e+02 4.568e+02, threshold=2.348e+02, percent-clipped=1.0
2024-09-16 23:11:00,429 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.76 vs. limit=10.0
2024-09-16 23:11:18,207 INFO [train.py:1198] (1/2) Epoch 6, batch 1850, loss[loss=0.2971, ctc_loss=0.2253, cr_loss=0.4425, attn_decoder_loss=0.2953, over 29625.00 frames. ], tot_loss[loss=0.2867, ctc_loss=0.2085, cr_loss=0.4268, attn_decoder_loss=0.2859, over 5798475.34 frames. ], batch size: 86, lr: 1.84e-02, grad_scale: 4.0
2024-09-16 23:11:25,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.03 vs. limit=15.0
2024-09-16 23:11:26,795 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.43 vs. limit=6.0
2024-09-16 23:11:36,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=97940.0, ans=0.0
2024-09-16 23:11:44,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=97940.0, ans=0.125
2024-09-16 23:12:03,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=98020.0, ans=0.125
2024-09-16 23:12:34,396 INFO [train.py:1198] (1/2) Epoch 6, batch 1900, loss[loss=0.3004, ctc_loss=0.2249, cr_loss=0.4469, attn_decoder_loss=0.2988, over 29708.00 frames. ], tot_loss[loss=0.2878, ctc_loss=0.2095, cr_loss=0.4286, attn_decoder_loss=0.287, over 5806356.91 frames. ], batch size: 89, lr: 1.84e-02, grad_scale: 8.0
2024-09-16 23:12:45,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.max_abs, batch_count=98100.0, ans=10.0
2024-09-16 23:12:51,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=98140.0, ans=0.125
2024-09-16 23:13:09,549 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.79 vs. limit=12.0
2024-09-16 23:13:15,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=98180.0, ans=0.125
2024-09-16 23:13:15,670 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.92 vs. limit=15.0
2024-09-16 23:13:16,236 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.164e+01 1.098e+02 1.206e+02 1.393e+02 1.994e+02, threshold=2.412e+02, percent-clipped=0.0
2024-09-16 23:13:55,789 INFO [train.py:1198] (1/2) Epoch 6, batch 1950, loss[loss=0.276, ctc_loss=0.196, cr_loss=0.4352, attn_decoder_loss=0.2752, over 29446.00 frames. ], tot_loss[loss=0.2887, ctc_loss=0.2098, cr_loss=0.43, attn_decoder_loss=0.2879, over 5820219.80 frames. ], batch size: 78, lr: 1.84e-02, grad_scale: 4.0
2024-09-16 23:14:04,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=98300.0, ans=0.035
2024-09-16 23:14:24,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=98340.0, ans=0.1
2024-09-16 23:14:25,737 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 23:14:31,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=98380.0, ans=0.125
2024-09-16 23:14:31,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=98380.0, ans=0.125
2024-09-16 23:14:49,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=98420.0, ans=0.035
2024-09-16 23:14:49,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=98420.0, ans=0.025
2024-09-16 23:15:06,821 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=23.72 vs. limit=22.5
2024-09-16 23:15:07,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=98460.0, ans=15.0
2024-09-16 23:15:13,566 INFO [train.py:1198] (1/2) Epoch 6, batch 2000, loss[loss=0.2591, ctc_loss=0.1893, cr_loss=0.4261, attn_decoder_loss=0.2574, over 29381.00 frames. ], tot_loss[loss=0.2897, ctc_loss=0.211, cr_loss=0.4315, attn_decoder_loss=0.2888, over 5796499.08 frames. ], batch size: 67, lr: 1.84e-02, grad_scale: 8.0
2024-09-16 23:15:33,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=98540.0, ans=0.1
2024-09-16 23:15:40,915 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=15.0
2024-09-16 23:15:53,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=98580.0, ans=0.2
2024-09-16 23:15:55,048 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.337e+01 1.180e+02 1.301e+02 1.522e+02 2.715e+02, threshold=2.602e+02, percent-clipped=3.0
2024-09-16 23:15:59,463 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.04 vs. limit=8.0
2024-09-16 23:16:02,138 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.30 vs. limit=15.0
2024-09-16 23:16:02,363 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.01 vs. limit=6.0
2024-09-16 23:16:30,540 INFO [train.py:1198] (1/2) Epoch 6, batch 2050, loss[loss=0.2695, ctc_loss=0.1939, cr_loss=0.4307, attn_decoder_loss=0.2683, over 29452.00 frames. ], tot_loss[loss=0.2885, ctc_loss=0.2102, cr_loss=0.4304, attn_decoder_loss=0.2877, over 5788696.00 frames. ], batch size: 70, lr: 1.84e-02, grad_scale: 4.0
2024-09-16 23:16:31,067 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-16 23:16:40,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=98700.0, ans=0.025
2024-09-16 23:16:54,516 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.67 vs. limit=12.0
2024-09-16 23:17:03,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=98780.0, ans=0.125
2024-09-16 23:17:36,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=98860.0, ans=0.0
2024-09-16 23:17:41,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=98860.0, ans=0.125
2024-09-16 23:17:43,260 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 23:17:44,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=98860.0, ans=0.125
2024-09-16 23:17:52,101 INFO [train.py:1198] (1/2) Epoch 6, batch 2100, loss[loss=0.2834, ctc_loss=0.2081, cr_loss=0.4141, attn_decoder_loss=0.2825, over 29761.00 frames. ], tot_loss[loss=0.2875, ctc_loss=0.2089, cr_loss=0.429, attn_decoder_loss=0.2868, over 5800824.44 frames. ], batch size: 81, lr: 1.84e-02, grad_scale: 8.0
2024-09-16 23:17:59,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=98900.0, ans=0.125
2024-09-16 23:18:04,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=98900.0, ans=0.125
2024-09-16 23:18:18,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=98940.0, ans=0.0
2024-09-16 23:18:30,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=98980.0, ans=0.125
2024-09-16 23:18:33,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=98980.0, ans=0.0
2024-09-16 23:18:34,557 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.717e+01 1.049e+02 1.121e+02 1.246e+02 2.037e+02, threshold=2.242e+02, percent-clipped=0.0
2024-09-16 23:19:01,568 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.20 vs. limit=10.0
2024-09-16 23:19:08,379 INFO [train.py:1198] (1/2) Epoch 6, batch 2150, loss[loss=0.279, ctc_loss=0.2016, cr_loss=0.4104, attn_decoder_loss=0.2785, over 29438.00 frames. ], tot_loss[loss=0.286, ctc_loss=0.2071, cr_loss=0.4269, attn_decoder_loss=0.2853, over 5815036.96 frames. ], batch size: 78, lr: 1.83e-02, grad_scale: 4.0
2024-09-16 23:19:25,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=99140.0, ans=0.0
2024-09-16 23:19:26,308 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=15.0
2024-09-16 23:19:33,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=99140.0, ans=0.0
2024-09-16 23:20:09,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=99260.0, ans=0.0
2024-09-16 23:20:25,671 INFO [train.py:1198] (1/2) Epoch 6, batch 2200, loss[loss=0.3064, ctc_loss=0.2281, cr_loss=0.4522, attn_decoder_loss=0.305, over 29622.00 frames. ], tot_loss[loss=0.2866, ctc_loss=0.2078, cr_loss=0.4276, attn_decoder_loss=0.2858, over 5811094.70 frames. ], batch size: 86, lr: 1.83e-02, grad_scale: 8.0
2024-09-16 23:20:37,576 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.76 vs. limit=22.5
2024-09-16 23:20:44,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=99340.0, ans=0.125
2024-09-16 23:21:08,374 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.12 vs. limit=15.0
2024-09-16 23:21:12,196 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.295e+01 1.080e+02 1.191e+02 1.298e+02 2.659e+02, threshold=2.382e+02, percent-clipped=1.0
2024-09-16 23:21:34,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=99460.0, ans=0.2
2024-09-16 23:21:35,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=99460.0, ans=0.1
2024-09-16 23:21:46,855 INFO [train.py:1198] (1/2) Epoch 6, batch 2250, loss[loss=0.2997, ctc_loss=0.2257, cr_loss=0.4595, attn_decoder_loss=0.2978, over 29683.00 frames. ], tot_loss[loss=0.2866, ctc_loss=0.2077, cr_loss=0.4282, attn_decoder_loss=0.2859, over 5810797.18 frames. ], batch size: 82, lr: 1.83e-02, grad_scale: 4.0
2024-09-16 23:22:15,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=99580.0, ans=0.1
2024-09-16 23:22:52,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=99660.0, ans=0.125
2024-09-16 23:22:52,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=99660.0, ans=0.125
2024-09-16 23:22:55,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=99660.0, ans=0.125
2024-09-16 23:23:02,895 INFO [train.py:1198] (1/2) Epoch 6, batch 2300, loss[loss=0.2674, ctc_loss=0.1983, cr_loss=0.4136, attn_decoder_loss=0.2659, over 29735.00 frames. ], tot_loss[loss=0.2857, ctc_loss=0.2074, cr_loss=0.427, attn_decoder_loss=0.2849, over 5799572.29 frames. ], batch size: 72, lr: 1.83e-02, grad_scale: 8.0
2024-09-16 23:23:13,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=99700.0, ans=0.0
2024-09-16 23:23:49,099 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.439e+01 1.127e+02 1.220e+02 1.323e+02 2.863e+02, threshold=2.441e+02, percent-clipped=2.0
2024-09-16 23:23:51,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=99820.0, ans=0.025
2024-09-16 23:23:51,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=99820.0, ans=0.1
2024-09-16 23:24:12,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=99860.0, ans=0.1
2024-09-16 23:24:19,893 INFO [train.py:1198] (1/2) Epoch 6, batch 2350, loss[loss=0.2858, ctc_loss=0.2028, cr_loss=0.42, attn_decoder_loss=0.2857, over 29682.00 frames. ], tot_loss[loss=0.2858, ctc_loss=0.2075, cr_loss=0.4269, attn_decoder_loss=0.285, over 5804998.54 frames. ], batch size: 83, lr: 1.83e-02, grad_scale: 4.0
2024-09-16 23:24:28,334 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.78 vs. limit=22.5
2024-09-16 23:24:29,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=99900.0, ans=0.0
2024-09-16 23:24:40,773 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.04 vs. limit=6.0
2024-09-16 23:25:02,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=99980.0, ans=0.2
2024-09-16 23:25:08,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=100020.0, ans=0.0
2024-09-16 23:25:11,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=100020.0, ans=0.125
2024-09-16 23:25:20,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=100020.0, ans=0.0
2024-09-16 23:25:31,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=100060.0, ans=0.125
2024-09-16 23:25:41,940 INFO [train.py:1198] (1/2) Epoch 6, batch 2400, loss[loss=0.2798, ctc_loss=0.2015, cr_loss=0.3889, attn_decoder_loss=0.2798, over 29512.00 frames. ], tot_loss[loss=0.2861, ctc_loss=0.2075, cr_loss=0.427, attn_decoder_loss=0.2854, over 5808025.55 frames. ], batch size: 76, lr: 1.83e-02, grad_scale: 8.0
2024-09-16 23:25:43,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=100100.0, ans=0.09899494936611666
2024-09-16 23:26:04,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=100140.0, ans=0.07
2024-09-16 23:26:24,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=100180.0, ans=0.125
2024-09-16 23:26:29,679 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.345e+01 1.104e+02 1.208e+02 1.363e+02 5.197e+02, threshold=2.416e+02, percent-clipped=3.0
2024-09-16 23:26:58,938 INFO [train.py:1198] (1/2) Epoch 6, batch 2450, loss[loss=0.2882, ctc_loss=0.2088, cr_loss=0.4333, attn_decoder_loss=0.2874, over 29734.00 frames. ], tot_loss[loss=0.2872, ctc_loss=0.2085, cr_loss=0.4281, attn_decoder_loss=0.2864, over 5784471.91 frames. ], batch size: 82, lr: 1.82e-02, grad_scale: 4.0
2024-09-16 23:26:59,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=100300.0, ans=0.0
2024-09-16 23:27:28,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=100380.0, ans=0.125
2024-09-16 23:27:33,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=100380.0, ans=0.0
2024-09-16 23:27:45,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=100420.0, ans=0.025
2024-09-16 23:28:08,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=100460.0, ans=0.1
2024-09-16 23:28:10,732 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.30 vs. limit=15.0
2024-09-16 23:28:14,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=100500.0, ans=0.0
2024-09-16 23:28:15,759 INFO [train.py:1198] (1/2) Epoch 6, batch 2500, loss[loss=0.2831, ctc_loss=0.1877, cr_loss=0.4032, attn_decoder_loss=0.2848, over 29629.00 frames. ], tot_loss[loss=0.287, ctc_loss=0.2083, cr_loss=0.4284, attn_decoder_loss=0.2863, over 5795429.64 frames. ], batch size: 86, lr: 1.82e-02, grad_scale: 8.0
2024-09-16 23:28:25,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=100500.0, ans=0.125
2024-09-16 23:28:34,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=100540.0, ans=0.025
2024-09-16 23:28:38,450 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.48 vs. limit=12.0
2024-09-16 23:28:53,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=100580.0, ans=0.04949747468305833
2024-09-16 23:28:53,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=100580.0, ans=0.0
2024-09-16 23:29:00,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=100620.0, ans=0.1
2024-09-16 23:29:04,966 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.690e+01 1.098e+02 1.228e+02 1.415e+02 3.536e+02, threshold=2.457e+02, percent-clipped=1.0
2024-09-16 23:29:10,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=100620.0, ans=0.125
2024-09-16 23:29:36,830 INFO [train.py:1198] (1/2) Epoch 6, batch 2550, loss[loss=0.2439, ctc_loss=0.1673, cr_loss=0.3814, attn_decoder_loss=0.2439, over 29340.00 frames. ], tot_loss[loss=0.2867, ctc_loss=0.2077, cr_loss=0.4286, attn_decoder_loss=0.286, over 5797710.82 frames. ], batch size: 67, lr: 1.82e-02, grad_scale: 4.0
2024-09-16 23:29:37,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=100700.0, ans=0.1
2024-09-16 23:29:39,336 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.08 vs. limit=15.0
2024-09-16 23:29:42,350 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=11.18 vs. limit=12.0
2024-09-16 23:29:53,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=100740.0, ans=0.0
2024-09-16 23:30:01,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=100740.0, ans=0.0
2024-09-16 23:30:03,488 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.09 vs. limit=15.0
2024-09-16 23:30:06,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=100780.0, ans=0.125
2024-09-16 23:30:54,115 INFO [train.py:1198] (1/2) Epoch 6, batch 2600, loss[loss=0.2848, ctc_loss=0.205, cr_loss=0.4415, attn_decoder_loss=0.2839, over 29442.00 frames. ], tot_loss[loss=0.2871, ctc_loss=0.2083, cr_loss=0.4285, attn_decoder_loss=0.2864, over 5794472.45 frames. ], batch size: 78, lr: 1.82e-02, grad_scale: 8.0
2024-09-16 23:31:19,161 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.87 vs. limit=15.0
2024-09-16 23:31:20,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=100940.0, ans=0.125
2024-09-16 23:31:34,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=100980.0, ans=0.1
2024-09-16 23:31:38,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=101020.0, ans=0.09899494936611666
2024-09-16 23:31:38,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=101020.0, ans=0.1
2024-09-16 23:31:44,363 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.431e+01 1.047e+02 1.098e+02 1.263e+02 2.416e+02, threshold=2.197e+02, percent-clipped=0.0
2024-09-16 23:32:10,135 INFO [train.py:1198] (1/2) Epoch 6, batch 2650, loss[loss=0.2996, ctc_loss=0.2243, cr_loss=0.4374, attn_decoder_loss=0.2983, over 29270.00 frames. ], tot_loss[loss=0.2874, ctc_loss=0.2085, cr_loss=0.4289, attn_decoder_loss=0.2867, over 5801466.75 frames. ], batch size: 100, lr: 1.82e-02, grad_scale: 4.0
2024-09-16 23:32:34,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=101140.0, ans=0.125
2024-09-16 23:32:36,839 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.08 vs. limit=15.0
2024-09-16 23:32:40,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=101180.0, ans=0.1
2024-09-16 23:32:55,275 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.62 vs. limit=15.0
2024-09-16 23:32:59,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=101220.0, ans=0.1
2024-09-16 23:33:23,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=101260.0, ans=0.1
2024-09-16 23:33:31,359 INFO [train.py:1198] (1/2) Epoch 6, batch 2700, loss[loss=0.2913, ctc_loss=0.2153, cr_loss=0.4394, attn_decoder_loss=0.29, over 29521.00 frames. ], tot_loss[loss=0.2876, ctc_loss=0.2086, cr_loss=0.4295, attn_decoder_loss=0.2869, over 5795686.26 frames. ], batch size: 87, lr: 1.82e-02, grad_scale: 8.0
2024-09-16 23:34:20,477 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.45 vs. limit=22.5
2024-09-16 23:34:23,727 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.471e+01 1.102e+02 1.222e+02 1.380e+02 2.898e+02, threshold=2.443e+02, percent-clipped=1.0
2024-09-16 23:34:28,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=101420.0, ans=0.2
2024-09-16 23:34:41,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=101460.0, ans=0.1
2024-09-16 23:34:48,560 INFO [train.py:1198] (1/2) Epoch 6, batch 2750, loss[loss=0.2692, ctc_loss=0.1853, cr_loss=0.4036, attn_decoder_loss=0.2695, over 29521.00 frames. ], tot_loss[loss=0.2862, ctc_loss=0.2074, cr_loss=0.4271, attn_decoder_loss=0.2855, over 5794624.22 frames. ], batch size: 75, lr: 1.81e-02, grad_scale: 4.0
2024-09-16 23:34:59,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer_ff3.min_abs, batch_count=101500.0, ans=0.2
2024-09-16 23:35:10,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=101540.0, ans=0.125
2024-09-16 23:35:15,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=101540.0, ans=0.125
2024-09-16 23:35:15,473 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=15.0
2024-09-16 23:35:37,653 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.58 vs. limit=15.0
2024-09-16 23:35:46,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=101620.0, ans=0.0
2024-09-16 23:36:06,183 INFO [train.py:1198] (1/2) Epoch 6, batch 2800, loss[loss=0.3379, ctc_loss=0.3013, cr_loss=0.4519, attn_decoder_loss=0.3319, over 19808.00 frames. ], tot_loss[loss=0.2866, ctc_loss=0.2081, cr_loss=0.4273, attn_decoder_loss=0.2858, over 5774178.26 frames. ], batch size: 209, lr: 1.81e-02, grad_scale: 8.0
2024-09-16 23:36:23,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=101740.0, ans=0.125
2024-09-16 23:36:27,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=101740.0, ans=0.125
2024-09-16 23:36:38,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=101780.0, ans=0.0
2024-09-16 23:36:46,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=101780.0, ans=0.0
2024-09-16 23:37:04,396 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.339e+01 1.139e+02 1.318e+02 1.529e+02 2.693e+02, threshold=2.635e+02, percent-clipped=4.0
2024-09-16 23:37:17,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=101860.0, ans=0.125
2024-09-16 23:37:18,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=101860.0, ans=0.125
2024-09-16 23:37:27,408 INFO [train.py:1198] (1/2) Epoch 6, batch 2850, loss[loss=0.2797, ctc_loss=0.1979, cr_loss=0.4375, attn_decoder_loss=0.2791, over 29502.00 frames. ], tot_loss[loss=0.2878, ctc_loss=0.2096, cr_loss=0.4289, attn_decoder_loss=0.2869, over 5758190.27 frames. ], batch size: 77, lr: 1.81e-02, grad_scale: 4.0
2024-09-16 23:37:43,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=101940.0, ans=0.0
2024-09-16 23:37:46,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=101940.0, ans=0.125
2024-09-16 23:37:58,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=101980.0, ans=0.0
2024-09-16 23:38:20,364 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.19 vs. limit=15.0
2024-09-16 23:38:27,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=102060.0, ans=0.1
2024-09-16 23:38:41,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=102060.0, ans=0.125
2024-09-16 23:38:43,930 INFO [train.py:1198] (1/2) Epoch 6, batch 2900, loss[loss=0.2825, ctc_loss=0.1958, cr_loss=0.4154, attn_decoder_loss=0.2829, over 29428.00 frames. ], tot_loss[loss=0.2885, ctc_loss=0.2094, cr_loss=0.4299, attn_decoder_loss=0.2878, over 5785000.79 frames. ], batch size: 79, lr: 1.81e-02, grad_scale: 8.0
2024-09-16 23:38:58,851 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.66 vs. limit=15.0
2024-09-16 23:39:04,994 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.19 vs. limit=15.0
2024-09-16 23:39:39,380 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.808e+01 1.155e+02 1.262e+02 1.445e+02 2.631e+02, threshold=2.524e+02, percent-clipped=0.0
2024-09-16 23:39:54,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=102260.0, ans=0.2
2024-09-16 23:39:57,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=102260.0, ans=0.0
2024-09-16 23:40:00,667 INFO [train.py:1198] (1/2) Epoch 6, batch 2950, loss[loss=0.2749, ctc_loss=0.1919, cr_loss=0.3991, attn_decoder_loss=0.2753, over 29534.00 frames. ], tot_loss[loss=0.2867, ctc_loss=0.208, cr_loss=0.4277, attn_decoder_loss=0.2859, over 5780008.10 frames. ], batch size: 75, lr: 1.81e-02, grad_scale: 4.0
2024-09-16 23:40:20,006 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.52 vs. limit=15.0
2024-09-16 23:40:21,731 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.92 vs. limit=15.0
2024-09-16 23:40:22,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=102340.0, ans=0.125
2024-09-16 23:40:36,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=102380.0, ans=0.1
2024-09-16 23:40:40,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=102380.0, ans=0.125
2024-09-16 23:41:01,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=102420.0, ans=0.125
2024-09-16 23:41:07,316 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.67 vs. limit=22.5
2024-09-16 23:41:09,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=102460.0, ans=0.125
2024-09-16 23:41:18,460 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=22.5
2024-09-16 23:41:23,658 INFO [train.py:1198] (1/2) Epoch 6, batch 3000, loss[loss=0.2848, ctc_loss=0.2006, cr_loss=0.4269, attn_decoder_loss=0.2846, over 29750.00 frames. ], tot_loss[loss=0.2864, ctc_loss=0.2076, cr_loss=0.4277, attn_decoder_loss=0.2856, over 5780704.60 frames. ], batch size: 81, lr: 1.81e-02, grad_scale: 8.0
2024-09-16 23:41:23,659 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-16 23:41:31,512 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5027, 4.4299, 3.9014, 2.4920], device='cuda:1')
2024-09-16 23:41:42,100 INFO [train.py:1230] (1/2) Epoch 6, validation: loss=0.2192, ctc_loss=0.0625, cr_loss=4.383e-15, attn_decoder_loss=0.2366, over 944034.00 frames.
2024-09-16 23:41:42,101 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB
2024-09-16 23:41:44,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=102500.0, ans=0.2
2024-09-16 23:42:04,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=102540.0, ans=0.1
2024-09-16 23:42:14,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=102580.0, ans=0.0
2024-09-16 23:42:38,834 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.071e+01 1.057e+02 1.164e+02 1.320e+02 2.426e+02, threshold=2.327e+02, percent-clipped=0.0
2024-09-16 23:42:52,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0
2024-09-16 23:42:57,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=102700.0, ans=0.1
2024-09-16 23:42:58,906 INFO [train.py:1198] (1/2) Epoch 6, batch 3050, loss[loss=0.2747, ctc_loss=0.1994, cr_loss=0.4098, attn_decoder_loss=0.2739, over 29536.00 frames. ], tot_loss[loss=0.2873, ctc_loss=0.2086, cr_loss=0.4285, attn_decoder_loss=0.2865, over 5775407.32 frames. ], batch size: 76, lr: 1.80e-02, grad_scale: 4.0
2024-09-16 23:43:16,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=102740.0, ans=0.0
2024-09-16 23:43:19,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.80 vs. limit=15.0
2024-09-16 23:43:33,797 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=22.5
2024-09-16 23:43:42,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=102780.0, ans=0.0
2024-09-16 23:43:46,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=102820.0, ans=0.2
2024-09-16 23:43:57,682 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.20 vs. limit=10.0
2024-09-16 23:44:00,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=102860.0, ans=0.125
2024-09-16 23:44:06,572 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 23:44:10,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=102860.0, ans=0.2
2024-09-16 23:44:15,193 INFO [train.py:1198] (1/2) Epoch 6, batch 3100, loss[loss=0.3017, ctc_loss=0.2196, cr_loss=0.4502, attn_decoder_loss=0.3008, over 29261.00 frames. ], tot_loss[loss=0.2863, ctc_loss=0.2076, cr_loss=0.4276, attn_decoder_loss=0.2856, over 5774551.44 frames. ], batch size: 100, lr: 1.80e-02, grad_scale: 8.0
2024-09-16 23:44:26,969 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0
2024-09-16 23:44:27,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=102900.0, ans=0.0
2024-09-16 23:44:38,897 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.45 vs. limit=15.0
2024-09-16 23:44:41,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=102940.0, ans=0.025
2024-09-16 23:44:51,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=102980.0, ans=0.2
2024-09-16 23:45:00,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=102980.0, ans=0.125
2024-09-16 23:45:02,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=102980.0, ans=0.125
2024-09-16 23:45:17,447 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.176e+01 1.082e+02 1.229e+02 1.361e+02 4.744e+02, threshold=2.458e+02, percent-clipped=3.0
2024-09-16 23:45:19,705 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.01 vs. limit=15.0
2024-09-16 23:45:23,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=103060.0, ans=0.125
2024-09-16 23:45:35,736 INFO [train.py:1198] (1/2) Epoch 6, batch 3150, loss[loss=0.3026, ctc_loss=0.215, cr_loss=0.4568, attn_decoder_loss=0.3021, over 28738.00 frames. ], tot_loss[loss=0.2862, ctc_loss=0.2073, cr_loss=0.4277, attn_decoder_loss=0.2855, over 5781921.90 frames. ], batch size: 104, lr: 1.80e-02, grad_scale: 4.0
2024-09-16 23:45:58,324 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.97 vs. limit=15.0
2024-09-16 23:46:31,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=103220.0, ans=0.2
2024-09-16 23:46:47,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=103260.0, ans=0.125
2024-09-16 23:46:50,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=103260.0, ans=0.125
2024-09-16 23:46:52,912 INFO [train.py:1198] (1/2) Epoch 6, batch 3200, loss[loss=0.2895, ctc_loss=0.2156, cr_loss=0.4676, attn_decoder_loss=0.2873, over 29440.00 frames. ], tot_loss[loss=0.2853, ctc_loss=0.2062, cr_loss=0.4265, attn_decoder_loss=0.2846, over 5793545.72 frames. ], batch size: 79, lr: 1.80e-02, grad_scale: 8.0
2024-09-16 23:47:01,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.57 vs. limit=15.0
2024-09-16 23:47:09,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.95 vs. limit=15.0
2024-09-16 23:47:49,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.76 vs. limit=22.5
2024-09-16 23:47:52,865 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.859e+01 1.056e+02 1.155e+02 1.311e+02 1.883e+02, threshold=2.309e+02, percent-clipped=0.0
2024-09-16 23:48:01,558 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.53 vs. limit=15.0
2024-09-16 23:48:07,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=103460.0, ans=0.125
2024-09-16 23:48:09,767 INFO [train.py:1198] (1/2) Epoch 6, batch 3250, loss[loss=0.2828, ctc_loss=0.2016, cr_loss=0.4247, attn_decoder_loss=0.2824, over 29714.00 frames. ], tot_loss[loss=0.2863, ctc_loss=0.2073, cr_loss=0.4282, attn_decoder_loss=0.2856, over 5799800.31 frames. ], batch size: 84, lr: 1.80e-02, grad_scale: 4.0
2024-09-16 23:48:13,757 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.83 vs. limit=15.0
2024-09-16 23:48:22,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=103500.0, ans=0.0
2024-09-16 23:48:44,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=103580.0, ans=0.125
2024-09-16 23:49:02,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=103620.0, ans=0.0
2024-09-16 23:49:18,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=103660.0, ans=0.125
2024-09-16 23:49:30,808 INFO [train.py:1198] (1/2) Epoch 6, batch 3300, loss[loss=0.2984, ctc_loss=0.2132, cr_loss=0.4475, attn_decoder_loss=0.2979, over 28298.00 frames. ], tot_loss[loss=0.2848, ctc_loss=0.2059, cr_loss=0.4262, attn_decoder_loss=0.2841, over 5797466.80 frames.
], batch size: 111, lr: 1.80e-02, grad_scale: 8.0 2024-09-16 23:49:40,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=103700.0, ans=0.025 2024-09-16 23:50:02,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=103780.0, ans=0.025 2024-09-16 23:50:08,595 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.49 vs. limit=15.0 2024-09-16 23:50:21,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=103820.0, ans=0.125 2024-09-16 23:50:21,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=103820.0, ans=0.125 2024-09-16 23:50:25,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.71 vs. limit=6.0 2024-09-16 23:50:32,022 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.684e+01 1.121e+02 1.244e+02 1.460e+02 3.755e+02, threshold=2.488e+02, percent-clipped=2.0 2024-09-16 23:50:47,251 INFO [train.py:1198] (1/2) Epoch 6, batch 3350, loss[loss=0.296, ctc_loss=0.2226, cr_loss=0.4253, attn_decoder_loss=0.2947, over 28799.00 frames. ], tot_loss[loss=0.2857, ctc_loss=0.2068, cr_loss=0.4268, attn_decoder_loss=0.285, over 5773932.12 frames. ], batch size: 104, lr: 1.79e-02, grad_scale: 4.0 2024-09-16 23:51:03,598 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.19 vs. limit=22.5 2024-09-16 23:51:04,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=103940.0, ans=0.125 2024-09-16 23:51:13,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=103940.0, ans=0.125 2024-09-16 23:51:30,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=103980.0, ans=0.125 2024-09-16 23:52:04,370 INFO [train.py:1198] (1/2) Epoch 6, batch 3400, loss[loss=0.2427, ctc_loss=0.1643, cr_loss=0.3684, attn_decoder_loss=0.2432, over 29346.00 frames. ], tot_loss[loss=0.2852, ctc_loss=0.2064, cr_loss=0.4256, attn_decoder_loss=0.2845, over 5766020.49 frames. ], batch size: 67, lr: 1.79e-02, grad_scale: 8.0 2024-09-16 23:52:19,563 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.72 vs. limit=22.5 2024-09-16 23:52:53,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=104180.0, ans=0.125 2024-09-16 23:53:12,531 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.970e+01 1.077e+02 1.207e+02 1.405e+02 5.237e+02, threshold=2.415e+02, percent-clipped=2.0 2024-09-16 23:53:26,208 INFO [train.py:1198] (1/2) Epoch 6, batch 3450, loss[loss=0.2931, ctc_loss=0.2177, cr_loss=0.4064, attn_decoder_loss=0.2925, over 28118.00 frames. ], tot_loss[loss=0.2858, ctc_loss=0.207, cr_loss=0.4265, attn_decoder_loss=0.2851, over 5774585.17 frames. 
], batch size: 111, lr: 1.79e-02, grad_scale: 4.0 2024-09-16 23:53:49,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=104340.0, ans=0.1 2024-09-16 23:54:00,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=104380.0, ans=0.0 2024-09-16 23:54:00,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=104380.0, ans=0.125 2024-09-16 23:54:11,608 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.53 vs. limit=15.0 2024-09-16 23:54:17,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=104420.0, ans=0.125 2024-09-16 23:54:17,998 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.85 vs. limit=15.0 2024-09-16 23:54:33,340 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.59 vs. limit=15.0 2024-09-16 23:54:37,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=104460.0, ans=0.025 2024-09-16 23:54:43,086 INFO [train.py:1198] (1/2) Epoch 6, batch 3500, loss[loss=0.2534, ctc_loss=0.18, cr_loss=0.37, attn_decoder_loss=0.2533, over 29315.00 frames. ], tot_loss[loss=0.2853, ctc_loss=0.2065, cr_loss=0.4269, attn_decoder_loss=0.2845, over 5777773.96 frames. ], batch size: 71, lr: 1.79e-02, grad_scale: 8.0 2024-09-16 23:55:31,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=104620.0, ans=0.125 2024-09-16 23:55:34,036 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.24 vs. limit=22.5 2024-09-16 23:55:46,679 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.951e+01 1.039e+02 1.144e+02 1.274e+02 4.432e+02, threshold=2.289e+02, percent-clipped=1.0 2024-09-16 23:55:48,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=104660.0, ans=0.1 2024-09-16 23:55:50,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=104660.0, ans=0.125 2024-09-16 23:55:58,734 INFO [train.py:1198] (1/2) Epoch 6, batch 3550, loss[loss=0.2976, ctc_loss=0.2111, cr_loss=0.4395, attn_decoder_loss=0.2974, over 29719.00 frames. ], tot_loss[loss=0.2852, ctc_loss=0.206, cr_loss=0.427, attn_decoder_loss=0.2845, over 5783166.74 frames. ], batch size: 89, lr: 1.79e-02, grad_scale: 4.0 2024-09-16 23:56:16,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=104740.0, ans=0.1 2024-09-16 23:56:57,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=104860.0, ans=0.125 2024-09-16 23:57:16,976 INFO [train.py:1198] (1/2) Epoch 6, batch 3600, loss[loss=0.2805, ctc_loss=0.1958, cr_loss=0.4132, attn_decoder_loss=0.2807, over 29513.00 frames. 
], tot_loss[loss=0.2853, ctc_loss=0.206, cr_loss=0.4276, attn_decoder_loss=0.2846, over 5792343.68 frames. ], batch size: 77, lr: 1.79e-02, grad_scale: 8.0 2024-09-16 23:57:17,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=104900.0, ans=0.025 2024-09-16 23:57:34,630 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.51 vs. limit=15.0 2024-09-16 23:58:16,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=105020.0, ans=0.125 2024-09-16 23:58:23,466 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.018e+01 1.117e+02 1.191e+02 1.328e+02 4.381e+02, threshold=2.382e+02, percent-clipped=2.0 2024-09-16 23:58:24,574 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.84 vs. limit=12.0 2024-09-16 23:58:26,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=105060.0, ans=0.0 2024-09-16 23:58:34,009 INFO [train.py:1198] (1/2) Epoch 6, batch 3650, loss[loss=0.3095, ctc_loss=0.2336, cr_loss=0.4551, attn_decoder_loss=0.3078, over 29522.00 frames. ], tot_loss[loss=0.2843, ctc_loss=0.2048, cr_loss=0.4259, attn_decoder_loss=0.2837, over 5793973.76 frames. ], batch size: 90, lr: 1.79e-02, grad_scale: 4.0 2024-09-16 23:58:34,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=105100.0, ans=0.125 2024-09-16 23:58:35,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=105100.0, ans=0.125 2024-09-16 23:58:43,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=105100.0, ans=0.0 2024-09-16 23:58:49,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=105140.0, ans=0.125 2024-09-16 23:58:55,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=105140.0, ans=0.2 2024-09-16 23:59:04,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=105180.0, ans=15.0 2024-09-16 23:59:21,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=105220.0, ans=0.0 2024-09-16 23:59:49,425 INFO [train.py:1198] (1/2) Epoch 6, batch 3700, loss[loss=0.2979, ctc_loss=0.2169, cr_loss=0.4451, attn_decoder_loss=0.297, over 29704.00 frames. ], tot_loss[loss=0.2847, ctc_loss=0.2048, cr_loss=0.4264, attn_decoder_loss=0.2841, over 5803452.83 frames. 
], batch size: 84, lr: 1.78e-02, grad_scale: 8.0 2024-09-17 00:00:11,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=105340.0, ans=0.125 2024-09-17 00:00:24,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=105380.0, ans=0.125 2024-09-17 00:00:41,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=105420.0, ans=0.025 2024-09-17 00:00:45,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=105420.0, ans=0.025 2024-09-17 00:00:48,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=105460.0, ans=0.2 2024-09-17 00:00:55,740 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.176e+01 1.057e+02 1.159e+02 1.295e+02 2.172e+02, threshold=2.318e+02, percent-clipped=0.0 2024-09-17 00:00:56,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=105460.0, ans=0.125 2024-09-17 00:00:59,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=105460.0, ans=0.0 2024-09-17 00:01:00,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=105460.0, ans=0.125 2024-09-17 00:01:04,782 INFO [train.py:1198] (1/2) Epoch 6, batch 3750, loss[loss=0.2447, ctc_loss=0.1683, cr_loss=0.3781, attn_decoder_loss=0.2448, over 29316.00 frames. ], tot_loss[loss=0.2847, ctc_loss=0.2048, cr_loss=0.4267, attn_decoder_loss=0.2841, over 5806843.88 frames. ], batch size: 67, lr: 1.78e-02, grad_scale: 4.0 2024-09-17 00:01:05,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=105500.0, ans=0.125 2024-09-17 00:01:23,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=105540.0, ans=0.0 2024-09-17 00:01:29,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=105540.0, ans=0.125 2024-09-17 00:01:36,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=105580.0, ans=0.025 2024-09-17 00:01:42,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=105580.0, ans=0.0 2024-09-17 00:02:20,319 INFO [train.py:1198] (1/2) Epoch 6, batch 3800, loss[loss=0.285, ctc_loss=0.1969, cr_loss=0.4293, attn_decoder_loss=0.2853, over 29641.00 frames. ], tot_loss[loss=0.2846, ctc_loss=0.2047, cr_loss=0.4263, attn_decoder_loss=0.284, over 5797090.39 frames. 
], batch size: 86, lr: 1.78e-02, grad_scale: 8.0 2024-09-17 00:02:26,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=105700.0, ans=0.125 2024-09-17 00:02:40,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=105740.0, ans=0.025 2024-09-17 00:02:50,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=105780.0, ans=0.07 2024-09-17 00:02:56,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=105780.0, ans=0.0 2024-09-17 00:03:15,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=105820.0, ans=0.0 2024-09-17 00:03:22,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=105860.0, ans=0.2 2024-09-17 00:03:29,920 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.809e+01 1.097e+02 1.194e+02 1.336e+02 2.111e+02, threshold=2.388e+02, percent-clipped=0.0 2024-09-17 00:03:31,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=105860.0, ans=0.2 2024-09-17 00:03:36,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=105900.0, ans=0.1 2024-09-17 00:03:37,377 INFO [train.py:1198] (1/2) Epoch 6, batch 3850, loss[loss=0.3062, ctc_loss=0.2225, cr_loss=0.4527, attn_decoder_loss=0.3054, over 29288.00 frames. ], tot_loss[loss=0.2844, ctc_loss=0.2044, cr_loss=0.4263, attn_decoder_loss=0.2838, over 5811175.92 frames. ], batch size: 100, lr: 1.78e-02, grad_scale: 4.0 2024-09-17 00:03:39,706 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.65 vs. limit=15.0 2024-09-17 00:03:46,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=105900.0, ans=0.0 2024-09-17 00:03:55,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=105940.0, ans=0.1 2024-09-17 00:04:25,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=106020.0, ans=0.125 2024-09-17 00:04:42,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=106060.0, ans=0.0 2024-09-17 00:04:48,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=106060.0, ans=0.125 2024-09-17 00:04:54,336 INFO [train.py:1198] (1/2) Epoch 6, batch 3900, loss[loss=0.2832, ctc_loss=0.194, cr_loss=0.4142, attn_decoder_loss=0.284, over 29638.00 frames. ], tot_loss[loss=0.2854, ctc_loss=0.2054, cr_loss=0.4278, attn_decoder_loss=0.2848, over 5815855.43 frames. ], batch size: 86, lr: 1.78e-02, grad_scale: 8.0 2024-09-17 00:05:01,211 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.18 vs. 
limit=6.0 2024-09-17 00:05:05,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=106100.0, ans=0.125 2024-09-17 00:05:13,441 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.76 vs. limit=22.5 2024-09-17 00:05:40,546 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.55 vs. limit=6.0 2024-09-17 00:05:45,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=106220.0, ans=0.0 2024-09-17 00:05:51,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=106220.0, ans=0.125 2024-09-17 00:05:58,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=106260.0, ans=0.035 2024-09-17 00:06:03,095 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.054e+01 1.064e+02 1.152e+02 1.217e+02 1.852e+02, threshold=2.304e+02, percent-clipped=0.0 2024-09-17 00:06:03,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=106260.0, ans=0.125 2024-09-17 00:06:09,221 INFO [train.py:1198] (1/2) Epoch 6, batch 3950, loss[loss=0.3056, ctc_loss=0.2293, cr_loss=0.4591, attn_decoder_loss=0.3038, over 29464.00 frames. ], tot_loss[loss=0.2854, ctc_loss=0.2053, cr_loss=0.4284, attn_decoder_loss=0.2848, over 5835269.43 frames. ], batch size: 97, lr: 1.78e-02, grad_scale: 4.0 2024-09-17 00:06:20,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=106300.0, ans=0.2 2024-09-17 00:06:44,506 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.94 vs. limit=15.0 2024-09-17 00:07:24,789 INFO [train.py:1198] (1/2) Epoch 6, batch 4000, loss[loss=0.2787, ctc_loss=0.2038, cr_loss=0.4253, attn_decoder_loss=0.2775, over 29487.00 frames. ], tot_loss[loss=0.2857, ctc_loss=0.206, cr_loss=0.4287, attn_decoder_loss=0.2851, over 5811032.02 frames. ], batch size: 74, lr: 1.77e-02, grad_scale: 8.0 2024-09-17 00:07:41,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=106540.0, ans=0.05 2024-09-17 00:07:41,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=106540.0, ans=0.125 2024-09-17 00:07:58,397 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.87 vs. limit=22.5 2024-09-17 00:07:59,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=106580.0, ans=0.125 2024-09-17 00:08:01,398 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.30 vs. 
limit=15.0 2024-09-17 00:08:36,818 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.741e+01 1.103e+02 1.193e+02 1.340e+02 9.903e+02, threshold=2.386e+02, percent-clipped=2.0 2024-09-17 00:08:40,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=106700.0, ans=0.125 2024-09-17 00:08:41,378 INFO [train.py:1198] (1/2) Epoch 6, batch 4050, loss[loss=0.3372, ctc_loss=0.2953, cr_loss=0.4654, attn_decoder_loss=0.3315, over 19315.00 frames. ], tot_loss[loss=0.2857, ctc_loss=0.2063, cr_loss=0.4285, attn_decoder_loss=0.285, over 5793706.14 frames. ], batch size: 209, lr: 1.77e-02, grad_scale: 4.0 2024-09-17 00:08:59,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=106740.0, ans=0.1 2024-09-17 00:09:33,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=106820.0, ans=0.2 2024-09-17 00:09:56,855 INFO [train.py:1198] (1/2) Epoch 6, batch 4100, loss[loss=0.2904, ctc_loss=0.2099, cr_loss=0.4568, attn_decoder_loss=0.2891, over 29505.00 frames. ], tot_loss[loss=0.2859, ctc_loss=0.2064, cr_loss=0.4285, attn_decoder_loss=0.2852, over 5790025.12 frames. ], batch size: 90, lr: 1.77e-02, grad_scale: 8.0 2024-09-17 00:10:00,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=106900.0, ans=0.0 2024-09-17 00:10:12,582 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=22.5 2024-09-17 00:10:13,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=106940.0, ans=0.125 2024-09-17 00:10:19,205 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:10:27,610 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.12 vs. limit=15.0 2024-09-17 00:10:46,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=107020.0, ans=0.05 2024-09-17 00:10:53,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=107020.0, ans=0.09899494936611666 2024-09-17 00:11:07,985 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.979e+01 1.121e+02 1.241e+02 1.471e+02 3.510e+02, threshold=2.481e+02, percent-clipped=3.0 2024-09-17 00:11:11,062 INFO [train.py:1198] (1/2) Epoch 6, batch 4150, loss[loss=0.279, ctc_loss=0.1939, cr_loss=0.4102, attn_decoder_loss=0.2793, over 29513.00 frames. ], tot_loss[loss=0.2851, ctc_loss=0.2054, cr_loss=0.4281, attn_decoder_loss=0.2845, over 5796548.85 frames. ], batch size: 77, lr: 1.77e-02, grad_scale: 4.0 2024-09-17 00:11:21,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=107100.0, ans=0.2 2024-09-17 00:11:27,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=107140.0, ans=0.0 2024-09-17 00:12:10,618 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.70 vs. 
limit=6.0 2024-09-17 00:12:14,550 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:12:26,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=107300.0, ans=0.125 2024-09-17 00:12:27,412 INFO [train.py:1198] (1/2) Epoch 6, batch 4200, loss[loss=0.3154, ctc_loss=0.2379, cr_loss=0.4979, attn_decoder_loss=0.3129, over 29524.00 frames. ], tot_loss[loss=0.2854, ctc_loss=0.2057, cr_loss=0.4287, attn_decoder_loss=0.2848, over 5797811.35 frames. ], batch size: 90, lr: 1.77e-02, grad_scale: 8.0 2024-09-17 00:12:33,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=107300.0, ans=0.125 2024-09-17 00:12:37,429 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.06 vs. limit=15.0 2024-09-17 00:12:53,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=107340.0, ans=0.025 2024-09-17 00:13:07,164 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.00 vs. limit=15.0 2024-09-17 00:13:18,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=107420.0, ans=0.2 2024-09-17 00:13:28,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=107460.0, ans=0.0 2024-09-17 00:13:38,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=107460.0, ans=0.125 2024-09-17 00:13:41,774 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.922e+01 1.074e+02 1.154e+02 1.259e+02 3.870e+02, threshold=2.307e+02, percent-clipped=1.0 2024-09-17 00:13:43,276 INFO [train.py:1198] (1/2) Epoch 6, batch 4250, loss[loss=0.2715, ctc_loss=0.1915, cr_loss=0.3941, attn_decoder_loss=0.2716, over 29525.00 frames. ], tot_loss[loss=0.2852, ctc_loss=0.205, cr_loss=0.4277, attn_decoder_loss=0.2846, over 5803779.69 frames. ], batch size: 74, lr: 1.77e-02, grad_scale: 4.0 2024-09-17 00:13:48,652 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.09 vs. limit=15.0 2024-09-17 00:13:51,790 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.43 vs. 
limit=15.0 2024-09-17 00:13:56,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=107540.0, ans=0.125 2024-09-17 00:13:59,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=107540.0, ans=0.2 2024-09-17 00:14:50,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=107660.0, ans=22.5 2024-09-17 00:14:51,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=107660.0, ans=0.125 2024-09-17 00:14:53,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=107660.0, ans=0.1 2024-09-17 00:14:57,392 INFO [train.py:1198] (1/2) Epoch 6, batch 4300, loss[loss=0.2884, ctc_loss=0.1968, cr_loss=0.4207, attn_decoder_loss=0.2893, over 29511.00 frames. ], tot_loss[loss=0.2856, ctc_loss=0.2056, cr_loss=0.4288, attn_decoder_loss=0.285, over 5792445.16 frames. ], batch size: 87, lr: 1.77e-02, grad_scale: 8.0 2024-09-17 00:14:59,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=107700.0, ans=0.1 2024-09-17 00:15:17,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.73 vs. limit=22.5 2024-09-17 00:15:20,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=107740.0, ans=0.125 2024-09-17 00:15:43,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=107820.0, ans=0.025 2024-09-17 00:16:09,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=107860.0, ans=0.125 2024-09-17 00:16:10,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=107860.0, ans=0.125 2024-09-17 00:16:13,582 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.817e+01 1.068e+02 1.179e+02 1.314e+02 6.167e+02, threshold=2.359e+02, percent-clipped=2.0 2024-09-17 00:16:13,608 INFO [train.py:1198] (1/2) Epoch 6, batch 4350, loss[loss=0.3069, ctc_loss=0.2313, cr_loss=0.4436, attn_decoder_loss=0.3054, over 29477.00 frames. ], tot_loss[loss=0.2893, ctc_loss=0.2089, cr_loss=0.4334, attn_decoder_loss=0.2886, over 5795622.77 frames. 
], batch size: 97, lr: 1.76e-02, grad_scale: 4.0 2024-09-17 00:16:32,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=107940.0, ans=0.125 2024-09-17 00:16:40,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=107940.0, ans=0.1 2024-09-17 00:16:45,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=107980.0, ans=0.125 2024-09-17 00:17:05,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=108020.0, ans=0.125 2024-09-17 00:17:28,672 INFO [train.py:1198] (1/2) Epoch 6, batch 4400, loss[loss=0.3026, ctc_loss=0.2232, cr_loss=0.4766, attn_decoder_loss=0.3008, over 27257.00 frames. ], tot_loss[loss=0.2915, ctc_loss=0.2111, cr_loss=0.4358, attn_decoder_loss=0.2907, over 5766829.84 frames. ], batch size: 124, lr: 1.76e-02, grad_scale: 8.0 2024-09-17 00:17:39,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=108100.0, ans=0.1 2024-09-17 00:18:13,133 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.00 vs. limit=15.0 2024-09-17 00:18:40,262 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.98 vs. limit=22.5 2024-09-17 00:18:44,524 INFO [train.py:1198] (1/2) Epoch 6, batch 4450, loss[loss=0.3289, ctc_loss=0.2766, cr_loss=0.4318, attn_decoder_loss=0.3251, over 20173.00 frames. ], tot_loss[loss=0.2951, ctc_loss=0.2173, cr_loss=0.4386, attn_decoder_loss=0.294, over 5574788.60 frames. ], batch size: 210, lr: 1.76e-02, grad_scale: 4.0 2024-09-17 00:18:46,021 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.821e+01 1.110e+02 1.171e+02 1.331e+02 5.376e+02, threshold=2.342e+02, percent-clipped=1.0 2024-09-17 00:18:52,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=108300.0, ans=0.0 2024-09-17 00:18:58,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=108340.0, ans=0.1 2024-09-17 00:19:10,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=108340.0, ans=0.07 2024-09-17 00:19:38,918 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.05 vs. limit=22.5 2024-09-17 00:19:40,436 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.87 vs. limit=22.5 2024-09-17 00:19:49,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=108460.0, ans=0.2 2024-09-17 00:19:56,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=108460.0, ans=0.0 2024-09-17 00:20:00,688 INFO [train.py:1198] (1/2) Epoch 6, batch 4500, loss[loss=0.3141, ctc_loss=0.262, cr_loss=0.4567, attn_decoder_loss=0.3097, over 20392.00 frames. 
], tot_loss[loss=0.2994, ctc_loss=0.2255, cr_loss=0.4405, attn_decoder_loss=0.2978, over 5230716.98 frames. ], batch size: 210, lr: 1.76e-02, grad_scale: 8.0 2024-09-17 00:20:08,076 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=24.51 vs. limit=22.5 2024-09-17 00:20:21,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=108540.0, ans=0.0 2024-09-17 00:21:33,134 WARNING [optim.py:503] (1/2) Scaling gradients by 0.06814046949148178, model_norm_threshold=234.16368103027344 2024-09-17 00:21:33,339 WARNING [optim.py:575] (1/2) Parameter dominating tot_sumsq module.attention_decoder.decoder.layers.0.norm_self_attn.weight with proportion 0.27, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.188e+06, grad_sumsq=4.711e+10, orig_rms_sq=6.766e-05 2024-09-17 00:21:33,370 INFO [train.py:1198] (1/2) Epoch 7, batch 0, loss[loss=0.3051, ctc_loss=0.1955, cr_loss=0.4476, attn_decoder_loss=0.3073, over 29621.00 frames. ], tot_loss[loss=0.3051, ctc_loss=0.1955, cr_loss=0.4476, attn_decoder_loss=0.3073, over 29621.00 frames. ], batch size: 73, lr: 1.65e-02, grad_scale: 8.0 2024-09-17 00:21:33,371 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 00:21:51,797 INFO [train.py:1230] (1/2) Epoch 7, validation: loss=0.2253, ctc_loss=0.06341, cr_loss=4.598e-15, attn_decoder_loss=0.2433, over 944034.00 frames. 2024-09-17 00:21:51,797 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-17 00:21:53,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=108600.0, ans=0.125 2024-09-17 00:22:04,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=108600.0, ans=0.0 2024-09-17 00:22:11,336 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.18 vs. limit=22.5 2024-09-17 00:22:36,024 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.794e+01 1.158e+02 1.328e+02 1.536e+02 3.436e+03, threshold=2.655e+02, percent-clipped=8.0 2024-09-17 00:22:44,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=108720.0, ans=0.125 2024-09-17 00:22:48,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=108720.0, ans=0.2 2024-09-17 00:22:50,896 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.06 vs. limit=22.5 2024-09-17 00:23:11,662 INFO [train.py:1198] (1/2) Epoch 7, batch 50, loss[loss=0.255, ctc_loss=0.1762, cr_loss=0.3672, attn_decoder_loss=0.2556, over 29459.00 frames. ], tot_loss[loss=0.2904, ctc_loss=0.2122, cr_loss=0.4334, attn_decoder_loss=0.2895, over 1266693.78 frames. ], batch size: 70, lr: 1.65e-02, grad_scale: 4.0 2024-09-17 00:23:18,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=108800.0, ans=0.125 2024-09-17 00:23:21,683 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. 
limit=6.0 2024-09-17 00:23:42,920 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.30 vs. limit=22.5 2024-09-17 00:24:27,111 INFO [train.py:1198] (1/2) Epoch 7, batch 100, loss[loss=0.2757, ctc_loss=0.198, cr_loss=0.4182, attn_decoder_loss=0.275, over 29541.00 frames. ], tot_loss[loss=0.2904, ctc_loss=0.2114, cr_loss=0.4321, attn_decoder_loss=0.2896, over 2249266.25 frames. ], batch size: 76, lr: 1.65e-02, grad_scale: 8.0 2024-09-17 00:25:04,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=109080.0, ans=0.1 2024-09-17 00:25:10,521 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.921e+01 1.059e+02 1.169e+02 1.320e+02 2.276e+02, threshold=2.339e+02, percent-clipped=0.0 2024-09-17 00:25:15,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=109120.0, ans=0.2 2024-09-17 00:25:22,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=109120.0, ans=0.125 2024-09-17 00:25:43,869 INFO [train.py:1198] (1/2) Epoch 7, batch 150, loss[loss=0.2547, ctc_loss=0.1725, cr_loss=0.4022, attn_decoder_loss=0.2549, over 29414.00 frames. ], tot_loss[loss=0.2862, ctc_loss=0.2062, cr_loss=0.4286, attn_decoder_loss=0.2856, over 3044683.85 frames. ], batch size: 70, lr: 1.64e-02, grad_scale: 4.0 2024-09-17 00:25:58,415 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=22.5 2024-09-17 00:26:38,437 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:26:41,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=109320.0, ans=0.125 2024-09-17 00:27:00,803 INFO [train.py:1198] (1/2) Epoch 7, batch 200, loss[loss=0.2985, ctc_loss=0.2231, cr_loss=0.4267, attn_decoder_loss=0.2974, over 27269.00 frames. ], tot_loss[loss=0.2844, ctc_loss=0.204, cr_loss=0.4275, attn_decoder_loss=0.2838, over 3656987.51 frames. ], batch size: 124, lr: 1.64e-02, grad_scale: 8.0 2024-09-17 00:27:02,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=109400.0, ans=0.0 2024-09-17 00:27:06,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=109400.0, ans=0.0 2024-09-17 00:27:21,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=109440.0, ans=0.1 2024-09-17 00:27:46,008 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.796e+01 1.025e+02 1.125e+02 1.234e+02 4.171e+02, threshold=2.251e+02, percent-clipped=1.0 2024-09-17 00:27:54,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=109520.0, ans=0.125 2024-09-17 00:27:57,662 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.98 vs. limit=22.5 2024-09-17 00:28:17,026 INFO [train.py:1198] (1/2) Epoch 7, batch 250, loss[loss=0.2959, ctc_loss=0.2173, cr_loss=0.4523, attn_decoder_loss=0.2946, over 29194.00 frames. 
], tot_loss[loss=0.2839, ctc_loss=0.2032, cr_loss=0.4272, attn_decoder_loss=0.2833, over 4140529.60 frames. ], batch size: 100, lr: 1.64e-02, grad_scale: 4.0 2024-09-17 00:28:56,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=109680.0, ans=0.2 2024-09-17 00:29:06,868 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.38 vs. limit=6.0 2024-09-17 00:29:12,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=109720.0, ans=0.125 2024-09-17 00:29:18,684 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.14 vs. limit=15.0 2024-09-17 00:29:19,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=109760.0, ans=0.125 2024-09-17 00:29:19,846 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:29:33,125 INFO [train.py:1198] (1/2) Epoch 7, batch 300, loss[loss=0.3033, ctc_loss=0.2187, cr_loss=0.4473, attn_decoder_loss=0.3027, over 29521.00 frames. ], tot_loss[loss=0.283, ctc_loss=0.2017, cr_loss=0.4259, attn_decoder_loss=0.2826, over 4508851.44 frames. ], batch size: 92, lr: 1.64e-02, grad_scale: 8.0 2024-09-17 00:29:34,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=109800.0, ans=0.2 2024-09-17 00:29:39,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=109800.0, ans=0.025 2024-09-17 00:29:56,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=109840.0, ans=0.0 2024-09-17 00:29:56,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=109840.0, ans=0.1 2024-09-17 00:29:58,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=109840.0, ans=0.0 2024-09-17 00:30:09,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=109880.0, ans=0.125 2024-09-17 00:30:09,560 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.06 vs. limit=6.0 2024-09-17 00:30:15,588 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.81 vs. 
limit=6.0 2024-09-17 00:30:19,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=109920.0, ans=0.0 2024-09-17 00:30:25,690 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.782e+01 1.034e+02 1.140e+02 1.272e+02 2.553e+02, threshold=2.279e+02, percent-clipped=1.0 2024-09-17 00:30:30,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=109920.0, ans=0.125 2024-09-17 00:30:35,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=109920.0, ans=0.2 2024-09-17 00:30:44,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=109960.0, ans=0.0 2024-09-17 00:30:54,851 INFO [train.py:1198] (1/2) Epoch 7, batch 350, loss[loss=0.2515, ctc_loss=0.1684, cr_loss=0.3941, attn_decoder_loss=0.2519, over 29330.00 frames. ], tot_loss[loss=0.2835, ctc_loss=0.2019, cr_loss=0.4269, attn_decoder_loss=0.2831, over 4793855.45 frames. ], batch size: 71, lr: 1.64e-02, grad_scale: 4.0 2024-09-17 00:31:08,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=110040.0, ans=0.1 2024-09-17 00:31:25,968 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.96 vs. limit=15.0 2024-09-17 00:31:34,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=110080.0, ans=0.125 2024-09-17 00:31:50,044 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.42 vs. limit=6.0 2024-09-17 00:32:03,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=110160.0, ans=0.0 2024-09-17 00:32:10,454 INFO [train.py:1198] (1/2) Epoch 7, batch 400, loss[loss=0.2882, ctc_loss=0.1998, cr_loss=0.414, attn_decoder_loss=0.2888, over 29707.00 frames. ], tot_loss[loss=0.2829, ctc_loss=0.2008, cr_loss=0.4254, attn_decoder_loss=0.2825, over 5024103.71 frames. ], batch size: 82, lr: 1.64e-02, grad_scale: 8.0 2024-09-17 00:32:10,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=110200.0, ans=0.025 2024-09-17 00:32:12,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=110200.0, ans=0.1 2024-09-17 00:32:13,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=110200.0, ans=0.0 2024-09-17 00:32:34,212 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.33 vs. 
limit=22.5 2024-09-17 00:32:46,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=110280.0, ans=0.0 2024-09-17 00:32:59,552 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.594e+01 1.062e+02 1.160e+02 1.275e+02 1.904e+02, threshold=2.320e+02, percent-clipped=0.0 2024-09-17 00:33:12,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=110360.0, ans=0.125 2024-09-17 00:33:18,317 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:33:27,322 INFO [train.py:1198] (1/2) Epoch 7, batch 450, loss[loss=0.2926, ctc_loss=0.2141, cr_loss=0.4276, attn_decoder_loss=0.2919, over 29685.00 frames. ], tot_loss[loss=0.2828, ctc_loss=0.2009, cr_loss=0.4245, attn_decoder_loss=0.2824, over 5186932.93 frames. ], batch size: 83, lr: 1.64e-02, grad_scale: 4.0 2024-09-17 00:33:27,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=110400.0, ans=0.125 2024-09-17 00:33:37,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=110400.0, ans=0.125 2024-09-17 00:34:29,460 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:34:48,895 INFO [train.py:1198] (1/2) Epoch 7, batch 500, loss[loss=0.2969, ctc_loss=0.2134, cr_loss=0.4371, attn_decoder_loss=0.2965, over 29413.00 frames. ], tot_loss[loss=0.2821, ctc_loss=0.2002, cr_loss=0.4249, attn_decoder_loss=0.2817, over 5328435.97 frames. ], batch size: 94, lr: 1.63e-02, grad_scale: 8.0 2024-09-17 00:34:49,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=110600.0, ans=0.125 2024-09-17 00:34:57,357 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.76 vs. limit=22.5 2024-09-17 00:34:58,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=110600.0, ans=0.0 2024-09-17 00:35:00,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=110600.0, ans=0.0 2024-09-17 00:35:03,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=110640.0, ans=0.2 2024-09-17 00:35:24,973 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.15 vs. 
limit=12.0 2024-09-17 00:35:27,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=110680.0, ans=0.125 2024-09-17 00:35:31,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=110680.0, ans=0.1 2024-09-17 00:35:39,103 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.537e+01 1.048e+02 1.174e+02 1.330e+02 3.263e+02, threshold=2.347e+02, percent-clipped=4.0 2024-09-17 00:35:41,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=110720.0, ans=0.125 2024-09-17 00:35:44,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=110720.0, ans=0.0 2024-09-17 00:35:49,102 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.76 vs. limit=22.5 2024-09-17 00:35:51,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=110760.0, ans=0.0 2024-09-17 00:35:58,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.16 vs. limit=15.0 2024-09-17 00:35:59,846 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.50 vs. limit=10.0 2024-09-17 00:36:04,946 INFO [train.py:1198] (1/2) Epoch 7, batch 550, loss[loss=0.2911, ctc_loss=0.2024, cr_loss=0.4331, attn_decoder_loss=0.2914, over 28783.00 frames. ], tot_loss[loss=0.282, ctc_loss=0.2003, cr_loss=0.4244, attn_decoder_loss=0.2817, over 5421699.28 frames. ], batch size: 104, lr: 1.63e-02, grad_scale: 4.0 2024-09-17 00:36:06,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=110800.0, ans=0.0 2024-09-17 00:36:09,086 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.31 vs. 
limit=15.0 2024-09-17 00:36:25,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=110840.0, ans=0.0 2024-09-17 00:36:32,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=110840.0, ans=0.125 2024-09-17 00:36:32,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=110840.0, ans=0.1 2024-09-17 00:36:49,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=110920.0, ans=0.2 2024-09-17 00:36:59,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=110920.0, ans=0.1 2024-09-17 00:37:05,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=110960.0, ans=0.1 2024-09-17 00:37:12,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=110960.0, ans=0.1 2024-09-17 00:37:14,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=110960.0, ans=0.95 2024-09-17 00:37:15,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=110960.0, ans=0.1 2024-09-17 00:37:21,444 INFO [train.py:1198] (1/2) Epoch 7, batch 600, loss[loss=0.2967, ctc_loss=0.212, cr_loss=0.4357, attn_decoder_loss=0.2964, over 29202.00 frames. ], tot_loss[loss=0.2818, ctc_loss=0.1997, cr_loss=0.4238, attn_decoder_loss=0.2815, over 5507842.29 frames. ], batch size: 100, lr: 1.63e-02, grad_scale: 8.0 2024-09-17 00:37:37,362 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.80 vs. limit=6.0 2024-09-17 00:37:49,410 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.84 vs. limit=15.0 2024-09-17 00:37:50,816 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.65 vs. 
limit=6.0 2024-09-17 00:37:58,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=111080.0, ans=0.2 2024-09-17 00:38:08,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=111120.0, ans=0.2 2024-09-17 00:38:14,464 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.423e+01 1.086e+02 1.156e+02 1.256e+02 2.672e+02, threshold=2.312e+02, percent-clipped=2.0 2024-09-17 00:38:17,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=111120.0, ans=0.125 2024-09-17 00:38:20,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=111120.0, ans=0.07 2024-09-17 00:38:26,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=111160.0, ans=0.125 2024-09-17 00:38:40,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=111200.0, ans=0.125 2024-09-17 00:38:41,960 INFO [train.py:1198] (1/2) Epoch 7, batch 650, loss[loss=0.2735, ctc_loss=0.1822, cr_loss=0.4022, attn_decoder_loss=0.2746, over 29778.00 frames. ], tot_loss[loss=0.2808, ctc_loss=0.1984, cr_loss=0.4229, attn_decoder_loss=0.2806, over 5585216.94 frames. ], batch size: 81, lr: 1.63e-02, grad_scale: 4.0 2024-09-17 00:38:59,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=111240.0, ans=0.0 2024-09-17 00:39:16,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=111280.0, ans=0.2 2024-09-17 00:39:26,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=111320.0, ans=0.02 2024-09-17 00:39:35,128 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.44 vs. limit=12.0 2024-09-17 00:39:35,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=111320.0, ans=0.025 2024-09-17 00:39:38,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=111320.0, ans=0.125 2024-09-17 00:39:58,134 INFO [train.py:1198] (1/2) Epoch 7, batch 700, loss[loss=0.2688, ctc_loss=0.1869, cr_loss=0.4206, attn_decoder_loss=0.2686, over 29535.00 frames. ], tot_loss[loss=0.2819, ctc_loss=0.1993, cr_loss=0.4251, attn_decoder_loss=0.2817, over 5637072.48 frames. 
], batch size: 76, lr: 1.63e-02, grad_scale: 8.0 2024-09-17 00:40:18,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=111440.0, ans=0.1 2024-09-17 00:40:30,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=111480.0, ans=0.125 2024-09-17 00:40:35,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=111480.0, ans=0.125 2024-09-17 00:40:38,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=111480.0, ans=0.2 2024-09-17 00:40:45,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=111520.0, ans=0.125 2024-09-17 00:40:45,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=111520.0, ans=0.025 2024-09-17 00:40:51,488 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.298e+01 1.046e+02 1.126e+02 1.229e+02 1.906e+02, threshold=2.253e+02, percent-clipped=0.0 2024-09-17 00:40:56,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=111520.0, ans=0.1 2024-09-17 00:41:02,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=111560.0, ans=0.125 2024-09-17 00:41:05,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=111560.0, ans=0.07 2024-09-17 00:41:10,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=111560.0, ans=0.1 2024-09-17 00:41:14,386 INFO [train.py:1198] (1/2) Epoch 7, batch 750, loss[loss=0.286, ctc_loss=0.1988, cr_loss=0.4353, attn_decoder_loss=0.286, over 29720.00 frames. ], tot_loss[loss=0.2811, ctc_loss=0.1986, cr_loss=0.4242, attn_decoder_loss=0.2809, over 5675617.36 frames. ], batch size: 82, lr: 1.63e-02, grad_scale: 4.0 2024-09-17 00:41:15,159 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=15.0 2024-09-17 00:41:31,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=111640.0, ans=0.125 2024-09-17 00:41:36,376 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=15.0 2024-09-17 00:41:42,362 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=15.0 2024-09-17 00:41:43,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.49 vs. limit=22.5 2024-09-17 00:41:50,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.40 vs. 
limit=22.5 2024-09-17 00:42:08,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=111720.0, ans=0.2 2024-09-17 00:42:15,218 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.11 vs. limit=15.0 2024-09-17 00:42:23,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=111760.0, ans=0.1 2024-09-17 00:42:35,180 INFO [train.py:1198] (1/2) Epoch 7, batch 800, loss[loss=0.252, ctc_loss=0.1696, cr_loss=0.385, attn_decoder_loss=0.2526, over 29604.00 frames. ], tot_loss[loss=0.2811, ctc_loss=0.1986, cr_loss=0.4244, attn_decoder_loss=0.2808, over 5707322.87 frames. ], batch size: 73, lr: 1.63e-02, grad_scale: 8.0 2024-09-17 00:42:36,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=111800.0, ans=0.125 2024-09-17 00:43:29,888 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.814e+01 1.067e+02 1.173e+02 1.326e+02 3.037e+02, threshold=2.345e+02, percent-clipped=2.0 2024-09-17 00:43:36,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=111960.0, ans=0.125 2024-09-17 00:43:58,047 INFO [train.py:1198] (1/2) Epoch 7, batch 850, loss[loss=0.2972, ctc_loss=0.2165, cr_loss=0.433, attn_decoder_loss=0.2966, over 29719.00 frames. ], tot_loss[loss=0.2806, ctc_loss=0.1982, cr_loss=0.4234, attn_decoder_loss=0.2803, over 5736392.31 frames. ], batch size: 89, lr: 1.62e-02, grad_scale: 4.0 2024-09-17 00:44:08,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=112000.0, ans=15.0 2024-09-17 00:44:11,366 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.20 vs. limit=10.0 2024-09-17 00:44:30,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=112080.0, ans=0.125 2024-09-17 00:44:42,661 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:45:14,256 INFO [train.py:1198] (1/2) Epoch 7, batch 900, loss[loss=0.2597, ctc_loss=0.1791, cr_loss=0.3961, attn_decoder_loss=0.2599, over 29607.00 frames. ], tot_loss[loss=0.2811, ctc_loss=0.1988, cr_loss=0.4236, attn_decoder_loss=0.2808, over 5740920.19 frames. 
], batch size: 73, lr: 1.62e-02, grad_scale: 8.0 2024-09-17 00:45:31,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=112240.0, ans=0.0 2024-09-17 00:45:37,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=112240.0, ans=0.2 2024-09-17 00:45:50,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=112280.0, ans=0.125 2024-09-17 00:46:02,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=112320.0, ans=10.0 2024-09-17 00:46:08,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=112320.0, ans=0.02 2024-09-17 00:46:12,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=112320.0, ans=0.125 2024-09-17 00:46:14,981 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.034e+01 1.106e+02 1.225e+02 1.378e+02 5.810e+02, threshold=2.450e+02, percent-clipped=7.0 2024-09-17 00:46:28,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=112360.0, ans=0.125 2024-09-17 00:46:34,339 INFO [train.py:1198] (1/2) Epoch 7, batch 950, loss[loss=0.2672, ctc_loss=0.1894, cr_loss=0.4215, attn_decoder_loss=0.2665, over 29522.00 frames. ], tot_loss[loss=0.2811, ctc_loss=0.1988, cr_loss=0.4241, attn_decoder_loss=0.2808, over 5743870.70 frames. ], batch size: 74, lr: 1.62e-02, grad_scale: 4.0 2024-09-17 00:46:57,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=112440.0, ans=0.125 2024-09-17 00:47:09,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=112480.0, ans=0.0 2024-09-17 00:47:18,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=112520.0, ans=0.025 2024-09-17 00:47:50,428 INFO [train.py:1198] (1/2) Epoch 7, batch 1000, loss[loss=0.2575, ctc_loss=0.1687, cr_loss=0.3832, attn_decoder_loss=0.2588, over 29507.00 frames. ], tot_loss[loss=0.2817, ctc_loss=0.1993, cr_loss=0.4239, attn_decoder_loss=0.2814, over 5736914.09 frames. ], batch size: 77, lr: 1.62e-02, grad_scale: 8.0 2024-09-17 00:47:59,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=112600.0, ans=0.1 2024-09-17 00:48:02,292 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.89 vs. limit=22.5 2024-09-17 00:48:12,458 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.99 vs. 
limit=10.0 2024-09-17 00:48:30,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=112680.0, ans=0.0 2024-09-17 00:48:39,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=112720.0, ans=0.025 2024-09-17 00:48:41,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=112720.0, ans=0.125 2024-09-17 00:48:46,758 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.415e+01 1.043e+02 1.127e+02 1.327e+02 3.931e+02, threshold=2.254e+02, percent-clipped=2.0 2024-09-17 00:48:51,629 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:48:53,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=112760.0, ans=10.0 2024-09-17 00:48:59,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=112760.0, ans=0.035 2024-09-17 00:49:02,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=112760.0, ans=0.2 2024-09-17 00:49:06,951 INFO [train.py:1198] (1/2) Epoch 7, batch 1050, loss[loss=0.2825, ctc_loss=0.1927, cr_loss=0.4227, attn_decoder_loss=0.2831, over 29687.00 frames. ], tot_loss[loss=0.281, ctc_loss=0.1986, cr_loss=0.4231, attn_decoder_loss=0.2808, over 5745449.14 frames. ], batch size: 85, lr: 1.62e-02, grad_scale: 8.0 2024-09-17 00:49:16,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=112800.0, ans=0.05 2024-09-17 00:49:18,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=112800.0, ans=0.125 2024-09-17 00:49:54,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=112920.0, ans=0.0 2024-09-17 00:50:07,613 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:50:09,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=112920.0, ans=0.0 2024-09-17 00:50:09,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=112920.0, ans=0.1 2024-09-17 00:50:28,827 INFO [train.py:1198] (1/2) Epoch 7, batch 1100, loss[loss=0.2753, ctc_loss=0.1969, cr_loss=0.431, attn_decoder_loss=0.2744, over 29460.00 frames. ], tot_loss[loss=0.2807, ctc_loss=0.1981, cr_loss=0.4229, attn_decoder_loss=0.2805, over 5758411.73 frames. ], batch size: 78, lr: 1.62e-02, grad_scale: 8.0 2024-09-17 00:50:37,147 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.85 vs. 
limit=15.0 2024-09-17 00:50:39,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=113000.0, ans=0.125 2024-09-17 00:50:56,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=113040.0, ans=0.125 2024-09-17 00:51:26,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=113120.0, ans=0.0 2024-09-17 00:51:28,966 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.665e+01 1.036e+02 1.119e+02 1.238e+02 1.913e+02, threshold=2.238e+02, percent-clipped=0.0 2024-09-17 00:51:31,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=113160.0, ans=0.0 2024-09-17 00:51:35,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=113160.0, ans=0.1 2024-09-17 00:51:45,879 INFO [train.py:1198] (1/2) Epoch 7, batch 1150, loss[loss=0.2718, ctc_loss=0.1871, cr_loss=0.4005, attn_decoder_loss=0.2723, over 29436.00 frames. ], tot_loss[loss=0.281, ctc_loss=0.1989, cr_loss=0.4233, attn_decoder_loss=0.2807, over 5755726.54 frames. ], batch size: 78, lr: 1.62e-02, grad_scale: 4.0 2024-09-17 00:51:47,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=113200.0, ans=0.125 2024-09-17 00:51:54,858 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.05 vs. limit=15.0 2024-09-17 00:52:24,139 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.74 vs. limit=10.0 2024-09-17 00:52:25,519 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=8.56 vs. limit=15.0 2024-09-17 00:52:30,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=113320.0, ans=0.0 2024-09-17 00:52:32,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=113320.0, ans=0.0 2024-09-17 00:52:51,127 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:52:52,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=113360.0, ans=0.95 2024-09-17 00:53:02,764 INFO [train.py:1198] (1/2) Epoch 7, batch 1200, loss[loss=0.2659, ctc_loss=0.1706, cr_loss=0.3919, attn_decoder_loss=0.2678, over 29688.00 frames. ], tot_loss[loss=0.2819, ctc_loss=0.1998, cr_loss=0.4243, attn_decoder_loss=0.2816, over 5747494.48 frames. 
], batch size: 85, lr: 1.62e-02, grad_scale: 8.0 2024-09-17 00:53:12,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=113400.0, ans=0.1 2024-09-17 00:53:22,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=113440.0, ans=0.125 2024-09-17 00:53:24,321 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:53:27,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=113440.0, ans=0.1 2024-09-17 00:53:28,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=113440.0, ans=0.07 2024-09-17 00:53:30,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=113440.0, ans=0.125 2024-09-17 00:53:39,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=113480.0, ans=0.09899494936611666 2024-09-17 00:54:08,731 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.382e+01 1.039e+02 1.128e+02 1.242e+02 2.195e+02, threshold=2.256e+02, percent-clipped=0.0 2024-09-17 00:54:12,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=113560.0, ans=0.125 2024-09-17 00:54:13,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=113560.0, ans=0.1 2024-09-17 00:54:18,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=113560.0, ans=0.0 2024-09-17 00:54:24,448 INFO [train.py:1198] (1/2) Epoch 7, batch 1250, loss[loss=0.2924, ctc_loss=0.1975, cr_loss=0.424, attn_decoder_loss=0.2935, over 29500.00 frames. ], tot_loss[loss=0.2823, ctc_loss=0.1999, cr_loss=0.4248, attn_decoder_loss=0.282, over 5775964.62 frames. ], batch size: 92, lr: 1.61e-02, grad_scale: 4.0 2024-09-17 00:54:26,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=113600.0, ans=0.0 2024-09-17 00:54:41,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=113640.0, ans=0.0 2024-09-17 00:54:43,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=113640.0, ans=0.1 2024-09-17 00:55:40,874 INFO [train.py:1198] (1/2) Epoch 7, batch 1300, loss[loss=0.2968, ctc_loss=0.217, cr_loss=0.4369, attn_decoder_loss=0.2959, over 28183.00 frames. ], tot_loss[loss=0.2817, ctc_loss=0.1993, cr_loss=0.4233, attn_decoder_loss=0.2815, over 5779154.72 frames. 
], batch size: 111, lr: 1.61e-02, grad_scale: 8.0 2024-09-17 00:55:54,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=113840.0, ans=0.1 2024-09-17 00:56:37,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=113920.0, ans=0.125 2024-09-17 00:56:43,507 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.471e+01 1.026e+02 1.124e+02 1.254e+02 2.028e+02, threshold=2.249e+02, percent-clipped=0.0 2024-09-17 00:56:55,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=114000.0, ans=0.05 2024-09-17 00:56:57,146 INFO [train.py:1198] (1/2) Epoch 7, batch 1350, loss[loss=0.2832, ctc_loss=0.1986, cr_loss=0.4311, attn_decoder_loss=0.283, over 29756.00 frames. ], tot_loss[loss=0.2814, ctc_loss=0.1985, cr_loss=0.4236, attn_decoder_loss=0.2812, over 5796439.18 frames. ], batch size: 81, lr: 1.61e-02, grad_scale: 4.0 2024-09-17 00:57:05,306 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.72 vs. limit=12.0 2024-09-17 00:57:08,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=114000.0, ans=0.1 2024-09-17 00:57:15,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=114040.0, ans=0.0 2024-09-17 00:57:21,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=114040.0, ans=0.1 2024-09-17 00:57:27,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=114080.0, ans=0.125 2024-09-17 00:57:38,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.42 vs. limit=15.0 2024-09-17 00:57:48,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=114120.0, ans=0.2 2024-09-17 00:58:01,165 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:58:04,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=114160.0, ans=0.1 2024-09-17 00:58:10,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=114160.0, ans=0.1 2024-09-17 00:58:18,160 INFO [train.py:1198] (1/2) Epoch 7, batch 1400, loss[loss=0.2539, ctc_loss=0.1816, cr_loss=0.3871, attn_decoder_loss=0.2533, over 29595.00 frames. ], tot_loss[loss=0.2812, ctc_loss=0.1983, cr_loss=0.4236, attn_decoder_loss=0.281, over 5807321.96 frames. 
], batch size: 69, lr: 1.61e-02, grad_scale: 8.0 2024-09-17 00:58:20,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=114200.0, ans=0.07 2024-09-17 00:58:50,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=114280.0, ans=0.0 2024-09-17 00:59:14,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=114320.0, ans=0.125 2024-09-17 00:59:16,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=114320.0, ans=0.125 2024-09-17 00:59:21,790 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.456e+01 9.951e+01 1.071e+02 1.173e+02 2.370e+02, threshold=2.143e+02, percent-clipped=1.0 2024-09-17 00:59:31,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=114360.0, ans=0.07 2024-09-17 00:59:34,533 INFO [train.py:1198] (1/2) Epoch 7, batch 1450, loss[loss=0.2955, ctc_loss=0.2043, cr_loss=0.4622, attn_decoder_loss=0.2953, over 29442.00 frames. ], tot_loss[loss=0.2815, ctc_loss=0.1985, cr_loss=0.4237, attn_decoder_loss=0.2813, over 5804540.86 frames. ], batch size: 94, lr: 1.61e-02, grad_scale: 4.0 2024-09-17 00:59:34,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=114400.0, ans=0.025 2024-09-17 00:59:34,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=114400.0, ans=0.125 2024-09-17 00:59:48,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=114440.0, ans=0.125 2024-09-17 01:00:00,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=114440.0, ans=0.125 2024-09-17 01:00:06,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=114480.0, ans=0.04949747468305833 2024-09-17 01:00:08,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=114480.0, ans=0.125 2024-09-17 01:00:14,652 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0 2024-09-17 01:00:23,966 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.90 vs. limit=15.0 2024-09-17 01:00:41,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=114560.0, ans=0.0 2024-09-17 01:00:41,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=114560.0, ans=0.0 2024-09-17 01:00:49,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=114600.0, ans=0.125 2024-09-17 01:00:50,574 INFO [train.py:1198] (1/2) Epoch 7, batch 1500, loss[loss=0.2847, ctc_loss=0.1942, cr_loss=0.4076, attn_decoder_loss=0.2857, over 29647.00 frames. 
], tot_loss[loss=0.2815, ctc_loss=0.1982, cr_loss=0.4233, attn_decoder_loss=0.2814, over 5806535.69 frames. ], batch size: 86, lr: 1.61e-02, grad_scale: 8.0 2024-09-17 01:01:19,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=114680.0, ans=0.0 2024-09-17 01:01:29,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=114680.0, ans=0.0 2024-09-17 01:01:58,848 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.036e+01 1.083e+02 1.173e+02 1.293e+02 2.517e+02, threshold=2.346e+02, percent-clipped=1.0 2024-09-17 01:02:11,568 INFO [train.py:1198] (1/2) Epoch 7, batch 1550, loss[loss=0.3032, ctc_loss=0.2146, cr_loss=0.4651, attn_decoder_loss=0.3027, over 29497.00 frames. ], tot_loss[loss=0.2816, ctc_loss=0.1987, cr_loss=0.4241, attn_decoder_loss=0.2814, over 5781072.45 frames. ], batch size: 90, lr: 1.61e-02, grad_scale: 4.0 2024-09-17 01:02:11,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=114800.0, ans=0.0 2024-09-17 01:02:14,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=114800.0, ans=0.0 2024-09-17 01:02:20,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=114800.0, ans=0.2 2024-09-17 01:02:26,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=114840.0, ans=0.1 2024-09-17 01:02:30,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=114840.0, ans=0.1 2024-09-17 01:02:35,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten.whitening_limit, batch_count=114840.0, ans=15.0 2024-09-17 01:02:37,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=114840.0, ans=0.0 2024-09-17 01:02:43,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=114880.0, ans=0.125 2024-09-17 01:02:54,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=114880.0, ans=0.0 2024-09-17 01:03:12,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=114960.0, ans=0.0 2024-09-17 01:03:19,183 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.95 vs. limit=15.0 2024-09-17 01:03:27,529 INFO [train.py:1198] (1/2) Epoch 7, batch 1600, loss[loss=0.287, ctc_loss=0.1898, cr_loss=0.4299, attn_decoder_loss=0.2882, over 29690.00 frames. ], tot_loss[loss=0.2817, ctc_loss=0.1993, cr_loss=0.4244, attn_decoder_loss=0.2815, over 5762970.39 frames. ], batch size: 85, lr: 1.60e-02, grad_scale: 8.0 2024-09-17 01:03:27,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=115000.0, ans=0.2 2024-09-17 01:03:41,882 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.77 vs. 
limit=15.0 2024-09-17 01:03:43,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=115040.0, ans=0.1 2024-09-17 01:03:44,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=115040.0, ans=0.125 2024-09-17 01:03:49,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=115040.0, ans=0.1 2024-09-17 01:04:03,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=115080.0, ans=0.125 2024-09-17 01:04:15,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=115120.0, ans=0.2 2024-09-17 01:04:18,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=115120.0, ans=0.0 2024-09-17 01:04:18,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=115120.0, ans=0.125 2024-09-17 01:04:25,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=115120.0, ans=0.0 2024-09-17 01:04:28,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=115160.0, ans=0.1 2024-09-17 01:04:34,567 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.309e+01 1.047e+02 1.116e+02 1.262e+02 4.085e+02, threshold=2.232e+02, percent-clipped=3.0 2024-09-17 01:04:43,949 INFO [train.py:1198] (1/2) Epoch 7, batch 1650, loss[loss=0.2839, ctc_loss=0.1958, cr_loss=0.4311, attn_decoder_loss=0.2841, over 29705.00 frames. ], tot_loss[loss=0.2813, ctc_loss=0.1987, cr_loss=0.4238, attn_decoder_loss=0.281, over 5756012.94 frames. ], batch size: 89, lr: 1.60e-02, grad_scale: 4.0 2024-09-17 01:05:01,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=115240.0, ans=0.1 2024-09-17 01:05:38,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=115320.0, ans=0.125 2024-09-17 01:05:44,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=115320.0, ans=0.2 2024-09-17 01:06:04,532 INFO [train.py:1198] (1/2) Epoch 7, batch 1700, loss[loss=0.2392, ctc_loss=0.161, cr_loss=0.3499, attn_decoder_loss=0.2401, over 29584.00 frames. ], tot_loss[loss=0.2808, ctc_loss=0.1978, cr_loss=0.423, attn_decoder_loss=0.2806, over 5778600.67 frames. 
], batch size: 69, lr: 1.60e-02, grad_scale: 8.0 2024-09-17 01:06:18,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=115440.0, ans=0.035 2024-09-17 01:06:25,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=115440.0, ans=0.125 2024-09-17 01:06:26,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=115440.0, ans=0.1 2024-09-17 01:06:29,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=115440.0, ans=0.125 2024-09-17 01:06:43,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=115480.0, ans=0.125 2024-09-17 01:07:01,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=115520.0, ans=0.125 2024-09-17 01:07:01,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=115520.0, ans=0.125 2024-09-17 01:07:03,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=115520.0, ans=0.125 2024-09-17 01:07:06,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=115560.0, ans=0.125 2024-09-17 01:07:12,701 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 01:07:13,850 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.613e+01 9.967e+01 1.096e+02 1.177e+02 1.822e+02, threshold=2.192e+02, percent-clipped=0.0 2024-09-17 01:07:20,494 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 01:07:21,577 INFO [train.py:1198] (1/2) Epoch 7, batch 1750, loss[loss=0.2488, ctc_loss=0.1701, cr_loss=0.392, attn_decoder_loss=0.2489, over 29396.00 frames. ], tot_loss[loss=0.2803, ctc_loss=0.1972, cr_loss=0.4227, attn_decoder_loss=0.2801, over 5787852.39 frames. ], batch size: 67, lr: 1.60e-02, grad_scale: 4.0 2024-09-17 01:07:26,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=115600.0, ans=0.04949747468305833 2024-09-17 01:07:51,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=115680.0, ans=0.125 2024-09-17 01:08:05,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=115680.0, ans=0.125 2024-09-17 01:08:16,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=5.22 vs. limit=12.0 2024-09-17 01:08:35,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=115760.0, ans=0.07 2024-09-17 01:08:38,051 INFO [train.py:1198] (1/2) Epoch 7, batch 1800, loss[loss=0.2943, ctc_loss=0.208, cr_loss=0.4366, attn_decoder_loss=0.2942, over 29689.00 frames. ], tot_loss[loss=0.2807, ctc_loss=0.1976, cr_loss=0.4229, attn_decoder_loss=0.2805, over 5790610.99 frames. 
], batch size: 83, lr: 1.60e-02, grad_scale: 8.0 2024-09-17 01:08:43,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.74 vs. limit=5.0 2024-09-17 01:09:07,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=115880.0, ans=0.0 2024-09-17 01:09:16,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=115880.0, ans=0.125 2024-09-17 01:09:50,654 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.679e+01 1.033e+02 1.126e+02 1.316e+02 3.290e+02, threshold=2.253e+02, percent-clipped=1.0 2024-09-17 01:09:57,306 INFO [train.py:1198] (1/2) Epoch 7, batch 1850, loss[loss=0.2884, ctc_loss=0.2013, cr_loss=0.4209, attn_decoder_loss=0.2887, over 29654.00 frames. ], tot_loss[loss=0.28, ctc_loss=0.1968, cr_loss=0.4222, attn_decoder_loss=0.2799, over 5795160.35 frames. ], batch size: 86, lr: 1.60e-02, grad_scale: 4.0 2024-09-17 01:10:07,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=116000.0, ans=0.1 2024-09-17 01:10:26,326 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.07 vs. limit=12.0 2024-09-17 01:10:39,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=116080.0, ans=0.125 2024-09-17 01:10:39,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=116080.0, ans=0.125 2024-09-17 01:10:51,886 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.30 vs. limit=15.0 2024-09-17 01:10:57,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=116120.0, ans=0.025 2024-09-17 01:11:09,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=116160.0, ans=0.125 2024-09-17 01:11:11,491 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.35 vs. limit=12.0 2024-09-17 01:11:14,479 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.79 vs. limit=15.0 2024-09-17 01:11:14,725 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.26 vs. limit=15.0 2024-09-17 01:11:15,068 INFO [train.py:1198] (1/2) Epoch 7, batch 1900, loss[loss=0.2882, ctc_loss=0.1939, cr_loss=0.4034, attn_decoder_loss=0.2897, over 29688.00 frames. ], tot_loss[loss=0.2805, ctc_loss=0.1968, cr_loss=0.4227, attn_decoder_loss=0.2804, over 5802719.11 frames. ], batch size: 89, lr: 1.60e-02, grad_scale: 8.0 2024-09-17 01:11:26,600 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=24.77 vs. 
limit=22.5 2024-09-17 01:11:29,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=116240.0, ans=0.125 2024-09-17 01:11:29,376 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.95 vs. limit=22.5 2024-09-17 01:11:40,491 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.61 vs. limit=6.0 2024-09-17 01:11:42,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=116240.0, ans=0.125 2024-09-17 01:11:47,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=116280.0, ans=0.02 2024-09-17 01:12:26,915 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.068e+01 1.016e+02 1.078e+02 1.162e+02 1.899e+02, threshold=2.156e+02, percent-clipped=0.0 2024-09-17 01:12:31,489 INFO [train.py:1198] (1/2) Epoch 7, batch 1950, loss[loss=0.2819, ctc_loss=0.194, cr_loss=0.455, attn_decoder_loss=0.2816, over 29463.00 frames. ], tot_loss[loss=0.2811, ctc_loss=0.1968, cr_loss=0.4239, attn_decoder_loss=0.2811, over 5817850.65 frames. ], batch size: 78, lr: 1.60e-02, grad_scale: 4.0 2024-09-17 01:12:34,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=116400.0, ans=0.125 2024-09-17 01:12:35,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=116400.0, ans=0.0 2024-09-17 01:12:41,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=116400.0, ans=0.125 2024-09-17 01:12:42,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=116400.0, ans=0.125 2024-09-17 01:12:53,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=116440.0, ans=0.125 2024-09-17 01:12:54,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=116440.0, ans=0.125 2024-09-17 01:12:57,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=116440.0, ans=0.125 2024-09-17 01:13:22,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=116520.0, ans=0.0 2024-09-17 01:13:23,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=116520.0, ans=0.0 2024-09-17 01:13:38,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=116560.0, ans=0.0 2024-09-17 01:13:50,434 INFO [train.py:1198] (1/2) Epoch 7, batch 2000, loss[loss=0.2518, ctc_loss=0.1714, cr_loss=0.3832, attn_decoder_loss=0.2522, over 29372.00 frames. ], tot_loss[loss=0.2819, ctc_loss=0.1981, cr_loss=0.4247, attn_decoder_loss=0.2818, over 5796724.25 frames. 
], batch size: 67, lr: 1.59e-02, grad_scale: 8.0 2024-09-17 01:13:57,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=116600.0, ans=0.125 2024-09-17 01:14:54,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=116760.0, ans=0.1 2024-09-17 01:15:02,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=116760.0, ans=0.025 2024-09-17 01:15:06,318 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.446e+01 1.080e+02 1.203e+02 1.389e+02 2.597e+02, threshold=2.406e+02, percent-clipped=3.0 2024-09-17 01:15:09,741 INFO [train.py:1198] (1/2) Epoch 7, batch 2050, loss[loss=0.2647, ctc_loss=0.191, cr_loss=0.4184, attn_decoder_loss=0.2636, over 29426.00 frames. ], tot_loss[loss=0.2813, ctc_loss=0.1982, cr_loss=0.424, attn_decoder_loss=0.2811, over 5787173.29 frames. ], batch size: 70, lr: 1.59e-02, grad_scale: 4.0 2024-09-17 01:15:10,703 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.37 vs. limit=22.5 2024-09-17 01:15:10,758 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.94 vs. limit=22.5 2024-09-17 01:15:15,211 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.79 vs. limit=10.0 2024-09-17 01:15:28,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=116840.0, ans=0.1 2024-09-17 01:15:31,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=116840.0, ans=0.0 2024-09-17 01:16:23,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=116960.0, ans=0.125 2024-09-17 01:16:25,959 INFO [train.py:1198] (1/2) Epoch 7, batch 2100, loss[loss=0.2693, ctc_loss=0.184, cr_loss=0.4087, attn_decoder_loss=0.2697, over 29773.00 frames. ], tot_loss[loss=0.2808, ctc_loss=0.1978, cr_loss=0.4235, attn_decoder_loss=0.2806, over 5798056.55 frames. ], batch size: 81, lr: 1.59e-02, grad_scale: 8.0 2024-09-17 01:16:29,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=117000.0, ans=10.0 2024-09-17 01:16:30,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=117000.0, ans=0.1 2024-09-17 01:16:34,494 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.37 vs. limit=22.5 2024-09-17 01:16:47,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=117040.0, ans=0.0 2024-09-17 01:16:57,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=117080.0, ans=0.1 2024-09-17 01:16:59,965 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.44 vs. 
limit=15.0 2024-09-17 01:17:02,370 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 01:17:06,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=117080.0, ans=0.125 2024-09-17 01:17:13,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=117120.0, ans=0.1 2024-09-17 01:17:25,637 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.53 vs. limit=12.0 2024-09-17 01:17:32,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=117160.0, ans=0.0 2024-09-17 01:17:33,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=117160.0, ans=15.0 2024-09-17 01:17:41,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=117160.0, ans=0.125 2024-09-17 01:17:42,584 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.423e+01 1.035e+02 1.132e+02 1.239e+02 1.917e+02, threshold=2.264e+02, percent-clipped=0.0 2024-09-17 01:17:44,163 INFO [train.py:1198] (1/2) Epoch 7, batch 2150, loss[loss=0.281, ctc_loss=0.1943, cr_loss=0.4185, attn_decoder_loss=0.2814, over 29472.00 frames. ], tot_loss[loss=0.2801, ctc_loss=0.1971, cr_loss=0.4236, attn_decoder_loss=0.2799, over 5814052.40 frames. ], batch size: 78, lr: 1.59e-02, grad_scale: 4.0 2024-09-17 01:17:49,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=117200.0, ans=0.125 2024-09-17 01:18:33,012 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.57 vs. limit=22.5 2024-09-17 01:18:38,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=117320.0, ans=0.1 2024-09-17 01:18:40,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=117320.0, ans=0.0 2024-09-17 01:18:41,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=117320.0, ans=0.1 2024-09-17 01:18:55,716 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.08 vs. limit=15.0 2024-09-17 01:19:02,633 INFO [train.py:1198] (1/2) Epoch 7, batch 2200, loss[loss=0.2929, ctc_loss=0.2057, cr_loss=0.4157, attn_decoder_loss=0.2933, over 29642.00 frames. ], tot_loss[loss=0.28, ctc_loss=0.1972, cr_loss=0.423, attn_decoder_loss=0.2798, over 5810882.92 frames. 
], batch size: 86, lr: 1.59e-02, grad_scale: 8.0 2024-09-17 01:19:04,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=117400.0, ans=0.1 2024-09-17 01:19:04,675 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 01:19:13,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=117400.0, ans=0.04949747468305833 2024-09-17 01:19:15,896 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.81 vs. limit=6.0 2024-09-17 01:19:19,006 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.40 vs. limit=22.5 2024-09-17 01:19:32,392 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.17 vs. limit=15.0 2024-09-17 01:19:38,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.81 vs. limit=15.0 2024-09-17 01:19:45,194 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=22.5 2024-09-17 01:19:45,239 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.71 vs. limit=15.0 2024-09-17 01:20:19,562 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.574e+01 1.056e+02 1.108e+02 1.251e+02 3.146e+02, threshold=2.216e+02, percent-clipped=2.0 2024-09-17 01:20:19,589 INFO [train.py:1198] (1/2) Epoch 7, batch 2250, loss[loss=0.2737, ctc_loss=0.1844, cr_loss=0.4167, attn_decoder_loss=0.2744, over 29716.00 frames. ], tot_loss[loss=0.2796, ctc_loss=0.1966, cr_loss=0.4228, attn_decoder_loss=0.2794, over 5811219.87 frames. ], batch size: 82, lr: 1.59e-02, grad_scale: 4.0 2024-09-17 01:20:24,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=117600.0, ans=0.0 2024-09-17 01:20:59,709 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.22 vs. limit=22.5 2024-09-17 01:21:04,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=117720.0, ans=0.125 2024-09-17 01:21:07,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=117720.0, ans=0.0 2024-09-17 01:21:11,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=117720.0, ans=0.125 2024-09-17 01:21:37,947 INFO [train.py:1198] (1/2) Epoch 7, batch 2300, loss[loss=0.2509, ctc_loss=0.1701, cr_loss=0.3744, attn_decoder_loss=0.2516, over 29325.00 frames. ], tot_loss[loss=0.2785, ctc_loss=0.1958, cr_loss=0.4212, attn_decoder_loss=0.2783, over 5799226.26 frames. ], batch size: 71, lr: 1.59e-02, grad_scale: 8.0 2024-09-17 01:21:39,191 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.27 vs. 
limit=22.5 2024-09-17 01:21:48,012 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.01 vs. limit=15.0 2024-09-17 01:21:56,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=117840.0, ans=0.0 2024-09-17 01:22:21,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=117880.0, ans=0.1 2024-09-17 01:22:39,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=117960.0, ans=0.125 2024-09-17 01:22:52,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=117960.0, ans=0.125 2024-09-17 01:22:56,188 INFO [train.py:1198] (1/2) Epoch 7, batch 2350, loss[loss=0.288, ctc_loss=0.2023, cr_loss=0.4531, attn_decoder_loss=0.2874, over 29684.00 frames. ], tot_loss[loss=0.279, ctc_loss=0.1962, cr_loss=0.4229, attn_decoder_loss=0.2788, over 5804630.54 frames. ], batch size: 83, lr: 1.59e-02, grad_scale: 4.0 2024-09-17 01:22:56,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=118000.0, ans=0.025 2024-09-17 01:22:57,669 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.765e+01 1.037e+02 1.131e+02 1.224e+02 2.356e+02, threshold=2.262e+02, percent-clipped=1.0 2024-09-17 01:22:59,870 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.31 vs. limit=12.0 2024-09-17 01:23:03,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=118000.0, ans=0.025 2024-09-17 01:23:13,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=118040.0, ans=0.125 2024-09-17 01:23:17,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=118040.0, ans=0.125 2024-09-17 01:23:58,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=118160.0, ans=0.0 2024-09-17 01:24:01,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=118160.0, ans=0.07 2024-09-17 01:24:04,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=118160.0, ans=0.125 2024-09-17 01:24:09,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=118160.0, ans=0.025 2024-09-17 01:24:12,090 INFO [train.py:1198] (1/2) Epoch 7, batch 2400, loss[loss=0.286, ctc_loss=0.2183, cr_loss=0.4411, attn_decoder_loss=0.2837, over 29541.00 frames. ], tot_loss[loss=0.2798, ctc_loss=0.1968, cr_loss=0.4241, attn_decoder_loss=0.2795, over 5807959.52 frames. 
], batch size: 76, lr: 1.58e-02, grad_scale: 8.0 2024-09-17 01:24:21,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=118200.0, ans=0.125 2024-09-17 01:25:32,289 INFO [train.py:1198] (1/2) Epoch 7, batch 2450, loss[loss=0.2919, ctc_loss=0.2098, cr_loss=0.4507, attn_decoder_loss=0.291, over 29697.00 frames. ], tot_loss[loss=0.281, ctc_loss=0.198, cr_loss=0.4255, attn_decoder_loss=0.2808, over 5785995.13 frames. ], batch size: 82, lr: 1.58e-02, grad_scale: 4.0 2024-09-17 01:25:35,217 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.912e+01 1.061e+02 1.128e+02 1.247e+02 1.833e+02, threshold=2.256e+02, percent-clipped=0.0 2024-09-17 01:25:46,941 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=15.0 2024-09-17 01:25:49,859 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.04 vs. limit=15.0 2024-09-17 01:26:00,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=118440.0, ans=0.1 2024-09-17 01:26:11,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=118480.0, ans=0.0 2024-09-17 01:26:17,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=118480.0, ans=0.125 2024-09-17 01:26:17,910 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.33 vs. limit=10.0 2024-09-17 01:26:37,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=118560.0, ans=0.2 2024-09-17 01:26:50,586 INFO [train.py:1198] (1/2) Epoch 7, batch 2500, loss[loss=0.287, ctc_loss=0.1965, cr_loss=0.4239, attn_decoder_loss=0.2876, over 29628.00 frames. ], tot_loss[loss=0.2808, ctc_loss=0.1976, cr_loss=0.4248, attn_decoder_loss=0.2806, over 5796312.07 frames. ], batch size: 86, lr: 1.58e-02, grad_scale: 8.0 2024-09-17 01:27:23,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=118680.0, ans=0.125 2024-09-17 01:27:23,772 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.93 vs. limit=15.0 2024-09-17 01:27:25,209 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.04 vs. limit=15.0 2024-09-17 01:28:07,523 INFO [train.py:1198] (1/2) Epoch 7, batch 2550, loss[loss=0.2615, ctc_loss=0.1859, cr_loss=0.4151, attn_decoder_loss=0.2607, over 29362.00 frames. ], tot_loss[loss=0.2807, ctc_loss=0.1975, cr_loss=0.4249, attn_decoder_loss=0.2805, over 5798979.43 frames. 
], batch size: 67, lr: 1.58e-02, grad_scale: 4.0 2024-09-17 01:28:11,986 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.793e+01 1.007e+02 1.102e+02 1.293e+02 3.039e+02, threshold=2.204e+02, percent-clipped=2.0 2024-09-17 01:28:19,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=118800.0, ans=0.1 2024-09-17 01:28:35,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=118840.0, ans=0.0 2024-09-17 01:28:50,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=118880.0, ans=0.0 2024-09-17 01:29:07,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=118920.0, ans=0.2 2024-09-17 01:29:14,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=118960.0, ans=0.2 2024-09-17 01:29:25,800 INFO [train.py:1198] (1/2) Epoch 7, batch 2600, loss[loss=0.2542, ctc_loss=0.1685, cr_loss=0.3748, attn_decoder_loss=0.2554, over 29462.00 frames. ], tot_loss[loss=0.2809, ctc_loss=0.1973, cr_loss=0.4243, attn_decoder_loss=0.2807, over 5794033.73 frames. ], batch size: 78, lr: 1.58e-02, grad_scale: 8.0 2024-09-17 01:29:29,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=119000.0, ans=22.5 2024-09-17 01:30:06,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=119080.0, ans=0.125 2024-09-17 01:30:08,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=119080.0, ans=0.0 2024-09-17 01:30:13,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=119120.0, ans=0.2 2024-09-17 01:30:13,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=15.24 vs. limit=15.0 2024-09-17 01:30:20,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=119120.0, ans=0.1 2024-09-17 01:30:25,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=119120.0, ans=0.0 2024-09-17 01:30:32,716 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 01:30:43,406 INFO [train.py:1198] (1/2) Epoch 7, batch 2650, loss[loss=0.3077, ctc_loss=0.2214, cr_loss=0.4628, attn_decoder_loss=0.307, over 29266.00 frames. ], tot_loss[loss=0.2815, ctc_loss=0.1979, cr_loss=0.4251, attn_decoder_loss=0.2814, over 5800069.77 frames. 
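
The scaling.py:1024 lines compare a per-submodule whitening metric against a limit (e.g. metric=14.84 vs. limit=15.0 for a 512-channel feed-forward output just above), evidently a measure of how far the activation covariance is from isotropic ("white"). One plausible such metric is sketched below: it equals 1.0 for a perfectly white covariance and approaches num_channels when a single direction dominates, which fits the scale of the logged values; the exact formula used in scaling.py is not claimed here.

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """Whiteness of activations x with shape (num_frames,
        num_channels): num_channels * sum(eig^2) / sum(eig)^2 over the
        covariance eigenvalues. 1.0 = perfectly white; larger values
        mean a few directions carry most of the variance. Assumed
        formula, chosen to match the magnitude of the logged metrics."""
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)       # real, ascending
        d = x.shape[1]
        return d * (eigs ** 2).sum() / eigs.sum() ** 2

    x = torch.randn(2000, 512)                  # near-white activations
    print(float(whitening_metric(x)))           # well below a limit like 15.0
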
], batch size: 100, lr: 1.58e-02, grad_scale: 4.0 2024-09-17 01:30:43,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=119200.0, ans=0.015 2024-09-17 01:30:49,514 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.862e+01 1.038e+02 1.128e+02 1.278e+02 2.890e+02, threshold=2.256e+02, percent-clipped=2.0 2024-09-17 01:31:24,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=119280.0, ans=0.125 2024-09-17 01:31:28,246 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.32 vs. limit=12.0 2024-09-17 01:31:29,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=119320.0, ans=0.1 2024-09-17 01:31:30,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=119320.0, ans=0.125 2024-09-17 01:31:32,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=119320.0, ans=0.125 2024-09-17 01:31:41,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=119320.0, ans=0.125 2024-09-17 01:31:52,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=119360.0, ans=0.025 2024-09-17 01:31:59,334 INFO [train.py:1198] (1/2) Epoch 7, batch 2700, loss[loss=0.2899, ctc_loss=0.1933, cr_loss=0.4356, attn_decoder_loss=0.291, over 29529.00 frames. ], tot_loss[loss=0.2815, ctc_loss=0.1977, cr_loss=0.425, attn_decoder_loss=0.2813, over 5795770.09 frames. ], batch size: 87, lr: 1.58e-02, grad_scale: 8.0 2024-09-17 01:32:19,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=119440.0, ans=0.1 2024-09-17 01:32:43,921 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 01:32:56,115 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.14 vs. limit=12.0 2024-09-17 01:33:03,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=119560.0, ans=0.125 2024-09-17 01:33:05,743 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.83 vs. limit=15.0 2024-09-17 01:33:18,443 INFO [train.py:1198] (1/2) Epoch 7, batch 2750, loss[loss=0.2549, ctc_loss=0.1689, cr_loss=0.395, attn_decoder_loss=0.2557, over 29527.00 frames. ], tot_loss[loss=0.2801, ctc_loss=0.1965, cr_loss=0.423, attn_decoder_loss=0.28, over 5794830.70 frames. ], batch size: 75, lr: 1.58e-02, grad_scale: 4.0 2024-09-17 01:33:22,611 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.65 vs. 
limit=6.0 2024-09-17 01:33:26,048 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.208e+01 9.926e+01 1.072e+02 1.182e+02 2.176e+02, threshold=2.145e+02, percent-clipped=0.0 2024-09-17 01:33:34,952 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.66 vs. limit=15.0 2024-09-17 01:33:58,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=119680.0, ans=0.0 2024-09-17 01:34:17,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=119720.0, ans=0.1 2024-09-17 01:34:37,085 INFO [train.py:1198] (1/2) Epoch 7, batch 2800, loss[loss=0.3237, ctc_loss=0.2752, cr_loss=0.4636, attn_decoder_loss=0.3188, over 20432.00 frames. ], tot_loss[loss=0.2802, ctc_loss=0.1965, cr_loss=0.4226, attn_decoder_loss=0.2801, over 5776448.88 frames. ], batch size: 210, lr: 1.57e-02, grad_scale: 8.0 2024-09-17 01:34:40,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=119800.0, ans=0.1 2024-09-17 01:34:41,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=119800.0, ans=0.125 2024-09-17 01:34:48,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=119800.0, ans=0.125 2024-09-17 01:34:52,775 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 01:34:54,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=119840.0, ans=0.0 2024-09-17 01:35:13,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=119880.0, ans=0.1 2024-09-17 01:35:24,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=119920.0, ans=0.125 2024-09-17 01:35:26,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=119920.0, ans=0.0 2024-09-17 01:35:28,001 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 01:35:29,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=119920.0, ans=0.125 2024-09-17 01:35:31,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=119920.0, ans=0.0 2024-09-17 01:35:31,681 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.72 vs. limit=6.0 2024-09-17 01:35:42,183 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.10 vs. limit=6.0 2024-09-17 01:35:50,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=119960.0, ans=0.0 2024-09-17 01:35:53,908 INFO [train.py:1198] (1/2) Epoch 7, batch 2850, loss[loss=0.2748, ctc_loss=0.1916, cr_loss=0.4347, attn_decoder_loss=0.2744, over 29517.00 frames. 
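
The scaling.py:1120 "WithLoss" lines report an auxiliary loss-sum attached to the self-attention weights (0.000e+00 in every record here, so the penalty is currently inactive). A common way to implement "attach a loss to a tensor" is an autograd identity whose backward pass injects the penalty's gradient; the sketch below shows that pattern and is an assumption about the mechanism, not icefall's code.

    import torch

    class AttachLoss(torch.autograd.Function):
        """Identity on x in the forward pass; the backward pass also
        propagates the gradient of a scalar penalty `aux`, so the
        penalty is minimized jointly with the main loss."""

        @staticmethod
        def forward(ctx, x, aux):
            ctx.save_for_backward(aux)
            return x.clone()

        @staticmethod
        def backward(ctx, grad_output):
            (aux,) = ctx.saved_tensors
            # Behaves as if `aux` had been added to the final loss.
            return grad_output, torch.ones_like(aux)

    x = torch.randn(4, 8, requires_grad=True)
    aux = x.pow(2).mean()                   # hypothetical penalty on x
    y = AttachLoss.apply(x, aux)
    print(f"WithLoss: loss-sum={float(aux):.3e}")   # mirrors the log format
    y.sum().backward()                      # x.grad now includes d(aux)/dx
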
], tot_loss[loss=0.281, ctc_loss=0.1975, cr_loss=0.4239, attn_decoder_loss=0.2808, over 5760844.14 frames. ], batch size: 77, lr: 1.57e-02, grad_scale: 4.0 2024-09-17 01:36:03,115 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.909e+01 1.092e+02 1.177e+02 1.435e+02 2.490e+02, threshold=2.355e+02, percent-clipped=3.0 2024-09-17 01:36:06,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=120000.0, ans=0.125 2024-09-17 01:36:08,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=120040.0, ans=0.95 2024-09-17 01:36:14,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=120040.0, ans=0.0 2024-09-17 01:36:24,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=120080.0, ans=0.1 2024-09-17 01:36:47,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=120120.0, ans=0.0 2024-09-17 01:36:48,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=120120.0, ans=0.0 2024-09-17 01:37:01,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=120160.0, ans=0.07 2024-09-17 01:37:02,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=120160.0, ans=0.07 2024-09-17 01:37:04,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=120160.0, ans=0.0 2024-09-17 01:37:13,106 INFO [train.py:1198] (1/2) Epoch 7, batch 2900, loss[loss=0.2757, ctc_loss=0.1903, cr_loss=0.4219, attn_decoder_loss=0.2758, over 29797.00 frames. ], tot_loss[loss=0.2821, ctc_loss=0.1979, cr_loss=0.4256, attn_decoder_loss=0.282, over 5787056.21 frames. ], batch size: 80, lr: 1.57e-02, grad_scale: 8.0 2024-09-17 01:37:32,345 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.19 vs. limit=15.0 2024-09-17 01:37:43,473 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=10.66 vs. limit=12.0 2024-09-17 01:38:07,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=120320.0, ans=0.0 2024-09-17 01:38:09,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=120320.0, ans=0.1 2024-09-17 01:38:15,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=120360.0, ans=0.2 2024-09-17 01:38:31,299 INFO [train.py:1198] (1/2) Epoch 7, batch 2950, loss[loss=0.2756, ctc_loss=0.1963, cr_loss=0.4423, attn_decoder_loss=0.2745, over 29535.00 frames. ], tot_loss[loss=0.2805, ctc_loss=0.1964, cr_loss=0.4237, attn_decoder_loss=0.2804, over 5782191.44 frames. 
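
Each train.py:1198 record reports the combined loss next to its components, and the numbers are consistent with a fixed weighted sum dominated by the attention-decoder term: for the batch 2950 tot_loss just above, 0.1*0.1964 + 0.9*0.2804 + 0.02*0.4237 is approximately 0.2805. The sketch below just makes that arithmetic explicit; the weights are inferred from the logged values rather than read from the recipe.

    def combine_losses(ctc_loss, attn_decoder_loss, cr_loss,
                       w_ctc=0.1, w_attn=0.9, w_cr=0.02):
        """Weighted total matching the per-batch records; the weights
        are inferred from the logged numbers, not taken as definitive."""
        return w_ctc * ctc_loss + w_attn * attn_decoder_loss + w_cr * cr_loss

    # Components from the Epoch 7, batch 2950 tot_loss record above:
    print(round(combine_losses(0.1964, 0.2804, 0.4237), 4))  # -> 0.2805
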
], batch size: 75, lr: 1.57e-02, grad_scale: 4.0 2024-09-17 01:38:41,940 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.289e+01 1.032e+02 1.125e+02 1.263e+02 2.681e+02, threshold=2.250e+02, percent-clipped=2.0 2024-09-17 01:38:55,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=120440.0, ans=0.125 2024-09-17 01:39:36,073 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.58 vs. limit=12.0 2024-09-17 01:39:46,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=120600.0, ans=0.09899494936611666 2024-09-17 01:39:47,833 INFO [train.py:1198] (1/2) Epoch 7, batch 3000, loss[loss=0.2824, ctc_loss=0.1896, cr_loss=0.4285, attn_decoder_loss=0.2832, over 29765.00 frames. ], tot_loss[loss=0.2805, ctc_loss=0.1965, cr_loss=0.4236, attn_decoder_loss=0.2804, over 5782805.36 frames. ], batch size: 81, lr: 1.57e-02, grad_scale: 8.0 2024-09-17 01:39:47,834 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 01:40:06,238 INFO [train.py:1230] (1/2) Epoch 7, validation: loss=0.2168, ctc_loss=0.05873, cr_loss=4.524e-15, attn_decoder_loss=0.2344, over 944034.00 frames. 2024-09-17 01:40:06,238 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-17 01:40:15,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.95 vs. limit=12.0 2024-09-17 01:40:30,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=120640.0, ans=0.2 2024-09-17 01:40:50,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=120680.0, ans=0.0 2024-09-17 01:40:55,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=120720.0, ans=0.1 2024-09-17 01:40:58,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=120720.0, ans=0.0 2024-09-17 01:41:01,640 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.06 vs. limit=15.0 2024-09-17 01:41:02,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=120720.0, ans=0.0 2024-09-17 01:41:10,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.max_abs, batch_count=120760.0, ans=10.0 2024-09-17 01:41:12,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=120760.0, ans=0.0 2024-09-17 01:41:19,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=120760.0, ans=0.125 2024-09-17 01:41:25,956 INFO [train.py:1198] (1/2) Epoch 7, batch 3050, loss[loss=0.259, ctc_loss=0.1713, cr_loss=0.4013, attn_decoder_loss=0.2599, over 29542.00 frames. ], tot_loss[loss=0.2814, ctc_loss=0.1973, cr_loss=0.4246, attn_decoder_loss=0.2813, over 5776612.60 frames. 
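
At batch 3000 training pauses to compute a validation loss over a fixed 944034-frame set; note that cr_loss comes out at ~4.5e-15, numerically zero, which makes sense if the consistency-regularization term requires a second augmented view that is not produced at evaluation time. A hedged sketch of such a frame-weighted validation pass is below; the model interface and the loss bookkeeping are assumptions.

    import torch

    @torch.no_grad()
    def compute_validation_loss(model, valid_loader):
        """Frame-weighted average loss over the validation set, in the
        spirit of the train.py:1221-1231 lines above. `model(batch)` is
        a hypothetical interface returning (scalar_loss, num_frames)."""
        model.eval()
        loss_sum, frames = 0.0, 0.0
        for batch in valid_loader:
            loss, num_frames = model(batch)
            loss_sum += float(loss) * num_frames
            frames += num_frames
        model.train()
        if torch.cuda.is_available():       # the log prints this too
            mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
            print(f"Maximum memory allocated so far is {mb}MB")
        return loss_sum / frames
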
], batch size: 76, lr: 1.57e-02, grad_scale: 4.0 2024-09-17 01:41:37,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=120800.0, ans=0.0 2024-09-17 01:41:40,279 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.612e+01 1.070e+02 1.194e+02 1.343e+02 6.918e+02, threshold=2.387e+02, percent-clipped=4.0 2024-09-17 01:41:57,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=120880.0, ans=0.125 2024-09-17 01:42:06,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=120880.0, ans=0.125 2024-09-17 01:42:12,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=120920.0, ans=0.125 2024-09-17 01:42:19,380 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.33 vs. limit=15.0 2024-09-17 01:42:44,144 INFO [train.py:1198] (1/2) Epoch 7, batch 3100, loss[loss=0.2976, ctc_loss=0.2125, cr_loss=0.4645, attn_decoder_loss=0.2968, over 29285.00 frames. ], tot_loss[loss=0.2803, ctc_loss=0.1961, cr_loss=0.4231, attn_decoder_loss=0.2803, over 5776339.91 frames. ], batch size: 100, lr: 1.57e-02, grad_scale: 8.0 2024-09-17 01:42:44,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=121000.0, ans=0.2 2024-09-17 01:42:48,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=121000.0, ans=0.0 2024-09-17 01:43:04,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=121040.0, ans=0.2 2024-09-17 01:43:25,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=121080.0, ans=0.0 2024-09-17 01:43:29,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=121120.0, ans=0.125 2024-09-17 01:43:35,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.96 vs. limit=22.5 2024-09-17 01:43:55,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=121160.0, ans=0.125 2024-09-17 01:44:00,616 INFO [train.py:1198] (1/2) Epoch 7, batch 3150, loss[loss=0.298, ctc_loss=0.2133, cr_loss=0.4419, attn_decoder_loss=0.2976, over 28891.00 frames. ], tot_loss[loss=0.2801, ctc_loss=0.1955, cr_loss=0.4232, attn_decoder_loss=0.2801, over 5783172.68 frames. ], batch size: 104, lr: 1.57e-02, grad_scale: 4.0 2024-09-17 01:44:07,542 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=10.36 vs. 
limit=15.0 2024-09-17 01:44:08,527 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 01:44:14,322 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.541e+01 1.004e+02 1.097e+02 1.266e+02 2.300e+02, threshold=2.194e+02, percent-clipped=0.0 2024-09-17 01:44:18,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=121240.0, ans=0.04949747468305833 2024-09-17 01:44:46,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=121280.0, ans=0.0 2024-09-17 01:44:59,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=121320.0, ans=0.0 2024-09-17 01:45:10,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.35 vs. limit=22.5 2024-09-17 01:45:16,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=121360.0, ans=0.0 2024-09-17 01:45:19,455 INFO [train.py:1198] (1/2) Epoch 7, batch 3200, loss[loss=0.2713, ctc_loss=0.1843, cr_loss=0.4172, attn_decoder_loss=0.2716, over 29415.00 frames. ], tot_loss[loss=0.2789, ctc_loss=0.1942, cr_loss=0.4219, attn_decoder_loss=0.279, over 5794116.42 frames. ], batch size: 79, lr: 1.56e-02, grad_scale: 8.0 2024-09-17 01:45:25,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=121400.0, ans=0.0 2024-09-17 01:45:52,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=121480.0, ans=0.125 2024-09-17 01:46:03,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=121480.0, ans=0.125 2024-09-17 01:46:24,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=121560.0, ans=15.0 2024-09-17 01:46:27,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=121560.0, ans=0.125 2024-09-17 01:46:34,506 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.19 vs. limit=12.0 2024-09-17 01:46:38,569 INFO [train.py:1198] (1/2) Epoch 7, batch 3250, loss[loss=0.2958, ctc_loss=0.2161, cr_loss=0.4341, attn_decoder_loss=0.295, over 29696.00 frames. ], tot_loss[loss=0.2798, ctc_loss=0.1949, cr_loss=0.423, attn_decoder_loss=0.2798, over 5801117.94 frames. 
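
The grad_scale field in the batch records keeps bouncing between 4.0 and 8.0. That is the signature of a dynamic fp16 loss scaler: PyTorch's GradScaler doubles its scale after growth_interval consecutive overflow-free steps and halves it whenever scaled gradients overflow. A sketch of one AMP training step follows; the growth interval value and model interface are assumptions.

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=4.0, growth_interval=200)

    def fp16_step(model, optimizer, batch):
        """One mixed-precision step. scaler.update() is what makes the
        logged grad_scale oscillate: x2 after a clean stretch, /2 on
        overflow (growth_factor=2.0 and backoff_factor=0.5 defaults)."""
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(batch)             # hypothetical: scalar loss
        scaler.scale(loss).backward()
        scaler.step(optimizer)              # skipped if grads overflowed
        scaler.update()
        return loss.detach(), scaler.get_scale()
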
], batch size: 84, lr: 1.56e-02, grad_scale: 8.0 2024-09-17 01:46:51,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=121600.0, ans=0.025 2024-09-17 01:46:53,792 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.273e+01 1.020e+02 1.119e+02 1.210e+02 1.676e+02, threshold=2.238e+02, percent-clipped=0.0 2024-09-17 01:46:57,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=121640.0, ans=0.125 2024-09-17 01:47:12,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=121680.0, ans=0.0 2024-09-17 01:47:29,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=121720.0, ans=0.125 2024-09-17 01:47:35,386 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.09 vs. limit=15.0 2024-09-17 01:47:45,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=121760.0, ans=0.125 2024-09-17 01:47:54,988 INFO [train.py:1198] (1/2) Epoch 7, batch 3300, loss[loss=0.2806, ctc_loss=0.1849, cr_loss=0.3945, attn_decoder_loss=0.2825, over 28201.00 frames. ], tot_loss[loss=0.2783, ctc_loss=0.1939, cr_loss=0.4208, attn_decoder_loss=0.2783, over 5799271.27 frames. ], batch size: 111, lr: 1.56e-02, grad_scale: 8.0 2024-09-17 01:48:28,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=121880.0, ans=0.0 2024-09-17 01:48:30,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=121880.0, ans=0.125 2024-09-17 01:49:03,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=121960.0, ans=0.2 2024-09-17 01:49:05,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=121960.0, ans=0.125 2024-09-17 01:49:09,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=121960.0, ans=0.125 2024-09-17 01:49:13,993 INFO [train.py:1198] (1/2) Epoch 7, batch 3350, loss[loss=0.3008, ctc_loss=0.2145, cr_loss=0.4248, attn_decoder_loss=0.3009, over 28948.00 frames. ], tot_loss[loss=0.2795, ctc_loss=0.1953, cr_loss=0.4217, attn_decoder_loss=0.2794, over 5775717.70 frames. ], batch size: 104, lr: 1.56e-02, grad_scale: 4.0 2024-09-17 01:49:14,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=122000.0, ans=0.125 2024-09-17 01:49:15,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=122000.0, ans=0.07 2024-09-17 01:49:32,809 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.051e+01 1.075e+02 1.159e+02 1.381e+02 2.720e+02, threshold=2.319e+02, percent-clipped=3.0 2024-09-17 01:49:49,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=122080.0, ans=0.125 2024-09-17 01:50:32,397 INFO [train.py:1198] (1/2) Epoch 7, batch 3400, loss[loss=0.2401, ctc_loss=0.1627, cr_loss=0.3776, attn_decoder_loss=0.2404, over 29365.00 frames. 
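
The tot_loss aggregates report fractional frame counts (e.g. over 5775717.70 frames just above), which suggests an exponentially decayed running sum rather than an exact total: a plain count of frames would be an integer. The sketch below is a guess at that bookkeeping; the decay constant is hypothetical.

    class RunningLoss:
        """Exponentially decayed, frame-weighted running average, one
        plausible reading of the fractional tot_loss frame counts."""

        def __init__(self, decay=0.999):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, loss, num_frames):
            self.loss_sum = self.decay * self.loss_sum + loss * num_frames
            self.frames = self.decay * self.frames + num_frames

        @property
        def value(self):
            return self.loss_sum / max(self.frames, 1.0)

    tracker = RunningLoss()
    for loss, frames in [(0.29, 29684), (0.27, 29541)]:
        tracker.update(loss, frames)
    print(f"tot_loss={tracker.value:.4f}, over {tracker.frames:.2f} frames")
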
], tot_loss[loss=0.2791, ctc_loss=0.1953, cr_loss=0.4211, attn_decoder_loss=0.2791, over 5767258.68 frames. ], batch size: 67, lr: 1.56e-02, grad_scale: 8.0 2024-09-17 01:50:32,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=122200.0, ans=0.125 2024-09-17 01:50:41,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=122200.0, ans=0.125 2024-09-17 01:50:58,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=122240.0, ans=0.2 2024-09-17 01:51:15,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=122280.0, ans=0.125 2024-09-17 01:51:26,727 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.65 vs. limit=15.0 2024-09-17 01:51:28,031 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.08 vs. limit=15.0 2024-09-17 01:51:32,008 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 01:51:40,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=122360.0, ans=0.0 2024-09-17 01:51:44,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=122360.0, ans=0.0 2024-09-17 01:51:48,684 INFO [train.py:1198] (1/2) Epoch 7, batch 3450, loss[loss=0.298, ctc_loss=0.2181, cr_loss=0.4223, attn_decoder_loss=0.2975, over 28068.00 frames. ], tot_loss[loss=0.2794, ctc_loss=0.1952, cr_loss=0.4212, attn_decoder_loss=0.2793, over 5776271.10 frames. ], batch size: 111, lr: 1.56e-02, grad_scale: 4.0 2024-09-17 01:52:09,037 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.695e+01 1.040e+02 1.098e+02 1.235e+02 2.393e+02, threshold=2.195e+02, percent-clipped=1.0 2024-09-17 01:52:12,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=122440.0, ans=0.125 2024-09-17 01:52:23,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=122480.0, ans=0.1 2024-09-17 01:52:40,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=122520.0, ans=0.125 2024-09-17 01:52:46,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=122520.0, ans=0.125 2024-09-17 01:52:52,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=122560.0, ans=0.2 2024-09-17 01:52:53,006 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.93 vs. 
limit=15.0 2024-09-17 01:52:59,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=122560.0, ans=0.1 2024-09-17 01:53:07,009 INFO [train.py:1198] (1/2) Epoch 7, batch 3500, loss[loss=0.2537, ctc_loss=0.1796, cr_loss=0.396, attn_decoder_loss=0.2531, over 29342.00 frames. ], tot_loss[loss=0.2789, ctc_loss=0.195, cr_loss=0.4217, attn_decoder_loss=0.2788, over 5778119.18 frames. ], batch size: 71, lr: 1.56e-02, grad_scale: 8.0 2024-09-17 01:53:22,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=122640.0, ans=0.125 2024-09-17 01:53:29,302 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.08 vs. limit=15.0 2024-09-17 01:53:42,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=122680.0, ans=0.125 2024-09-17 01:53:47,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=122680.0, ans=0.125 2024-09-17 01:54:03,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=122720.0, ans=0.125 2024-09-17 01:54:24,529 INFO [train.py:1198] (1/2) Epoch 7, batch 3550, loss[loss=0.298, ctc_loss=0.205, cr_loss=0.4414, attn_decoder_loss=0.2986, over 29705.00 frames. ], tot_loss[loss=0.2787, ctc_loss=0.1947, cr_loss=0.4218, attn_decoder_loss=0.2787, over 5784488.50 frames. ], batch size: 89, lr: 1.56e-02, grad_scale: 4.0 2024-09-17 01:54:41,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=122840.0, ans=0.0 2024-09-17 01:54:43,848 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.552e+01 9.824e+01 1.101e+02 1.214e+02 1.774e+02, threshold=2.203e+02, percent-clipped=0.0 2024-09-17 01:55:02,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=122880.0, ans=0.04949747468305833 2024-09-17 01:55:17,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=122920.0, ans=0.09899494936611666 2024-09-17 01:55:39,617 INFO [train.py:1198] (1/2) Epoch 7, batch 3600, loss[loss=0.2651, ctc_loss=0.1795, cr_loss=0.4098, attn_decoder_loss=0.2655, over 29495.00 frames. ], tot_loss[loss=0.2789, ctc_loss=0.1946, cr_loss=0.422, attn_decoder_loss=0.2789, over 5792699.34 frames. ], batch size: 77, lr: 1.55e-02, grad_scale: 8.0 2024-09-17 01:55:41,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=123000.0, ans=0.0 2024-09-17 01:55:43,366 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.57 vs. limit=15.0 2024-09-17 01:56:00,272 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=15.95 vs. 
limit=15.0 2024-09-17 01:56:08,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=123080.0, ans=0.2 2024-09-17 01:56:23,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=123120.0, ans=0.125 2024-09-17 01:56:32,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=123120.0, ans=0.1 2024-09-17 01:56:34,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=123120.0, ans=0.025 2024-09-17 01:56:37,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=123120.0, ans=0.05 2024-09-17 01:56:37,767 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.49 vs. limit=15.0 2024-09-17 01:56:55,560 INFO [train.py:1198] (1/2) Epoch 7, batch 3650, loss[loss=0.3025, ctc_loss=0.2099, cr_loss=0.4717, attn_decoder_loss=0.3023, over 29482.00 frames. ], tot_loss[loss=0.2782, ctc_loss=0.1937, cr_loss=0.4209, attn_decoder_loss=0.2782, over 5794196.46 frames. ], batch size: 90, lr: 1.55e-02, grad_scale: 4.0 2024-09-17 01:57:05,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=123200.0, ans=0.125 2024-09-17 01:57:08,356 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.95 vs. limit=22.5 2024-09-17 01:57:12,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=123240.0, ans=0.1 2024-09-17 01:57:16,631 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.280e+01 1.041e+02 1.137e+02 1.251e+02 2.329e+02, threshold=2.273e+02, percent-clipped=0.0 2024-09-17 01:57:32,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=123280.0, ans=0.2 2024-09-17 01:57:41,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=123320.0, ans=0.04949747468305833 2024-09-17 01:57:55,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=123320.0, ans=0.125 2024-09-17 01:58:13,150 INFO [train.py:1198] (1/2) Epoch 7, batch 3700, loss[loss=0.291, ctc_loss=0.2027, cr_loss=0.4275, attn_decoder_loss=0.2913, over 29715.00 frames. ], tot_loss[loss=0.2786, ctc_loss=0.1938, cr_loss=0.4214, attn_decoder_loss=0.2787, over 5803300.43 frames. ], batch size: 84, lr: 1.55e-02, grad_scale: 8.0 2024-09-17 01:59:05,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=123520.0, ans=0.0 2024-09-17 01:59:09,271 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.86 vs. limit=6.0 2024-09-17 01:59:22,836 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.20 vs. 
limit=15.0 2024-09-17 01:59:28,061 INFO [train.py:1198] (1/2) Epoch 7, batch 3750, loss[loss=0.2508, ctc_loss=0.1757, cr_loss=0.3727, attn_decoder_loss=0.2509, over 29361.00 frames. ], tot_loss[loss=0.2781, ctc_loss=0.1932, cr_loss=0.4201, attn_decoder_loss=0.2782, over 5806834.34 frames. ], batch size: 67, lr: 1.55e-02, grad_scale: 4.0 2024-09-17 01:59:34,784 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.81 vs. limit=15.0 2024-09-17 01:59:48,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=123640.0, ans=0.125 2024-09-17 01:59:50,730 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.690e+01 1.049e+02 1.152e+02 1.342e+02 3.942e+02, threshold=2.304e+02, percent-clipped=2.0 2024-09-17 01:59:57,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=123680.0, ans=0.125 2024-09-17 02:00:01,776 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.59 vs. limit=15.0 2024-09-17 02:00:13,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=123720.0, ans=0.125 2024-09-17 02:00:24,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=123720.0, ans=0.2 2024-09-17 02:00:35,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=123760.0, ans=0.125 2024-09-17 02:00:42,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=123760.0, ans=0.0 2024-09-17 02:00:43,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=123800.0, ans=0.0 2024-09-17 02:00:45,085 INFO [train.py:1198] (1/2) Epoch 7, batch 3800, loss[loss=0.2882, ctc_loss=0.195, cr_loss=0.4342, attn_decoder_loss=0.2889, over 29608.00 frames. ], tot_loss[loss=0.2779, ctc_loss=0.1932, cr_loss=0.4199, attn_decoder_loss=0.278, over 5797053.83 frames. ], batch size: 86, lr: 1.55e-02, grad_scale: 8.0 2024-09-17 02:01:34,154 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.58 vs. limit=15.0 2024-09-17 02:01:43,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=123960.0, ans=0.09899494936611666 2024-09-17 02:01:51,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=123960.0, ans=0.1 2024-09-17 02:02:00,447 INFO [train.py:1198] (1/2) Epoch 7, batch 3850, loss[loss=0.2891, ctc_loss=0.1991, cr_loss=0.4342, attn_decoder_loss=0.2895, over 29273.00 frames. ], tot_loss[loss=0.2775, ctc_loss=0.1928, cr_loss=0.4198, attn_decoder_loss=0.2776, over 5810983.84 frames. 
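
The lr field decays slowly within the epoch (1.59e-02 at batch 2350 down to 1.54e-02 by batch 4250) and then steps down more sharply at the epoch boundary (1.44e-02 at the start of Epoch 8, visible further below), consistent with a schedule discounted by both batch count and epoch count. The sketch below reproduces that shape; the constants and the -0.25 exponents are assumptions and are not fitted to the logged values.

    def eden_like_lr(base_lr, batch, epoch,
                     lr_batches=10000.0, lr_epochs=4.0):
        """Batch- and epoch-discounted learning rate (a sketch):
        roughly flat early on, then ~batch^-0.5 * epoch^-0.5 decay.
        All constants here are hypothetical."""
        batch_f = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_f = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_f * epoch_f

    # Slow within-epoch decay, then a visible drop at the epoch boundary:
    for epoch, batch in [(7, 118000), (7, 125000), (8, 127000)]:
        print(epoch, batch, f"{eden_like_lr(0.05, batch, epoch):.2e}")
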
], batch size: 100, lr: 1.55e-02, grad_scale: 4.0 2024-09-17 02:02:20,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=124040.0, ans=0.0 2024-09-17 02:02:24,328 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.352e+01 1.024e+02 1.121e+02 1.176e+02 2.647e+02, threshold=2.243e+02, percent-clipped=2.0 2024-09-17 02:02:35,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=124080.0, ans=0.0 2024-09-17 02:02:38,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=124080.0, ans=0.2 2024-09-17 02:02:44,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=124120.0, ans=0.125 2024-09-17 02:02:47,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=124120.0, ans=0.025 2024-09-17 02:03:15,228 INFO [train.py:1198] (1/2) Epoch 7, batch 3900, loss[loss=0.2953, ctc_loss=0.2118, cr_loss=0.4647, attn_decoder_loss=0.2942, over 29639.00 frames. ], tot_loss[loss=0.278, ctc_loss=0.1932, cr_loss=0.4205, attn_decoder_loss=0.2781, over 5815347.47 frames. ], batch size: 86, lr: 1.55e-02, grad_scale: 8.0 2024-09-17 02:03:21,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=124200.0, ans=0.125 2024-09-17 02:03:36,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=124240.0, ans=0.0 2024-09-17 02:03:38,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=124240.0, ans=0.125 2024-09-17 02:03:55,468 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.47 vs. limit=15.0 2024-09-17 02:04:03,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=124320.0, ans=0.015 2024-09-17 02:04:04,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=124320.0, ans=0.2 2024-09-17 02:04:26,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=124360.0, ans=0.125 2024-09-17 02:04:31,859 INFO [train.py:1198] (1/2) Epoch 7, batch 3950, loss[loss=0.2895, ctc_loss=0.1999, cr_loss=0.4183, attn_decoder_loss=0.2902, over 29493.00 frames. ], tot_loss[loss=0.2779, ctc_loss=0.1927, cr_loss=0.4207, attn_decoder_loss=0.278, over 5835040.91 frames. ], batch size: 97, lr: 1.55e-02, grad_scale: 4.0 2024-09-17 02:04:57,235 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.925e+01 1.017e+02 1.080e+02 1.236e+02 3.410e+02, threshold=2.160e+02, percent-clipped=1.0 2024-09-17 02:05:12,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=124480.0, ans=0.0 2024-09-17 02:05:24,808 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.16 vs. limit=10.0 2024-09-17 02:05:47,552 INFO [train.py:1198] (1/2) Epoch 7, batch 4000, loss[loss=0.2526, ctc_loss=0.1678, cr_loss=0.3743, attn_decoder_loss=0.2537, over 29527.00 frames. 
], tot_loss[loss=0.2784, ctc_loss=0.1934, cr_loss=0.4211, attn_decoder_loss=0.2784, over 5812745.59 frames. ], batch size: 74, lr: 1.55e-02, grad_scale: 8.0 2024-09-17 02:05:59,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=124600.0, ans=0.0 2024-09-17 02:06:11,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=124640.0, ans=0.0 2024-09-17 02:06:22,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=124680.0, ans=0.025 2024-09-17 02:06:48,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=124760.0, ans=0.0 2024-09-17 02:06:50,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=124760.0, ans=0.1 2024-09-17 02:06:52,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=124760.0, ans=0.025 2024-09-17 02:07:02,873 INFO [train.py:1198] (1/2) Epoch 7, batch 4050, loss[loss=0.3212, ctc_loss=0.2601, cr_loss=0.4519, attn_decoder_loss=0.318, over 20625.00 frames. ], tot_loss[loss=0.2783, ctc_loss=0.1935, cr_loss=0.4211, attn_decoder_loss=0.2784, over 5796673.21 frames. ], batch size: 209, lr: 1.54e-02, grad_scale: 4.0 2024-09-17 02:07:09,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=124800.0, ans=0.2 2024-09-17 02:07:26,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=124840.0, ans=0.1 2024-09-17 02:07:29,486 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.830e+01 1.037e+02 1.133e+02 1.279e+02 3.685e+02, threshold=2.266e+02, percent-clipped=2.0 2024-09-17 02:08:01,450 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=24.10 vs. limit=15.0 2024-09-17 02:08:06,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=124960.0, ans=0.05 2024-09-17 02:08:12,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=124960.0, ans=0.125 2024-09-17 02:08:14,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=124960.0, ans=0.1 2024-09-17 02:08:18,290 INFO [train.py:1198] (1/2) Epoch 7, batch 4100, loss[loss=0.2956, ctc_loss=0.2058, cr_loss=0.4392, attn_decoder_loss=0.2958, over 29495.00 frames. ], tot_loss[loss=0.2785, ctc_loss=0.1937, cr_loss=0.4212, attn_decoder_loss=0.2786, over 5791813.32 frames. 
], batch size: 90, lr: 1.54e-02, grad_scale: 8.0 2024-09-17 02:08:24,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=125000.0, ans=0.1 2024-09-17 02:08:58,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=125080.0, ans=0.0 2024-09-17 02:09:01,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=125120.0, ans=0.125 2024-09-17 02:09:04,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=125120.0, ans=0.0 2024-09-17 02:09:13,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=125120.0, ans=0.1 2024-09-17 02:09:17,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=125160.0, ans=0.0 2024-09-17 02:09:32,371 INFO [train.py:1198] (1/2) Epoch 7, batch 4150, loss[loss=0.2718, ctc_loss=0.1871, cr_loss=0.4043, attn_decoder_loss=0.2722, over 29520.00 frames. ], tot_loss[loss=0.2783, ctc_loss=0.1937, cr_loss=0.4218, attn_decoder_loss=0.2783, over 5797646.99 frames. ], batch size: 77, lr: 1.54e-02, grad_scale: 4.0 2024-09-17 02:10:01,822 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.306e+01 1.018e+02 1.090e+02 1.211e+02 2.746e+02, threshold=2.181e+02, percent-clipped=3.0 2024-09-17 02:10:03,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=125280.0, ans=0.125 2024-09-17 02:10:18,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=125320.0, ans=0.125 2024-09-17 02:10:44,893 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 02:10:47,450 INFO [train.py:1198] (1/2) Epoch 7, batch 4200, loss[loss=0.2839, ctc_loss=0.1904, cr_loss=0.4338, attn_decoder_loss=0.2847, over 29506.00 frames. ], tot_loss[loss=0.2787, ctc_loss=0.194, cr_loss=0.4219, attn_decoder_loss=0.2787, over 5800242.21 frames. ], batch size: 90, lr: 1.54e-02, grad_scale: 8.0 2024-09-17 02:10:56,052 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.17 vs. limit=15.0 2024-09-17 02:10:59,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=125400.0, ans=0.125 2024-09-17 02:11:24,922 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 02:12:02,718 INFO [train.py:1198] (1/2) Epoch 7, batch 4250, loss[loss=0.2639, ctc_loss=0.1808, cr_loss=0.4166, attn_decoder_loss=0.2639, over 29500.00 frames. ], tot_loss[loss=0.2788, ctc_loss=0.1937, cr_loss=0.4216, attn_decoder_loss=0.2788, over 5805624.18 frames. ], batch size: 74, lr: 1.54e-02, grad_scale: 4.0 2024-09-17 02:12:02,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=125600.0, ans=0.125 2024-09-17 02:12:05,160 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.91 vs. 
limit=15.0 2024-09-17 02:12:05,319 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=22.5 2024-09-17 02:12:06,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=125600.0, ans=0.125 2024-09-17 02:12:24,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=125640.0, ans=0.125 2024-09-17 02:12:31,954 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.603e+01 1.048e+02 1.150e+02 1.288e+02 2.522e+02, threshold=2.299e+02, percent-clipped=2.0 2024-09-17 02:12:32,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=125680.0, ans=0.125 2024-09-17 02:12:45,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=125720.0, ans=0.0 2024-09-17 02:12:51,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=125720.0, ans=0.0 2024-09-17 02:12:58,237 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.96 vs. limit=15.0 2024-09-17 02:12:59,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=125720.0, ans=0.0 2024-09-17 02:13:16,850 INFO [train.py:1198] (1/2) Epoch 7, batch 4300, loss[loss=0.2898, ctc_loss=0.1953, cr_loss=0.4354, attn_decoder_loss=0.2906, over 29528.00 frames. ], tot_loss[loss=0.2795, ctc_loss=0.1947, cr_loss=0.4226, attn_decoder_loss=0.2796, over 5796600.83 frames. ], batch size: 87, lr: 1.54e-02, grad_scale: 8.0 2024-09-17 02:13:29,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=125800.0, ans=0.0 2024-09-17 02:13:58,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=125880.0, ans=0.05 2024-09-17 02:14:05,005 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.39 vs. limit=12.0 2024-09-17 02:14:32,529 INFO [train.py:1198] (1/2) Epoch 7, batch 4350, loss[loss=0.3048, ctc_loss=0.2215, cr_loss=0.4672, attn_decoder_loss=0.3037, over 29521.00 frames. ], tot_loss[loss=0.2832, ctc_loss=0.1982, cr_loss=0.4275, attn_decoder_loss=0.2832, over 5798208.23 frames. 
], batch size: 97, lr: 1.54e-02, grad_scale: 4.0 2024-09-17 02:14:40,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=126000.0, ans=0.0 2024-09-17 02:14:50,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=126040.0, ans=0.1 2024-09-17 02:15:04,446 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.068e+01 1.042e+02 1.125e+02 1.257e+02 6.277e+02, threshold=2.251e+02, percent-clipped=2.0 2024-09-17 02:15:16,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=126120.0, ans=0.025 2024-09-17 02:15:16,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=126120.0, ans=0.125 2024-09-17 02:15:25,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=126120.0, ans=0.125 2024-09-17 02:15:35,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=126160.0, ans=0.125 2024-09-17 02:15:47,179 INFO [train.py:1198] (1/2) Epoch 7, batch 4400, loss[loss=0.303, ctc_loss=0.2237, cr_loss=0.4585, attn_decoder_loss=0.3016, over 27481.00 frames. ], tot_loss[loss=0.2862, ctc_loss=0.2011, cr_loss=0.4311, attn_decoder_loss=0.2861, over 5768137.65 frames. ], batch size: 124, lr: 1.54e-02, grad_scale: 8.0 2024-09-17 02:15:48,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=126200.0, ans=0.125 2024-09-17 02:15:48,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=126200.0, ans=0.125 2024-09-17 02:15:51,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=126200.0, ans=0.125 2024-09-17 02:15:53,972 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.50 vs. limit=6.0 2024-09-17 02:15:56,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=126200.0, ans=0.125 2024-09-17 02:16:00,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=126240.0, ans=0.1 2024-09-17 02:16:02,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=126240.0, ans=0.025 2024-09-17 02:16:07,031 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.87 vs. 
limit=15.0 2024-09-17 02:16:32,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=126320.0, ans=0.0 2024-09-17 02:16:32,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=126320.0, ans=0.025 2024-09-17 02:16:43,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=126320.0, ans=0.1 2024-09-17 02:17:03,040 INFO [train.py:1198] (1/2) Epoch 7, batch 4450, loss[loss=0.3133, ctc_loss=0.2594, cr_loss=0.4594, attn_decoder_loss=0.3091, over 20571.00 frames. ], tot_loss[loss=0.2897, ctc_loss=0.2071, cr_loss=0.4346, attn_decoder_loss=0.2892, over 5575737.76 frames. ], batch size: 209, lr: 1.53e-02, grad_scale: 4.0 2024-09-17 02:17:05,409 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=16.81 vs. limit=15.0 2024-09-17 02:17:11,332 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs. limit=15.0 2024-09-17 02:17:36,227 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.504e+01 1.059e+02 1.182e+02 1.268e+02 2.368e+02, threshold=2.364e+02, percent-clipped=1.0 2024-09-17 02:17:38,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=126480.0, ans=0.2 2024-09-17 02:17:45,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=126480.0, ans=0.125 2024-09-17 02:18:13,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=126560.0, ans=0.05 2024-09-17 02:18:19,613 INFO [train.py:1198] (1/2) Epoch 7, batch 4500, loss[loss=0.3257, ctc_loss=0.2757, cr_loss=0.4568, attn_decoder_loss=0.3211, over 18836.00 frames. ], tot_loss[loss=0.2936, ctc_loss=0.2146, cr_loss=0.436, attn_decoder_loss=0.2927, over 5236827.78 frames. ], batch size: 210, lr: 1.53e-02, grad_scale: 8.0 2024-09-17 02:18:36,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=126640.0, ans=0.125 2024-09-17 02:18:36,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=126640.0, ans=0.0 2024-09-17 02:18:44,705 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0 2024-09-17 02:18:54,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=126680.0, ans=0.125 2024-09-17 02:19:46,525 INFO [train.py:1198] (1/2) Epoch 8, batch 0, loss[loss=0.2629, ctc_loss=0.1659, cr_loss=0.3745, attn_decoder_loss=0.2654, over 29586.00 frames. ], tot_loss[loss=0.2629, ctc_loss=0.1659, cr_loss=0.3745, attn_decoder_loss=0.2654, over 29586.00 frames. 
2024-09-17 02:19:46,526 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-17 02:20:02,770 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.9871, 5.8845, 5.4621, 5.7261], device='cuda:1')
2024-09-17 02:20:03,679 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.7868, 4.2036, 4.6298, 4.6419], device='cuda:1')
2024-09-17 02:20:04,921 INFO [train.py:1230] (1/2) Epoch 8, validation: loss=0.2208, ctc_loss=0.05894, cr_loss=4.762e-15, attn_decoder_loss=0.2387, over 944034.00 frames.
2024-09-17 02:20:04,922 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB
2024-09-17 02:20:21,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=126740.0, ans=0.2
2024-09-17 02:20:43,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=126780.0, ans=0.125
2024-09-17 02:20:44,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=126780.0, ans=0.1
2024-09-17 02:20:50,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=126820.0, ans=0.0
2024-09-17 02:20:54,245 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.35 vs. limit=22.5
2024-09-17 02:20:58,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=126820.0, ans=0.0
2024-09-17 02:21:19,343 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.753e+01 1.155e+02 1.254e+02 1.387e+02 1.225e+03, threshold=2.508e+02, percent-clipped=2.0
2024-09-17 02:21:20,940 INFO [train.py:1198] (1/2) Epoch 8, batch 50, loss[loss=0.2505, ctc_loss=0.1726, cr_loss=0.3855, attn_decoder_loss=0.2505, over 29449.00 frames. ], tot_loss[loss=0.281, ctc_loss=0.1986, cr_loss=0.4228, attn_decoder_loss=0.2808, over 1267887.86 frames. ], batch size: 70, lr: 1.44e-02, grad_scale: 4.0
2024-09-17 02:21:28,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=126900.0, ans=0.125
2024-09-17 02:21:50,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=126980.0, ans=0.2
2024-09-17 02:22:15,997 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.63 vs. limit=6.0
2024-09-17 02:22:41,946 INFO [train.py:1198] (1/2) Epoch 8, batch 100, loss[loss=0.2688, ctc_loss=0.1882, cr_loss=0.3716, attn_decoder_loss=0.2694, over 29534.00 frames. ], tot_loss[loss=0.2827, ctc_loss=0.199, cr_loss=0.4248, attn_decoder_loss=0.2826, over 2252297.10 frames. ], batch size: 76, lr: 1.44e-02, grad_scale: 8.0
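During the validation pass above, zipformer.py logs one attention-entropy value per head for selected self-attention modules. A plausible reconstruction of that statistic (the entropy of each head's attention distribution, averaged over query positions) is sketched below; the exact reduction used in zipformer.py is an assumption:

```python
import torch

def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    """attn_weights: (num_heads, query_len, key_len), rows summing to 1."""
    p = attn_weights.clamp(min=1e-20)        # avoid log(0)
    ent = -(p * p.log()).sum(dim=-1)         # entropy per (head, query)
    return ent.mean(dim=-1)                  # one scalar per head

# Usage: a 4-head module yields a 4-element tensor, like the logged
# "attn_weights_entropy = tensor([5.9871, 5.8845, 5.4621, 5.7261])".
w = torch.softmax(torch.randn(4, 100, 100), dim=-1)
print(attn_weights_entropy(w))
```

Low entropy means a head attends sharply to few keys; values near log(key_len) mean nearly uniform attention.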
2024-09-17 02:22:57,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=127140.0, ans=0.0
2024-09-17 02:22:57,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=127140.0, ans=0.1
2024-09-17 02:22:57,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=127140.0, ans=0.125
2024-09-17 02:23:14,202 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.64 vs. limit=12.0
2024-09-17 02:23:18,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=127180.0, ans=0.2
2024-09-17 02:23:23,680 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.56 vs. limit=15.0
2024-09-17 02:23:39,489 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 02:23:48,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=127260.0, ans=0.09899494936611666
2024-09-17 02:23:51,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=127260.0, ans=0.125
2024-09-17 02:23:56,929 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.517e+01 1.081e+02 1.202e+02 1.454e+02 2.807e+02, threshold=2.403e+02, percent-clipped=1.0
2024-09-17 02:23:56,953 INFO [train.py:1198] (1/2) Epoch 8, batch 150, loss[loss=0.2532, ctc_loss=0.1759, cr_loss=0.4023, attn_decoder_loss=0.2529, over 29439.00 frames. ], tot_loss[loss=0.2795, ctc_loss=0.1951, cr_loss=0.4216, attn_decoder_loss=0.2795, over 3047840.90 frames. ], batch size: 70, lr: 1.44e-02, grad_scale: 4.0
2024-09-17 02:24:00,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=127300.0, ans=0.0
2024-09-17 02:24:24,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=127340.0, ans=0.0
2024-09-17 02:24:45,035 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.87 vs. limit=15.0
2024-09-17 02:24:54,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=127420.0, ans=0.125
2024-09-17 02:24:57,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=127460.0, ans=0.0
2024-09-17 02:25:00,314 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.06 vs. limit=15.0
2024-09-17 02:25:12,587 INFO [train.py:1198] (1/2) Epoch 8, batch 200, loss[loss=0.2913, ctc_loss=0.2081, cr_loss=0.4185, attn_decoder_loss=0.2912, over 27152.00 frames. ], tot_loss[loss=0.2783, ctc_loss=0.1936, cr_loss=0.4217, attn_decoder_loss=0.2784, over 3659944.99 frames. ], batch size: 124, lr: 1.44e-02, grad_scale: 8.0
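In the train.py batch lines, loss[...] describes the current batch while tot_loss[...] is a running frame-weighted average whose "over N frames" count grows with each batch. A minimal sketch of that bookkeeping, with illustrative names only:

```python
class LossTracker:
    """Frame-weighted running averages, as in the tot_loss[...] fields."""

    def __init__(self):
        self.frames = 0.0
        self.weighted = {}                   # name -> sum(loss * frames)

    def update(self, losses: dict, num_frames: float):
        self.frames += num_frames
        for name, value in losses.items():
            self.weighted[name] = self.weighted.get(name, 0.0) + value * num_frames

    def averages(self) -> dict:
        return {k: v / self.frames for k, v in self.weighted.items()}

tot = LossTracker()
tot.update({"ctc_loss": 0.2237, "cr_loss": 0.4585}, 27481.0)
tot.update({"ctc_loss": 0.1882, "cr_loss": 0.3716}, 29534.0)
print(tot.averages())   # frame-weighted means, analogous to tot_loss
```

Weighting by frames rather than by batch keeps long utterances from being under-counted; note that the logged batch sizes vary widely (67 to 210).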
2024-09-17 02:25:13,635 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.23 vs. limit=12.0
2024-09-17 02:25:26,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=127540.0, ans=0.125
2024-09-17 02:25:27,558 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.08 vs. limit=15.0
2024-09-17 02:25:38,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=127540.0, ans=0.0
2024-09-17 02:25:50,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=127580.0, ans=0.0
2024-09-17 02:25:54,491 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=15.0
2024-09-17 02:26:17,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=127660.0, ans=0.125
2024-09-17 02:26:33,475 INFO [train.py:1198] (1/2) Epoch 8, batch 250, loss[loss=0.295, ctc_loss=0.2048, cr_loss=0.4351, attn_decoder_loss=0.2953, over 29244.00 frames. ], tot_loss[loss=0.2777, ctc_loss=0.1923, cr_loss=0.421, attn_decoder_loss=0.2778, over 4140822.62 frames. ], batch size: 100, lr: 1.44e-02, grad_scale: 4.0
2024-09-17 02:26:34,927 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.173e+01 9.771e+01 1.014e+02 1.103e+02 1.585e+02, threshold=2.028e+02, percent-clipped=0.0
2024-09-17 02:26:47,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=127740.0, ans=0.1
2024-09-17 02:27:00,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=127740.0, ans=0.125
2024-09-17 02:27:03,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=127780.0, ans=0.125
2024-09-17 02:27:17,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=127820.0, ans=0.0
2024-09-17 02:27:19,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=127820.0, ans=0.125
2024-09-17 02:27:25,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=127820.0, ans=0.125
2024-09-17 02:27:48,967 INFO [train.py:1198] (1/2) Epoch 8, batch 300, loss[loss=0.2923, ctc_loss=0.1983, cr_loss=0.4432, attn_decoder_loss=0.2929, over 29526.00 frames. ], tot_loss[loss=0.2775, ctc_loss=0.192, cr_loss=0.421, attn_decoder_loss=0.2777, over 4509614.03 frames. ], batch size: 92, lr: 1.44e-02, grad_scale: 8.0
2024-09-17 02:27:52,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=127900.0, ans=0.125
2024-09-17 02:27:52,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=127900.0, ans=0.125
2024-09-17 02:27:53,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=127900.0, ans=0.125
2024-09-17 02:27:59,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=127900.0, ans=0.1
2024-09-17 02:28:20,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=127980.0, ans=0.0
2024-09-17 02:28:37,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=127980.0, ans=0.125
2024-09-17 02:28:54,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=128020.0, ans=0.09899494936611666
2024-09-17 02:29:12,056 INFO [train.py:1198] (1/2) Epoch 8, batch 350, loss[loss=0.253, ctc_loss=0.1738, cr_loss=0.3976, attn_decoder_loss=0.2529, over 29347.00 frames. ], tot_loss[loss=0.2777, ctc_loss=0.192, cr_loss=0.4212, attn_decoder_loss=0.2779, over 4795465.45 frames. ], batch size: 71, lr: 1.44e-02, grad_scale: 4.0
2024-09-17 02:29:13,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=128100.0, ans=0.125
2024-09-17 02:29:14,914 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.409e+01 1.011e+02 1.095e+02 1.201e+02 2.476e+02, threshold=2.189e+02, percent-clipped=3.0
2024-09-17 02:29:19,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=128100.0, ans=0.0
2024-09-17 02:29:59,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=128220.0, ans=0.0
2024-09-17 02:30:29,956 INFO [train.py:1198] (1/2) Epoch 8, batch 400, loss[loss=0.2956, ctc_loss=0.2035, cr_loss=0.4478, attn_decoder_loss=0.2959, over 29709.00 frames. ], tot_loss[loss=0.2773, ctc_loss=0.1913, cr_loss=0.4209, attn_decoder_loss=0.2775, over 5024735.56 frames. ], batch size: 82, lr: 1.44e-02, grad_scale: 8.0
2024-09-17 02:30:39,751 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.10 vs. limit=15.0
2024-09-17 02:30:41,164 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.10 vs. limit=15.0
2024-09-17 02:30:49,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=128340.0, ans=0.125
2024-09-17 02:30:58,082 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.66 vs. limit=15.0
2024-09-17 02:30:59,715 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.42 vs. limit=15.0
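The Whitening lines compare a per-module metric against a limit (e.g. metric=12.42 vs. limit=15.0). One plausible reading, sketched below, is a covariance-eigenvalue dispersion measure that equals 1 for perfectly "white" (isotropic) features and grows as the feature channels become correlated; treat the exact formula as an assumption about what scaling.py measures:

```python
import torch

def whitening_metric(feats: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """feats: (num_frames, num_channels); channels split into groups."""
    n, c = feats.shape
    x = feats.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)           # center per group
    cov = torch.matmul(x.transpose(1, 2), x) / n  # per-group covariance
    eigs = torch.linalg.eigvalsh(cov)             # symmetric, real eigs
    # mean(eig^2) / mean(eig)^2 == 1 iff all eigenvalues are equal
    metric = (eigs ** 2).mean(dim=-1) / eigs.mean(dim=-1).clamp(min=1e-20) ** 2
    return metric.mean()

feats = torch.randn(2000, 512)                    # white-noise features
print(whitening_metric(feats), "vs. limit=15.0")  # close to 1.0
```

A module whose metric exceeds its limit (like the 16.81 vs. 15.0 entry earlier) would presumably receive a corrective penalty; below the limit it is just a diagnostic.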
2024-09-17 02:31:26,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=128420.0, ans=0.1
2024-09-17 02:31:40,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=128460.0, ans=0.0
2024-09-17 02:31:48,781 INFO [train.py:1198] (1/2) Epoch 8, batch 450, loss[loss=0.2833, ctc_loss=0.1901, cr_loss=0.3982, attn_decoder_loss=0.2848, over 29690.00 frames. ], tot_loss[loss=0.2769, ctc_loss=0.1907, cr_loss=0.4203, attn_decoder_loss=0.2771, over 5187306.64 frames. ], batch size: 83, lr: 1.43e-02, grad_scale: 4.0
2024-09-17 02:31:49,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=128500.0, ans=0.125
2024-09-17 02:31:53,265 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.548e+01 1.003e+02 1.077e+02 1.187e+02 3.906e+02, threshold=2.154e+02, percent-clipped=1.0
2024-09-17 02:32:06,579 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.74 vs. limit=10.0
2024-09-17 02:32:13,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=128540.0, ans=0.0
2024-09-17 02:32:16,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=128540.0, ans=0.0
2024-09-17 02:32:42,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=128620.0, ans=0.2
2024-09-17 02:32:44,491 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.30 vs. limit=15.0
2024-09-17 02:33:03,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=128700.0, ans=0.1
2024-09-17 02:33:05,019 INFO [train.py:1198] (1/2) Epoch 8, batch 500, loss[loss=0.2957, ctc_loss=0.2056, cr_loss=0.4559, attn_decoder_loss=0.2956, over 29406.00 frames. ], tot_loss[loss=0.2761, ctc_loss=0.1899, cr_loss=0.4193, attn_decoder_loss=0.2764, over 5329590.49 frames. ], batch size: 94, lr: 1.43e-02, grad_scale: 8.0
2024-09-17 02:33:17,894 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-17 02:34:06,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=128860.0, ans=0.2
2024-09-17 02:34:09,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=128860.0, ans=0.1
2024-09-17 02:34:14,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=128860.0, ans=0.125
2024-09-17 02:34:17,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=128860.0, ans=0.125
2024-09-17 02:34:19,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=128860.0, ans=0.0
2024-09-17 02:34:19,971 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.69 vs. limit=15.0
2024-09-17 02:34:23,769 INFO [train.py:1198] (1/2) Epoch 8, batch 550, loss[loss=0.2857, ctc_loss=0.2023, cr_loss=0.4257, attn_decoder_loss=0.2855, over 28821.00 frames. ], tot_loss[loss=0.2763, ctc_loss=0.1903, cr_loss=0.4196, attn_decoder_loss=0.2766, over 5423202.59 frames. ], batch size: 104, lr: 1.43e-02, grad_scale: 4.0
2024-09-17 02:34:31,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=128900.0, ans=0.025
2024-09-17 02:34:32,981 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.612e+01 1.023e+02 1.117e+02 1.226e+02 1.997e+02, threshold=2.234e+02, percent-clipped=0.0
2024-09-17 02:34:50,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=128940.0, ans=0.125
2024-09-17 02:34:52,216 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=5.58 vs. limit=12.0
2024-09-17 02:35:13,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=129020.0, ans=0.95
2024-09-17 02:35:14,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=129020.0, ans=0.0
2024-09-17 02:35:19,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=129020.0, ans=0.0
2024-09-17 02:35:43,603 INFO [train.py:1198] (1/2) Epoch 8, batch 600, loss[loss=0.2936, ctc_loss=0.2042, cr_loss=0.4512, attn_decoder_loss=0.2935, over 29236.00 frames. ], tot_loss[loss=0.2766, ctc_loss=0.1903, cr_loss=0.4199, attn_decoder_loss=0.2769, over 5509411.54 frames. ], batch size: 100, lr: 1.43e-02, grad_scale: 8.0
2024-09-17 02:35:47,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=129100.0, ans=0.125
2024-09-17 02:35:59,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=129140.0, ans=0.125
2024-09-17 02:36:02,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=129140.0, ans=0.125
2024-09-17 02:36:06,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=129140.0, ans=0.1
2024-09-17 02:36:13,361 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.48 vs. limit=15.0
2024-09-17 02:36:38,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=129220.0, ans=0.035
2024-09-17 02:36:55,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=129260.0, ans=0.2
2024-09-17 02:36:55,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=129260.0, ans=0.2
2024-09-17 02:36:59,411 INFO [train.py:1198] (1/2) Epoch 8, batch 650, loss[loss=0.2604, ctc_loss=0.1708, cr_loss=0.4052, attn_decoder_loss=0.2614, over 29770.00 frames. ], tot_loss[loss=0.2752, ctc_loss=0.1887, cr_loss=0.418, attn_decoder_loss=0.2756, over 5586276.17 frames. ], batch size: 81, lr: 1.43e-02, grad_scale: 8.0
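The cr_loss component in these batch lines is the consistency-regularization term of CR-CTC, which encourages the frame-level CTC posteriors of two differently time-masked views of the same utterance to agree. A hedged sketch using a symmetric KL divergence follows; real implementations typically detach the target view and weight the term separately, so the details here are assumptions:

```python
import torch
import torch.nn.functional as F

def cr_loss(log_probs_a: torch.Tensor, log_probs_b: torch.Tensor) -> torch.Tensor:
    """log_probs_*: (T, N, V) log-softmax CTC outputs of two augmented views."""
    # Each direction treats the other view's posteriors as the target;
    # detaching the target is common, omitted here for brevity.
    kl_ab = F.kl_div(log_probs_a, log_probs_b.exp(), reduction="batchmean")
    kl_ba = F.kl_div(log_probs_b, log_probs_a.exp(), reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

la = F.log_softmax(torch.randn(50, 2, 500), dim=-1)
lb = F.log_softmax(torch.randn(50, 2, 500), dim=-1)
print(cr_loss(la, lb))
```

This also explains the validation line earlier showing cr_loss=4.762e-15: with augmentation disabled at validation time, the two views coincide and the consistency term collapses to numerical noise.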
2024-09-17 02:37:01,679 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0
2024-09-17 02:37:04,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=129300.0, ans=0.125
2024-09-17 02:37:05,489 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.505e+01 9.950e+01 1.082e+02 1.181e+02 2.497e+02, threshold=2.164e+02, percent-clipped=2.0
2024-09-17 02:37:27,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=129340.0, ans=0.025
2024-09-17 02:37:54,043 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.73 vs. limit=22.5
2024-09-17 02:38:15,953 INFO [train.py:1198] (1/2) Epoch 8, batch 700, loss[loss=0.2661, ctc_loss=0.1766, cr_loss=0.3781, attn_decoder_loss=0.2677, over 29541.00 frames. ], tot_loss[loss=0.276, ctc_loss=0.1891, cr_loss=0.4189, attn_decoder_loss=0.2763, over 5637833.77 frames. ], batch size: 76, lr: 1.43e-02, grad_scale: 8.0
2024-09-17 02:38:20,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=129500.0, ans=0.1
2024-09-17 02:38:46,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=129540.0, ans=0.0
2024-09-17 02:38:59,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=129580.0, ans=0.125
2024-09-17 02:39:18,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=129620.0, ans=0.0
2024-09-17 02:39:32,180 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.08 vs. limit=22.5
2024-09-17 02:39:37,334 INFO [train.py:1198] (1/2) Epoch 8, batch 750, loss[loss=0.286, ctc_loss=0.1941, cr_loss=0.4322, attn_decoder_loss=0.2866, over 29687.00 frames. ], tot_loss[loss=0.2754, ctc_loss=0.1887, cr_loss=0.4172, attn_decoder_loss=0.2758, over 5676344.66 frames. ], batch size: 82, lr: 1.43e-02, grad_scale: 4.0
2024-09-17 02:39:37,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=129700.0, ans=0.125
2024-09-17 02:39:46,301 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.613e+01 1.021e+02 1.093e+02 1.208e+02 3.929e+02, threshold=2.185e+02, percent-clipped=1.0
2024-09-17 02:39:47,286 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.34 vs. limit=15.0
2024-09-17 02:40:47,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=129860.0, ans=0.0
2024-09-17 02:40:53,437 INFO [train.py:1198] (1/2) Epoch 8, batch 800, loss[loss=0.2472, ctc_loss=0.1647, cr_loss=0.3948, attn_decoder_loss=0.2476, over 29608.00 frames. ], tot_loss[loss=0.2751, ctc_loss=0.1885, cr_loss=0.4169, attn_decoder_loss=0.2755, over 5706845.59 frames. ], batch size: 73, lr: 1.43e-02, grad_scale: 8.0
2024-09-17 02:41:57,102 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.65 vs. limit=15.0
2024-09-17 02:42:00,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=130060.0, ans=0.0
2024-09-17 02:42:08,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=130100.0, ans=0.125
2024-09-17 02:42:09,583 INFO [train.py:1198] (1/2) Epoch 8, batch 850, loss[loss=0.2808, ctc_loss=0.1845, cr_loss=0.4274, attn_decoder_loss=0.282, over 29711.00 frames. ], tot_loss[loss=0.2747, ctc_loss=0.1879, cr_loss=0.4166, attn_decoder_loss=0.2751, over 5735114.88 frames. ], batch size: 89, lr: 1.43e-02, grad_scale: 4.0
2024-09-17 02:42:20,111 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.352e+01 1.021e+02 1.113e+02 1.293e+02 2.449e+02, threshold=2.226e+02, percent-clipped=1.0
2024-09-17 02:42:20,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=130100.0, ans=0.1
2024-09-17 02:42:32,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=22.24 vs. limit=22.5
2024-09-17 02:42:39,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=130140.0, ans=0.0
2024-09-17 02:42:41,954 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=22.5
2024-09-17 02:43:01,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=130220.0, ans=0.04949747468305833
2024-09-17 02:43:20,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=130260.0, ans=0.025
2024-09-17 02:43:24,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=130260.0, ans=0.125
2024-09-17 02:43:27,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=130260.0, ans=0.125
2024-09-17 02:43:31,943 INFO [train.py:1198] (1/2) Epoch 8, batch 900, loss[loss=0.2455, ctc_loss=0.1558, cr_loss=0.3481, attn_decoder_loss=0.2478, over 29568.00 frames. ], tot_loss[loss=0.275, ctc_loss=0.1882, cr_loss=0.4171, attn_decoder_loss=0.2754, over 5740257.07 frames. ], batch size: 73, lr: 1.43e-02, grad_scale: 8.0
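The grad_scale field in these batch lines oscillates between values such as 4.0 and 8.0. That is the expected behaviour of a dynamic AMP loss scaler, which halves the scale when a step overflows in float16 and doubles it again after a run of successful steps. A standard PyTorch usage sketch (the training-step wiring and the constructor arguments shown are illustrative, not this recipe's exact configuration):

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=8.0, growth_interval=2000)

def train_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales grads; skips step on inf/nan
    scaler.update()                 # halve on overflow, grow periodically
    return loss.detach(), scaler.get_scale()
```

Logging scaler.get_scale() alongside the loss is what produces the "grad_scale: 4.0" / "grad_scale: 8.0" alternation seen here.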
2024-09-17 02:43:41,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=130300.0, ans=0.0
2024-09-17 02:43:55,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=130340.0, ans=0.125
2024-09-17 02:44:04,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=130380.0, ans=0.0
2024-09-17 02:44:07,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=130380.0, ans=0.0
2024-09-17 02:44:29,820 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.89 vs. limit=22.5
2024-09-17 02:44:42,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=130460.0, ans=0.125
2024-09-17 02:44:48,522 INFO [train.py:1198] (1/2) Epoch 8, batch 950, loss[loss=0.2588, ctc_loss=0.1722, cr_loss=0.3939, attn_decoder_loss=0.2597, over 29520.00 frames. ], tot_loss[loss=0.2753, ctc_loss=0.1886, cr_loss=0.4171, attn_decoder_loss=0.2756, over 5743017.61 frames. ], batch size: 74, lr: 1.42e-02, grad_scale: 4.0
2024-09-17 02:44:51,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=130500.0, ans=0.2
2024-09-17 02:44:59,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=130500.0, ans=0.125
2024-09-17 02:45:00,450 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.707e+01 1.021e+02 1.105e+02 1.238e+02 2.320e+02, threshold=2.209e+02, percent-clipped=1.0
2024-09-17 02:45:00,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=130500.0, ans=0.05
2024-09-17 02:45:13,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=130540.0, ans=0.1
2024-09-17 02:45:34,022 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.91 vs. limit=12.0
2024-09-17 02:45:44,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=130620.0, ans=0.0
2024-09-17 02:45:53,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=130660.0, ans=0.0
2024-09-17 02:46:04,908 INFO [train.py:1198] (1/2) Epoch 8, batch 1000, loss[loss=0.2627, ctc_loss=0.1703, cr_loss=0.3937, attn_decoder_loss=0.2642, over 29495.00 frames. ], tot_loss[loss=0.2763, ctc_loss=0.1896, cr_loss=0.4192, attn_decoder_loss=0.2766, over 5737309.70 frames. ], batch size: 77, lr: 1.42e-02, grad_scale: 8.0
2024-09-17 02:46:23,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=130740.0, ans=0.125
2024-09-17 02:46:41,191 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.09 vs. limit=15.0
2024-09-17 02:46:42,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=130780.0, ans=0.125
2024-09-17 02:46:44,136 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=15.0
2024-09-17 02:46:54,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=130820.0, ans=0.0
2024-09-17 02:47:00,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=130820.0, ans=10.0
2024-09-17 02:47:02,745 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.89 vs. limit=22.5
2024-09-17 02:47:04,318 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.36 vs. limit=15.0
2024-09-17 02:47:14,619 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=15.16 vs. limit=15.0
2024-09-17 02:47:19,031 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0
2024-09-17 02:47:26,002 INFO [train.py:1198] (1/2) Epoch 8, batch 1050, loss[loss=0.2763, ctc_loss=0.181, cr_loss=0.4284, attn_decoder_loss=0.2773, over 29698.00 frames. ], tot_loss[loss=0.2754, ctc_loss=0.1886, cr_loss=0.4178, attn_decoder_loss=0.2757, over 5745150.14 frames. ], batch size: 85, lr: 1.42e-02, grad_scale: 4.0
2024-09-17 02:47:26,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=130900.0, ans=0.125
2024-09-17 02:47:32,987 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.56 vs. limit=6.0
2024-09-17 02:47:34,501 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.20 vs. limit=15.0
2024-09-17 02:47:39,741 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.558e+01 1.020e+02 1.112e+02 1.252e+02 2.111e+02, threshold=2.224e+02, percent-clipped=0.0
2024-09-17 02:48:01,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=130980.0, ans=0.0
2024-09-17 02:48:08,681 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.13 vs. limit=22.5
2024-09-17 02:48:18,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=131020.0, ans=0.125
2024-09-17 02:48:38,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=131060.0, ans=0.0
2024-09-17 02:48:42,653 INFO [train.py:1198] (1/2) Epoch 8, batch 1100, loss[loss=0.2767, ctc_loss=0.1937, cr_loss=0.4294, attn_decoder_loss=0.2764, over 29459.00 frames. ], tot_loss[loss=0.2752, ctc_loss=0.1885, cr_loss=0.4178, attn_decoder_loss=0.2756, over 5757376.92 frames. ], batch size: 78, lr: 1.42e-02, grad_scale: 8.0
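The balancer entries scattered through these logs (prob, min_positive, max_positive, max_abs) suggest modules that constrain per-channel activation statistics, for example keeping each channel's fraction of positive activations inside a band like [0.05, 0.95]. The sketch below computes only the diagnostic side of that idea; how the real Balancer applies its gradient correction is not shown, and these semantics are an assumption inferred from the parameter names:

```python
import torch

def balancer_violations(x: torch.Tensor,
                        min_positive: float = 0.05,
                        max_positive: float = 0.95):
    """x: (..., num_channels). Flags channels outside the positive-fraction band."""
    # fraction of positive entries per channel, reduced over all other dims
    frac_pos = (x > 0).float().mean(dim=tuple(range(x.dim() - 1)))
    too_low = frac_pos < min_positive     # channel is almost never positive
    too_high = frac_pos > max_positive    # channel is almost always positive
    return frac_pos, too_low | too_high

x = torch.randn(1000, 256)
frac, bad = balancer_violations(x)
print(frac.mean().item(), bad.sum().item())   # ~0.5 mean, few violations
```

Under this reading, the logged "prob" would be the probability of applying the correction on a given batch, and "max_abs" a cap on per-channel magnitudes.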
2024-09-17 02:48:51,207 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.84 vs. limit=15.0
2024-09-17 02:49:00,247 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.02 vs. limit=10.0
2024-09-17 02:49:01,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=131140.0, ans=0.0
2024-09-17 02:49:18,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=131180.0, ans=0.0
2024-09-17 02:49:18,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=131180.0, ans=0.0
2024-09-17 02:49:27,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=131220.0, ans=0.025
2024-09-17 02:49:55,502 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=15.0
2024-09-17 02:49:59,485 INFO [train.py:1198] (1/2) Epoch 8, batch 1150, loss[loss=0.2731, ctc_loss=0.1866, cr_loss=0.4253, attn_decoder_loss=0.2733, over 29456.00 frames. ], tot_loss[loss=0.2756, ctc_loss=0.1889, cr_loss=0.4184, attn_decoder_loss=0.276, over 5755616.40 frames. ], batch size: 78, lr: 1.42e-02, grad_scale: 4.0
2024-09-17 02:50:16,922 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.392e+01 9.941e+01 1.085e+02 1.238e+02 2.659e+02, threshold=2.171e+02, percent-clipped=2.0
2024-09-17 02:50:26,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=131340.0, ans=0.0
2024-09-17 02:50:44,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=131380.0, ans=0.125
2024-09-17 02:50:48,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=131420.0, ans=0.0
2024-09-17 02:50:54,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=131420.0, ans=0.125
2024-09-17 02:51:17,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=131460.0, ans=0.95
2024-09-17 02:51:19,981 INFO [train.py:1198] (1/2) Epoch 8, batch 1200, loss[loss=0.2838, ctc_loss=0.1908, cr_loss=0.4328, attn_decoder_loss=0.2845, over 29668.00 frames. ], tot_loss[loss=0.2768, ctc_loss=0.1903, cr_loss=0.4204, attn_decoder_loss=0.277, over 5748099.61 frames. ], batch size: 85, lr: 1.42e-02, grad_scale: 8.0
2024-09-17 02:51:35,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=131540.0, ans=0.125
2024-09-17 02:51:43,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=131540.0, ans=0.0
2024-09-17 02:51:48,192 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.31 vs. limit=22.5
2024-09-17 02:51:57,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=131580.0, ans=0.0
2024-09-17 02:52:18,720 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.30 vs. limit=12.0
2024-09-17 02:52:36,141 INFO [train.py:1198] (1/2) Epoch 8, batch 1250, loss[loss=0.2898, ctc_loss=0.1998, cr_loss=0.4265, attn_decoder_loss=0.2903, over 29517.00 frames. ], tot_loss[loss=0.2772, ctc_loss=0.1902, cr_loss=0.4205, attn_decoder_loss=0.2775, over 5774459.38 frames. ], batch size: 92, lr: 1.42e-02, grad_scale: 8.0
2024-09-17 02:52:50,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=131740.0, ans=10.0
2024-09-17 02:52:52,513 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.50 vs. limit=22.5
2024-09-17 02:52:52,810 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.535e+01 1.024e+02 1.090e+02 1.251e+02 7.392e+02, threshold=2.180e+02, percent-clipped=1.0
2024-09-17 02:53:09,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=131780.0, ans=0.1
2024-09-17 02:53:20,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=131820.0, ans=0.125
2024-09-17 02:53:23,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=131820.0, ans=0.0
2024-09-17 02:53:52,340 INFO [train.py:1198] (1/2) Epoch 8, batch 1300, loss[loss=0.288, ctc_loss=0.2048, cr_loss=0.4016, attn_decoder_loss=0.2883, over 28510.00 frames. ], tot_loss[loss=0.2761, ctc_loss=0.1891, cr_loss=0.4189, attn_decoder_loss=0.2764, over 5780427.21 frames. ], batch size: 112, lr: 1.42e-02, grad_scale: 8.0
2024-09-17 02:54:03,684 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.67 vs. limit=22.5
2024-09-17 02:54:13,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=131940.0, ans=0.125
2024-09-17 02:54:29,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=131980.0, ans=0.125
2024-09-17 02:54:43,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=132020.0, ans=0.0
2024-09-17 02:55:12,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=132100.0, ans=0.125
2024-09-17 02:55:13,328 INFO [train.py:1198] (1/2) Epoch 8, batch 1350, loss[loss=0.2752, ctc_loss=0.1805, cr_loss=0.422, attn_decoder_loss=0.2763, over 29762.00 frames. ], tot_loss[loss=0.2758, ctc_loss=0.1884, cr_loss=0.4182, attn_decoder_loss=0.2762, over 5796615.90 frames. ], batch size: 81, lr: 1.42e-02, grad_scale: 8.0
2024-09-17 02:55:15,108 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 02:55:16,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=132100.0, ans=0.125
2024-09-17 02:55:29,762 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.512e+01 9.976e+01 1.075e+02 1.151e+02 1.437e+02, threshold=2.149e+02, percent-clipped=0.0
2024-09-17 02:55:33,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.00 vs. limit=22.5
2024-09-17 02:55:53,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=132180.0, ans=0.125
2024-09-17 02:56:00,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=132220.0, ans=0.0
2024-09-17 02:56:05,345 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=17.54 vs. limit=15.0
2024-09-17 02:56:07,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=132220.0, ans=0.1
2024-09-17 02:56:28,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.86 vs. limit=15.0
2024-09-17 02:56:28,853 INFO [train.py:1198] (1/2) Epoch 8, batch 1400, loss[loss=0.238, ctc_loss=0.1558, cr_loss=0.3571, attn_decoder_loss=0.2392, over 29576.00 frames. ], tot_loss[loss=0.2755, ctc_loss=0.1882, cr_loss=0.4187, attn_decoder_loss=0.2759, over 5807244.20 frames. ], batch size: 69, lr: 1.42e-02, grad_scale: 8.0
2024-09-17 02:56:30,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=132300.0, ans=0.125
2024-09-17 02:56:33,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=132300.0, ans=0.125
2024-09-17 02:56:36,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=132300.0, ans=0.125
2024-09-17 02:57:00,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=132380.0, ans=0.125
2024-09-17 02:57:02,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=132380.0, ans=0.2
2024-09-17 02:57:03,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=132380.0, ans=0.125
2024-09-17 02:57:28,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.13 vs. limit=15.0
2024-09-17 02:57:35,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=132460.0, ans=0.125
2024-09-17 02:57:44,592 INFO [train.py:1198] (1/2) Epoch 8, batch 1450, loss[loss=0.3018, ctc_loss=0.2121, cr_loss=0.4678, attn_decoder_loss=0.3014, over 29454.00 frames. ], tot_loss[loss=0.2759, ctc_loss=0.1887, cr_loss=0.4188, attn_decoder_loss=0.2763, over 5804220.59 frames. ], batch size: 94, lr: 1.41e-02, grad_scale: 4.0
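The lr field decays slowly across these logs, from 1.54e-02 early in epoch 7 down to 1.41e-02 here, stepping with both batch index and epoch. This is consistent with a batch- and epoch-dependent schedule in the style of icefall's Eden scheduler; the formula below is sketched from memory, and its constants and exponents should be treated as assumptions rather than this recipe's exact settings:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Smoothly decaying LR in both batch index and (fractional) epoch."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Illustrative only: shows the monotone decay, not the exact logged values.
for epoch in (7, 8, 9):
    print(epoch, eden_lr(0.045, batch=epoch * 16000, epoch=float(epoch)))
```

The key property matching the log is that the learning rate changes a little within an epoch and takes a slightly larger step at each epoch boundary.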
2024-09-17 02:58:03,006 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.05 vs. limit=15.0
2024-09-17 02:58:06,572 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.324e+01 1.032e+02 1.089e+02 1.206e+02 2.427e+02, threshold=2.178e+02, percent-clipped=3.0
2024-09-17 02:58:26,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=132580.0, ans=0.09899494936611666
2024-09-17 02:58:49,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=132660.0, ans=0.07
2024-09-17 02:58:50,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=132660.0, ans=0.0
2024-09-17 02:59:04,178 INFO [train.py:1198] (1/2) Epoch 8, batch 1500, loss[loss=0.2922, ctc_loss=0.2126, cr_loss=0.4688, attn_decoder_loss=0.2906, over 29627.00 frames. ], tot_loss[loss=0.2761, ctc_loss=0.1889, cr_loss=0.4193, attn_decoder_loss=0.2765, over 5804452.59 frames. ], batch size: 86, lr: 1.41e-02, grad_scale: 8.0
2024-09-17 02:59:06,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=132700.0, ans=0.1
2024-09-17 02:59:37,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=132780.0, ans=0.2
2024-09-17 02:59:49,734 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.87 vs. limit=6.0
2024-09-17 03:00:20,624 INFO [train.py:1198] (1/2) Epoch 8, batch 1550, loss[loss=0.2964, ctc_loss=0.2095, cr_loss=0.4612, attn_decoder_loss=0.2958, over 29517.00 frames. ], tot_loss[loss=0.2759, ctc_loss=0.189, cr_loss=0.419, attn_decoder_loss=0.2763, over 5777918.50 frames. ], batch size: 90, lr: 1.41e-02, grad_scale: 4.0
2024-09-17 03:00:34,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=132940.0, ans=0.0
2024-09-17 03:00:41,775 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.026e+01 9.829e+01 1.097e+02 1.218e+02 3.935e+02, threshold=2.194e+02, percent-clipped=3.0
2024-09-17 03:00:51,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=132980.0, ans=0.125
2024-09-17 03:00:52,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=132980.0, ans=0.125
2024-09-17 03:00:57,712 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.09 vs. limit=12.0
2024-09-17 03:01:08,367 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.49 vs. limit=15.0
2024-09-17 03:01:35,977 INFO [train.py:1198] (1/2) Epoch 8, batch 1600, loss[loss=0.2886, ctc_loss=0.1956, cr_loss=0.4454, attn_decoder_loss=0.289, over 29672.00 frames. ], tot_loss[loss=0.2755, ctc_loss=0.1887, cr_loss=0.4186, attn_decoder_loss=0.2758, over 5761269.46 frames. ], batch size: 85, lr: 1.41e-02, grad_scale: 8.0
2024-09-17 03:01:44,791 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.65 vs. limit=15.0
2024-09-17 03:01:52,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=133140.0, ans=0.125
2024-09-17 03:01:53,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=133140.0, ans=0.025
2024-09-17 03:01:54,178 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.51 vs. limit=22.5
2024-09-17 03:02:26,981 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0
2024-09-17 03:02:50,903 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.25 vs. limit=15.0
2024-09-17 03:02:55,930 INFO [train.py:1198] (1/2) Epoch 8, batch 1650, loss[loss=0.2833, ctc_loss=0.1902, cr_loss=0.4277, attn_decoder_loss=0.2841, over 29720.00 frames. ], tot_loss[loss=0.2754, ctc_loss=0.1888, cr_loss=0.4183, attn_decoder_loss=0.2758, over 5757548.66 frames. ], batch size: 89, lr: 1.41e-02, grad_scale: 4.0
2024-09-17 03:02:58,209 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.40 vs. limit=22.5
2024-09-17 03:03:06,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=133300.0, ans=0.0
2024-09-17 03:03:14,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=133340.0, ans=0.05
2024-09-17 03:03:18,406 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.840e+01 1.022e+02 1.128e+02 1.304e+02 4.033e+02, threshold=2.256e+02, percent-clipped=2.0
2024-09-17 03:03:20,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=133340.0, ans=0.125
2024-09-17 03:03:41,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=133420.0, ans=0.0
2024-09-17 03:03:56,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=133460.0, ans=0.2
2024-09-17 03:04:11,204 INFO [train.py:1198] (1/2) Epoch 8, batch 1700, loss[loss=0.243, ctc_loss=0.1641, cr_loss=0.3752, attn_decoder_loss=0.2435, over 29588.00 frames. ], tot_loss[loss=0.275, ctc_loss=0.1881, cr_loss=0.4182, attn_decoder_loss=0.2754, over 5779349.94 frames. ], batch size: 69, lr: 1.41e-02, grad_scale: 8.0
2024-09-17 03:04:32,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=133540.0, ans=0.125
2024-09-17 03:04:43,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=133580.0, ans=0.2
2024-09-17 03:04:52,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=133580.0, ans=0.0
2024-09-17 03:04:54,694 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.33 vs. limit=15.0
2024-09-17 03:05:26,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=133700.0, ans=0.2
2024-09-17 03:05:27,782 INFO [train.py:1198] (1/2) Epoch 8, batch 1750, loss[loss=0.2556, ctc_loss=0.181, cr_loss=0.4072, attn_decoder_loss=0.2549, over 29281.00 frames. ], tot_loss[loss=0.2745, ctc_loss=0.1875, cr_loss=0.4171, attn_decoder_loss=0.2749, over 5789298.83 frames. ], batch size: 67, lr: 1.41e-02, grad_scale: 4.0
2024-09-17 03:05:38,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=133700.0, ans=0.07
2024-09-17 03:05:42,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=133700.0, ans=0.125
2024-09-17 03:05:48,636 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.67 vs. limit=15.0
2024-09-17 03:05:55,426 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.904e+01 9.818e+01 1.049e+02 1.183e+02 2.492e+02, threshold=2.098e+02, percent-clipped=1.0
2024-09-17 03:06:13,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=133780.0, ans=0.1
2024-09-17 03:06:17,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=133820.0, ans=0.125
2024-09-17 03:06:48,891 INFO [train.py:1198] (1/2) Epoch 8, batch 1800, loss[loss=0.3003, ctc_loss=0.2112, cr_loss=0.4579, attn_decoder_loss=0.3001, over 29708.00 frames. ], tot_loss[loss=0.2748, ctc_loss=0.1879, cr_loss=0.4171, attn_decoder_loss=0.2752, over 5792337.51 frames. ], batch size: 83, lr: 1.41e-02, grad_scale: 8.0
2024-09-17 03:06:49,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=133900.0, ans=0.125
2024-09-17 03:06:49,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=133900.0, ans=0.125
2024-09-17 03:07:04,898 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.26 vs. limit=15.0
2024-09-17 03:07:25,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=133980.0, ans=0.025
2024-09-17 03:07:31,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=133980.0, ans=0.0
2024-09-17 03:07:37,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=134020.0, ans=0.125
2024-09-17 03:07:45,716 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 03:07:54,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=134060.0, ans=0.2
2024-09-17 03:08:05,024 INFO [train.py:1198] (1/2) Epoch 8, batch 1850, loss[loss=0.2713, ctc_loss=0.1843, cr_loss=0.4256, attn_decoder_loss=0.2715, over 29634.00 frames. ], tot_loss[loss=0.2741, ctc_loss=0.1869, cr_loss=0.4166, attn_decoder_loss=0.2745, over 5797800.12 frames. ], batch size: 86, lr: 1.41e-02, grad_scale: 4.0
2024-09-17 03:08:17,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=134100.0, ans=0.2
2024-09-17 03:08:17,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=134100.0, ans=0.2
2024-09-17 03:08:30,751 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.292e+01 1.011e+02 1.086e+02 1.212e+02 2.686e+02, threshold=2.172e+02, percent-clipped=1.0
2024-09-17 03:08:52,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=134220.0, ans=0.125
2024-09-17 03:08:54,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=134220.0, ans=0.125
2024-09-17 03:09:00,060 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 03:09:05,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=134260.0, ans=0.125
2024-09-17 03:09:15,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=134260.0, ans=0.0
2024-09-17 03:09:18,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=134260.0, ans=0.2
2024-09-17 03:09:20,857 INFO [train.py:1198] (1/2) Epoch 8, batch 1900, loss[loss=0.289, ctc_loss=0.1982, cr_loss=0.4472, attn_decoder_loss=0.2892, over 29703.00 frames. ], tot_loss[loss=0.2751, ctc_loss=0.1877, cr_loss=0.4177, attn_decoder_loss=0.2755, over 5804775.35 frames. ], batch size: 89, lr: 1.41e-02, grad_scale: 8.0
2024-09-17 03:09:21,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=134300.0, ans=0.2
2024-09-17 03:09:37,199 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 03:09:47,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=134340.0, ans=15.0
2024-09-17 03:09:49,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=134340.0, ans=0.125
2024-09-17 03:09:49,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=134340.0, ans=0.125
2024-09-17 03:10:02,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=134380.0, ans=0.125
2024-09-17 03:10:15,215 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.54 vs. limit=6.0
2024-09-17 03:10:19,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=134420.0, ans=0.2
2024-09-17 03:10:22,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=134420.0, ans=0.125
2024-09-17 03:10:28,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=134460.0, ans=0.0
2024-09-17 03:10:41,891 INFO [train.py:1198] (1/2) Epoch 8, batch 1950, loss[loss=0.2788, ctc_loss=0.1934, cr_loss=0.4331, attn_decoder_loss=0.2786, over 29464.00 frames. ], tot_loss[loss=0.2765, ctc_loss=0.1886, cr_loss=0.4196, attn_decoder_loss=0.2769, over 5819136.30 frames. ], batch size: 78, lr: 1.40e-02, grad_scale: 4.0
2024-09-17 03:10:56,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=134540.0, ans=0.125
2024-09-17 03:11:09,503 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.585e+01 1.007e+02 1.092e+02 1.214e+02 3.508e+02, threshold=2.184e+02, percent-clipped=3.0
2024-09-17 03:11:13,469 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.30 vs. limit=15.0
limit=15.0 2024-09-17 03:11:19,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=134580.0, ans=0.125 2024-09-17 03:11:28,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=134620.0, ans=0.2 2024-09-17 03:11:32,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=134620.0, ans=0.125 2024-09-17 03:11:42,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=134660.0, ans=0.125 2024-09-17 03:11:47,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=134660.0, ans=0.125 2024-09-17 03:11:49,664 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=15.0 2024-09-17 03:11:57,952 INFO [train.py:1198] (1/2) Epoch 8, batch 2000, loss[loss=0.2343, ctc_loss=0.1563, cr_loss=0.3616, attn_decoder_loss=0.2349, over 29328.00 frames. ], tot_loss[loss=0.2772, ctc_loss=0.1895, cr_loss=0.4204, attn_decoder_loss=0.2776, over 5795308.39 frames. ], batch size: 67, lr: 1.40e-02, grad_scale: 8.0 2024-09-17 03:12:15,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=134740.0, ans=0.125 2024-09-17 03:12:16,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=134740.0, ans=0.2 2024-09-17 03:12:21,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=134740.0, ans=0.125 2024-09-17 03:12:31,320 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.26 vs. limit=15.0 2024-09-17 03:12:38,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=134780.0, ans=0.125 2024-09-17 03:12:51,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=134820.0, ans=0.0 2024-09-17 03:13:05,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=134860.0, ans=0.0 2024-09-17 03:13:08,086 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2024-09-17 03:13:14,079 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.32 vs. limit=6.0 2024-09-17 03:13:14,528 INFO [train.py:1198] (1/2) Epoch 8, batch 2050, loss[loss=0.2392, ctc_loss=0.157, cr_loss=0.3678, attn_decoder_loss=0.2402, over 29465.00 frames. ], tot_loss[loss=0.2762, ctc_loss=0.1888, cr_loss=0.4194, attn_decoder_loss=0.2766, over 5788014.80 frames. 
], batch size: 70, lr: 1.40e-02, grad_scale: 4.0 2024-09-17 03:13:33,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=134940.0, ans=0.125 2024-09-17 03:13:37,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=134940.0, ans=0.05 2024-09-17 03:13:38,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=134940.0, ans=0.125 2024-09-17 03:13:39,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=134940.0, ans=15.0 2024-09-17 03:13:39,131 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.32 vs. limit=22.5 2024-09-17 03:13:45,829 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.747e+01 9.821e+01 1.060e+02 1.158e+02 2.378e+02, threshold=2.119e+02, percent-clipped=1.0 2024-09-17 03:14:03,472 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=15.0 2024-09-17 03:14:09,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=135020.0, ans=0.1 2024-09-17 03:14:35,264 INFO [train.py:1198] (1/2) Epoch 8, batch 2100, loss[loss=0.2675, ctc_loss=0.1749, cr_loss=0.4057, attn_decoder_loss=0.2688, over 29736.00 frames. ], tot_loss[loss=0.2756, ctc_loss=0.188, cr_loss=0.4185, attn_decoder_loss=0.2761, over 5799174.19 frames. ], batch size: 81, lr: 1.40e-02, grad_scale: 8.0 2024-09-17 03:14:35,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=135100.0, ans=12.0 2024-09-17 03:15:18,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=135180.0, ans=0.1 2024-09-17 03:15:51,532 INFO [train.py:1198] (1/2) Epoch 8, batch 2150, loss[loss=0.2748, ctc_loss=0.1885, cr_loss=0.4481, attn_decoder_loss=0.2744, over 29463.00 frames. ], tot_loss[loss=0.2746, ctc_loss=0.1867, cr_loss=0.4174, attn_decoder_loss=0.275, over 5814440.31 frames. ], batch size: 78, lr: 1.40e-02, grad_scale: 4.0 2024-09-17 03:16:04,693 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.58 vs. limit=10.0 2024-09-17 03:16:17,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=135340.0, ans=0.1 2024-09-17 03:16:22,344 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.928e+01 9.784e+01 1.043e+02 1.111e+02 1.443e+02, threshold=2.086e+02, percent-clipped=0.0 2024-09-17 03:16:24,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=135380.0, ans=0.125 2024-09-17 03:16:48,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=135420.0, ans=0.0 2024-09-17 03:16:49,047 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.19 vs. 
limit=15.0 2024-09-17 03:17:02,617 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.27 vs. limit=12.0 2024-09-17 03:17:07,883 INFO [train.py:1198] (1/2) Epoch 8, batch 2200, loss[loss=0.289, ctc_loss=0.1957, cr_loss=0.4383, attn_decoder_loss=0.2896, over 29624.00 frames. ], tot_loss[loss=0.2752, ctc_loss=0.1876, cr_loss=0.4184, attn_decoder_loss=0.2756, over 5811459.78 frames. ], batch size: 86, lr: 1.40e-02, grad_scale: 8.0 2024-09-17 03:17:09,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=135500.0, ans=0.0 2024-09-17 03:17:17,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=135500.0, ans=0.0 2024-09-17 03:17:20,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=135500.0, ans=0.125 2024-09-17 03:17:24,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=135540.0, ans=0.125 2024-09-17 03:17:33,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=135540.0, ans=0.125 2024-09-17 03:17:47,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=135580.0, ans=0.025 2024-09-17 03:17:49,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=135580.0, ans=0.09899494936611666 2024-09-17 03:17:54,581 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.31 vs. limit=10.0 2024-09-17 03:18:28,891 INFO [train.py:1198] (1/2) Epoch 8, batch 2250, loss[loss=0.2808, ctc_loss=0.1978, cr_loss=0.427, attn_decoder_loss=0.2805, over 29706.00 frames. ], tot_loss[loss=0.2749, ctc_loss=0.1874, cr_loss=0.4183, attn_decoder_loss=0.2753, over 5810789.77 frames. 
], batch size: 82, lr: 1.40e-02, grad_scale: 4.0 2024-09-17 03:18:30,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=135700.0, ans=0.125 2024-09-17 03:18:33,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=135700.0, ans=0.0 2024-09-17 03:18:54,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=135740.0, ans=0.125 2024-09-17 03:19:00,614 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.680e+01 9.920e+01 1.107e+02 1.209e+02 3.496e+02, threshold=2.214e+02, percent-clipped=1.0 2024-09-17 03:19:02,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=135780.0, ans=0.025 2024-09-17 03:19:08,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=135780.0, ans=0.125 2024-09-17 03:19:11,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=135780.0, ans=0.125 2024-09-17 03:19:40,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=135860.0, ans=0.0 2024-09-17 03:19:44,838 INFO [train.py:1198] (1/2) Epoch 8, batch 2300, loss[loss=0.2529, ctc_loss=0.1694, cr_loss=0.4124, attn_decoder_loss=0.253, over 29332.00 frames. ], tot_loss[loss=0.2742, ctc_loss=0.1869, cr_loss=0.4174, attn_decoder_loss=0.2746, over 5799392.31 frames. ], batch size: 71, lr: 1.40e-02, grad_scale: 8.0 2024-09-17 03:19:48,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=135900.0, ans=0.5 2024-09-17 03:19:49,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=135900.0, ans=0.0 2024-09-17 03:19:58,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=135940.0, ans=10.0 2024-09-17 03:20:07,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=135940.0, ans=0.025 2024-09-17 03:20:21,852 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:20:32,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=136020.0, ans=0.125 2024-09-17 03:20:34,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=136020.0, ans=0.2 2024-09-17 03:20:47,099 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.92 vs. limit=15.0 2024-09-17 03:20:53,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=136060.0, ans=0.0 2024-09-17 03:20:57,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=136060.0, ans=0.0 2024-09-17 03:21:01,310 INFO [train.py:1198] (1/2) Epoch 8, batch 2350, loss[loss=0.28, ctc_loss=0.1803, cr_loss=0.4345, attn_decoder_loss=0.2814, over 29704.00 frames. 
], tot_loss[loss=0.2744, ctc_loss=0.1872, cr_loss=0.4177, attn_decoder_loss=0.2748, over 5805738.91 frames. ], batch size: 83, lr: 1.40e-02, grad_scale: 4.0 2024-09-17 03:21:29,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=136140.0, ans=0.07 2024-09-17 03:21:34,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=136180.0, ans=0.125 2024-09-17 03:21:37,144 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.392e+01 1.038e+02 1.165e+02 1.369e+02 2.325e+02, threshold=2.330e+02, percent-clipped=1.0 2024-09-17 03:21:45,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=136180.0, ans=0.125 2024-09-17 03:22:21,881 INFO [train.py:1198] (1/2) Epoch 8, batch 2400, loss[loss=0.2634, ctc_loss=0.168, cr_loss=0.3885, attn_decoder_loss=0.2654, over 29545.00 frames. ], tot_loss[loss=0.2749, ctc_loss=0.1874, cr_loss=0.4182, attn_decoder_loss=0.2753, over 5808933.29 frames. ], batch size: 76, lr: 1.40e-02, grad_scale: 8.0 2024-09-17 03:22:26,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=136300.0, ans=0.0 2024-09-17 03:22:26,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=136300.0, ans=0.125 2024-09-17 03:22:29,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=136300.0, ans=0.125 2024-09-17 03:22:41,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=136340.0, ans=0.125 2024-09-17 03:23:25,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=136460.0, ans=0.125 2024-09-17 03:23:33,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=136460.0, ans=0.1 2024-09-17 03:23:37,600 INFO [train.py:1198] (1/2) Epoch 8, batch 2450, loss[loss=0.2795, ctc_loss=0.2007, cr_loss=0.4463, attn_decoder_loss=0.2783, over 29715.00 frames. ], tot_loss[loss=0.276, ctc_loss=0.1885, cr_loss=0.4199, attn_decoder_loss=0.2763, over 5786234.01 frames. 
], batch size: 82, lr: 1.39e-02, grad_scale: 4.0 2024-09-17 03:23:48,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=136500.0, ans=0.125 2024-09-17 03:23:52,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=136540.0, ans=0.125 2024-09-17 03:24:11,924 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.397e+01 1.019e+02 1.082e+02 1.263e+02 3.288e+02, threshold=2.163e+02, percent-clipped=1.0 2024-09-17 03:24:19,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=136580.0, ans=0.125 2024-09-17 03:24:21,426 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:24:28,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=136620.0, ans=0.125 2024-09-17 03:24:38,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=136660.0, ans=0.1 2024-09-17 03:24:39,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=136660.0, ans=0.0 2024-09-17 03:24:40,140 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.35 vs. limit=22.5 2024-09-17 03:24:52,935 INFO [train.py:1198] (1/2) Epoch 8, batch 2500, loss[loss=0.2726, ctc_loss=0.1833, cr_loss=0.3967, attn_decoder_loss=0.2737, over 29619.00 frames. ], tot_loss[loss=0.2758, ctc_loss=0.1885, cr_loss=0.4195, attn_decoder_loss=0.2762, over 5796700.57 frames. ], batch size: 86, lr: 1.39e-02, grad_scale: 8.0 2024-09-17 03:24:54,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=136700.0, ans=0.125 2024-09-17 03:24:57,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=136700.0, ans=0.0 2024-09-17 03:25:04,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=136700.0, ans=0.125 2024-09-17 03:25:21,533 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.05 vs. limit=22.5 2024-09-17 03:25:48,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=136820.0, ans=0.125 2024-09-17 03:26:13,247 INFO [train.py:1198] (1/2) Epoch 8, batch 2550, loss[loss=0.2611, ctc_loss=0.1794, cr_loss=0.4319, attn_decoder_loss=0.2606, over 29369.00 frames. ], tot_loss[loss=0.2758, ctc_loss=0.1885, cr_loss=0.4198, attn_decoder_loss=0.2761, over 5798626.51 frames. 
], batch size: 67, lr: 1.39e-02, grad_scale: 8.0 2024-09-17 03:26:16,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=136900.0, ans=0.2 2024-09-17 03:26:34,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=136940.0, ans=0.0 2024-09-17 03:26:49,202 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.639e+01 1.024e+02 1.084e+02 1.212e+02 4.526e+02, threshold=2.168e+02, percent-clipped=2.0 2024-09-17 03:27:06,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=137020.0, ans=0.125 2024-09-17 03:27:28,806 INFO [train.py:1198] (1/2) Epoch 8, batch 2600, loss[loss=0.2678, ctc_loss=0.1822, cr_loss=0.4353, attn_decoder_loss=0.2677, over 29464.00 frames. ], tot_loss[loss=0.2761, ctc_loss=0.1885, cr_loss=0.4201, attn_decoder_loss=0.2764, over 5795388.48 frames. ], batch size: 78, lr: 1.39e-02, grad_scale: 8.0 2024-09-17 03:27:54,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=137140.0, ans=0.0 2024-09-17 03:28:03,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=137180.0, ans=0.125 2024-09-17 03:28:43,794 INFO [train.py:1198] (1/2) Epoch 8, batch 2650, loss[loss=0.2984, ctc_loss=0.2108, cr_loss=0.4395, attn_decoder_loss=0.2984, over 29243.00 frames. ], tot_loss[loss=0.2761, ctc_loss=0.1886, cr_loss=0.4204, attn_decoder_loss=0.2765, over 5801788.50 frames. ], batch size: 100, lr: 1.39e-02, grad_scale: 4.0 2024-09-17 03:28:44,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=137300.0, ans=0.125 2024-09-17 03:28:48,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=137300.0, ans=0.0 2024-09-17 03:28:50,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=137300.0, ans=0.1 2024-09-17 03:29:00,172 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=9.57 vs. limit=15.0 2024-09-17 03:29:08,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=137340.0, ans=0.0 2024-09-17 03:29:13,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=137340.0, ans=0.0 2024-09-17 03:29:14,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=137380.0, ans=0.125 2024-09-17 03:29:23,309 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.237e+01 1.027e+02 1.110e+02 1.218e+02 2.254e+02, threshold=2.220e+02, percent-clipped=2.0 2024-09-17 03:29:49,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.94 vs. limit=15.0 2024-09-17 03:29:53,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=137460.0, ans=0.0 2024-09-17 03:30:02,789 INFO [train.py:1198] (1/2) Epoch 8, batch 2700, loss[loss=0.2782, ctc_loss=0.1848, cr_loss=0.4271, attn_decoder_loss=0.2791, over 29534.00 frames. 
], tot_loss[loss=0.2762, ctc_loss=0.1885, cr_loss=0.4196, attn_decoder_loss=0.2766, over 5796478.68 frames. ], batch size: 87, lr: 1.39e-02, grad_scale: 8.0 2024-09-17 03:30:09,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=137500.0, ans=0.125 2024-09-17 03:30:19,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=137540.0, ans=0.04949747468305833 2024-09-17 03:30:21,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=137540.0, ans=0.125 2024-09-17 03:30:31,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=137580.0, ans=0.0 2024-09-17 03:30:50,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=137620.0, ans=0.05 2024-09-17 03:31:08,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=137660.0, ans=0.025 2024-09-17 03:31:18,719 INFO [train.py:1198] (1/2) Epoch 8, batch 2750, loss[loss=0.2779, ctc_loss=0.1963, cr_loss=0.4317, attn_decoder_loss=0.2774, over 29502.00 frames. ], tot_loss[loss=0.2756, ctc_loss=0.1886, cr_loss=0.4195, attn_decoder_loss=0.2759, over 5795769.54 frames. ], batch size: 75, lr: 1.39e-02, grad_scale: 8.0 2024-09-17 03:31:19,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=137700.0, ans=0.125 2024-09-17 03:31:32,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=137740.0, ans=0.125 2024-09-17 03:31:37,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=137740.0, ans=0.1 2024-09-17 03:31:53,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=137780.0, ans=0.95 2024-09-17 03:31:56,139 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.690e+01 1.009e+02 1.091e+02 1.195e+02 3.553e+02, threshold=2.183e+02, percent-clipped=1.0 2024-09-17 03:32:15,404 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0 2024-09-17 03:32:16,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=137820.0, ans=0.125 2024-09-17 03:32:20,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=137860.0, ans=0.2 2024-09-17 03:32:34,230 INFO [train.py:1198] (1/2) Epoch 8, batch 2800, loss[loss=0.3067, ctc_loss=0.2525, cr_loss=0.454, attn_decoder_loss=0.3026, over 20093.00 frames. ], tot_loss[loss=0.2762, ctc_loss=0.1894, cr_loss=0.4194, attn_decoder_loss=0.2765, over 5776217.39 frames. 
], batch size: 209, lr: 1.39e-02, grad_scale: 8.0 2024-09-17 03:32:34,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=137900.0, ans=0.0 2024-09-17 03:32:40,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=137900.0, ans=15.0 2024-09-17 03:33:15,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=137980.0, ans=0.0 2024-09-17 03:33:24,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=138020.0, ans=0.125 2024-09-17 03:33:32,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=138020.0, ans=0.125 2024-09-17 03:33:41,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=138060.0, ans=0.09899494936611666 2024-09-17 03:33:41,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=138060.0, ans=0.2 2024-09-17 03:33:53,372 INFO [train.py:1198] (1/2) Epoch 8, batch 2850, loss[loss=0.268, ctc_loss=0.1828, cr_loss=0.4221, attn_decoder_loss=0.2681, over 29505.00 frames. ], tot_loss[loss=0.2768, ctc_loss=0.19, cr_loss=0.4203, attn_decoder_loss=0.2771, over 5762173.42 frames. ], batch size: 77, lr: 1.39e-02, grad_scale: 4.0 2024-09-17 03:33:59,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=138100.0, ans=0.125 2024-09-17 03:34:13,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=138140.0, ans=0.025 2024-09-17 03:34:33,728 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.28 vs. limit=6.0 2024-09-17 03:34:34,371 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.049e+01 1.050e+02 1.191e+02 1.407e+02 3.981e+02, threshold=2.382e+02, percent-clipped=5.0 2024-09-17 03:34:45,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=138220.0, ans=0.125 2024-09-17 03:35:09,031 INFO [train.py:1198] (1/2) Epoch 8, batch 2900, loss[loss=0.2703, ctc_loss=0.1753, cr_loss=0.4092, attn_decoder_loss=0.2718, over 29412.00 frames. ], tot_loss[loss=0.2773, ctc_loss=0.1897, cr_loss=0.4214, attn_decoder_loss=0.2777, over 5787780.21 frames. ], batch size: 79, lr: 1.39e-02, grad_scale: 8.0 2024-09-17 03:35:24,961 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=14.72 vs. 
limit=15.0 2024-09-17 03:35:35,041 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:35:45,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=138380.0, ans=0.125 2024-09-17 03:35:56,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=138420.0, ans=0.07 2024-09-17 03:36:00,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=138420.0, ans=0.09899494936611666 2024-09-17 03:36:16,459 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.32 vs. limit=10.0 2024-09-17 03:36:24,468 INFO [train.py:1198] (1/2) Epoch 8, batch 2950, loss[loss=0.2632, ctc_loss=0.1747, cr_loss=0.4253, attn_decoder_loss=0.2636, over 29511.00 frames. ], tot_loss[loss=0.2759, ctc_loss=0.1884, cr_loss=0.419, attn_decoder_loss=0.2763, over 5782724.01 frames. ], batch size: 75, lr: 1.38e-02, grad_scale: 4.0 2024-09-17 03:37:04,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=138580.0, ans=0.0 2024-09-17 03:37:07,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=138580.0, ans=0.125 2024-09-17 03:37:08,773 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.343e+01 1.007e+02 1.102e+02 1.224e+02 2.215e+02, threshold=2.205e+02, percent-clipped=0.0 2024-09-17 03:37:42,134 INFO [train.py:1198] (1/2) Epoch 8, batch 3000, loss[loss=0.2769, ctc_loss=0.1807, cr_loss=0.4122, attn_decoder_loss=0.2785, over 29754.00 frames. ], tot_loss[loss=0.2756, ctc_loss=0.1881, cr_loss=0.4188, attn_decoder_loss=0.2761, over 5782944.64 frames. ], batch size: 81, lr: 1.38e-02, grad_scale: 8.0 2024-09-17 03:37:42,135 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 03:37:59,568 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7366, 5.5894, 4.9604, 5.2875], device='cuda:1') 2024-09-17 03:38:01,065 INFO [train.py:1230] (1/2) Epoch 8, validation: loss=0.2156, ctc_loss=0.0545, cr_loss=4.305e-15, attn_decoder_loss=0.2335, over 944034.00 frames. 2024-09-17 03:38:01,066 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-17 03:38:31,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=138780.0, ans=0.0 2024-09-17 03:38:31,595 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:39:04,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=138860.0, ans=0.125 2024-09-17 03:39:16,479 INFO [train.py:1198] (1/2) Epoch 8, batch 3050, loss[loss=0.2755, ctc_loss=0.1932, cr_loss=0.4184, attn_decoder_loss=0.2754, over 29538.00 frames. ], tot_loss[loss=0.2765, ctc_loss=0.1888, cr_loss=0.4197, attn_decoder_loss=0.2769, over 5776436.92 frames. 
], batch size: 76, lr: 1.38e-02, grad_scale: 8.0 2024-09-17 03:39:27,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=138900.0, ans=0.125 2024-09-17 03:39:31,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=138940.0, ans=0.0 2024-09-17 03:39:49,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=138980.0, ans=0.125 2024-09-17 03:39:58,633 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.688e+01 1.026e+02 1.087e+02 1.186e+02 2.791e+02, threshold=2.173e+02, percent-clipped=1.0 2024-09-17 03:39:59,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=138980.0, ans=0.125 2024-09-17 03:40:14,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=139020.0, ans=0.125 2024-09-17 03:40:20,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=139060.0, ans=0.2 2024-09-17 03:40:24,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=139060.0, ans=0.2 2024-09-17 03:40:27,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=139060.0, ans=0.0 2024-09-17 03:40:33,614 INFO [train.py:1198] (1/2) Epoch 8, batch 3100, loss[loss=0.2903, ctc_loss=0.2085, cr_loss=0.4239, attn_decoder_loss=0.2899, over 29257.00 frames. ], tot_loss[loss=0.2757, ctc_loss=0.1883, cr_loss=0.4189, attn_decoder_loss=0.2761, over 5775585.15 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 8.0 2024-09-17 03:40:42,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=139100.0, ans=0.0 2024-09-17 03:40:45,012 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.49 vs. limit=15.0 2024-09-17 03:40:50,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=139140.0, ans=0.125 2024-09-17 03:40:50,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=139140.0, ans=0.0 2024-09-17 03:41:18,710 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.10 vs. 
limit=15.0 2024-09-17 03:41:19,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=139220.0, ans=0.0 2024-09-17 03:41:28,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=139220.0, ans=0.2 2024-09-17 03:41:37,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=139260.0, ans=0.0 2024-09-17 03:41:40,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=139260.0, ans=0.025 2024-09-17 03:41:44,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=139260.0, ans=0.0 2024-09-17 03:41:51,267 INFO [train.py:1198] (1/2) Epoch 8, batch 3150, loss[loss=0.2901, ctc_loss=0.2053, cr_loss=0.4606, attn_decoder_loss=0.2893, over 28855.00 frames. ], tot_loss[loss=0.2758, ctc_loss=0.1884, cr_loss=0.4201, attn_decoder_loss=0.2762, over 5781562.50 frames. ], batch size: 104, lr: 1.38e-02, grad_scale: 4.0 2024-09-17 03:42:36,630 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.236e+01 1.018e+02 1.127e+02 1.309e+02 2.778e+02, threshold=2.254e+02, percent-clipped=1.0 2024-09-17 03:42:57,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=139460.0, ans=0.0 2024-09-17 03:43:06,660 INFO [train.py:1198] (1/2) Epoch 8, batch 3200, loss[loss=0.2752, ctc_loss=0.1864, cr_loss=0.4272, attn_decoder_loss=0.2756, over 29441.00 frames. ], tot_loss[loss=0.2752, ctc_loss=0.1876, cr_loss=0.4196, attn_decoder_loss=0.2756, over 5792958.64 frames. ], batch size: 79, lr: 1.38e-02, grad_scale: 8.0 2024-09-17 03:43:08,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=139500.0, ans=0.125 2024-09-17 03:43:18,322 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.15 vs. limit=15.0 2024-09-17 03:43:52,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=139620.0, ans=0.0 2024-09-17 03:44:11,677 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.42 vs. limit=22.5 2024-09-17 03:44:17,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=139660.0, ans=0.0 2024-09-17 03:44:24,506 INFO [train.py:1198] (1/2) Epoch 8, batch 3250, loss[loss=0.2722, ctc_loss=0.1832, cr_loss=0.3852, attn_decoder_loss=0.2736, over 29713.00 frames. ], tot_loss[loss=0.275, ctc_loss=0.1871, cr_loss=0.4189, attn_decoder_loss=0.2755, over 5799362.14 frames. ], batch size: 84, lr: 1.38e-02, grad_scale: 8.0 2024-09-17 03:44:34,464 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.01 vs. 
limit=15.0 2024-09-17 03:44:39,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=139740.0, ans=10.0 2024-09-17 03:44:42,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=139740.0, ans=0.025 2024-09-17 03:44:56,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.22 vs. limit=10.0 2024-09-17 03:45:08,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=139820.0, ans=0.0 2024-09-17 03:45:08,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=139820.0, ans=0.0 2024-09-17 03:45:09,631 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.457e+01 9.664e+01 1.027e+02 1.100e+02 2.131e+02, threshold=2.054e+02, percent-clipped=0.0 2024-09-17 03:45:23,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=139860.0, ans=0.0 2024-09-17 03:45:32,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=139860.0, ans=0.95 2024-09-17 03:45:41,706 INFO [train.py:1198] (1/2) Epoch 8, batch 3300, loss[loss=0.2765, ctc_loss=0.1843, cr_loss=0.4108, attn_decoder_loss=0.2776, over 28339.00 frames. ], tot_loss[loss=0.2735, ctc_loss=0.1857, cr_loss=0.4158, attn_decoder_loss=0.274, over 5796738.45 frames. ], batch size: 111, lr: 1.38e-02, grad_scale: 8.0 2024-09-17 03:46:02,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.96 vs. limit=15.0 2024-09-17 03:46:04,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=139940.0, ans=0.1 2024-09-17 03:46:26,744 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.86 vs. limit=15.0 2024-09-17 03:46:32,973 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.13 vs. limit=12.0 2024-09-17 03:46:57,285 INFO [train.py:1198] (1/2) Epoch 8, batch 3350, loss[loss=0.2871, ctc_loss=0.202, cr_loss=0.4388, attn_decoder_loss=0.2868, over 28900.00 frames. ], tot_loss[loss=0.2742, ctc_loss=0.1865, cr_loss=0.4163, attn_decoder_loss=0.2747, over 5772162.58 frames. ], batch size: 104, lr: 1.38e-02, grad_scale: 4.0 2024-09-17 03:47:13,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=140140.0, ans=15.0 2024-09-17 03:47:26,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=140180.0, ans=0.0 2024-09-17 03:47:29,710 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.91 vs. 
limit=22.5 2024-09-17 03:47:47,549 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.416e+01 1.028e+02 1.095e+02 1.236e+02 5.561e+02, threshold=2.191e+02, percent-clipped=3.0 2024-09-17 03:47:47,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=140220.0, ans=0.125 2024-09-17 03:47:49,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.min_positive, batch_count=140220.0, ans=0.025 2024-09-17 03:47:58,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=140260.0, ans=0.125 2024-09-17 03:48:07,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=140260.0, ans=0.125 2024-09-17 03:48:14,969 INFO [train.py:1198] (1/2) Epoch 8, batch 3400, loss[loss=0.2436, ctc_loss=0.164, cr_loss=0.3773, attn_decoder_loss=0.2441, over 29363.00 frames. ], tot_loss[loss=0.2748, ctc_loss=0.1873, cr_loss=0.4168, attn_decoder_loss=0.2753, over 5766105.31 frames. ], batch size: 67, lr: 1.38e-02, grad_scale: 8.0 2024-09-17 03:49:08,764 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.54 vs. limit=15.0 2024-09-17 03:49:18,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=140460.0, ans=0.125 2024-09-17 03:49:27,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=140460.0, ans=0.0 2024-09-17 03:49:31,834 INFO [train.py:1198] (1/2) Epoch 8, batch 3450, loss[loss=0.2845, ctc_loss=0.1931, cr_loss=0.3876, attn_decoder_loss=0.2861, over 28198.00 frames. ], tot_loss[loss=0.275, ctc_loss=0.1871, cr_loss=0.4167, attn_decoder_loss=0.2755, over 5775378.36 frames. 
], batch size: 111, lr: 1.38e-02, grad_scale: 4.0 2024-09-17 03:49:33,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=140500.0, ans=0.125 2024-09-17 03:49:45,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=140540.0, ans=0.07 2024-09-17 03:49:53,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=140540.0, ans=0.1 2024-09-17 03:50:00,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=140580.0, ans=0.0 2024-09-17 03:50:21,400 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.539e+01 1.005e+02 1.084e+02 1.145e+02 2.009e+02, threshold=2.168e+02, percent-clipped=0.0 2024-09-17 03:50:21,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=140620.0, ans=0.02 2024-09-17 03:50:24,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=140620.0, ans=0.125 2024-09-17 03:50:24,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=140620.0, ans=0.125 2024-09-17 03:50:28,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=140620.0, ans=0.125 2024-09-17 03:50:30,138 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.08 vs. limit=10.0 2024-09-17 03:50:47,161 INFO [train.py:1198] (1/2) Epoch 8, batch 3500, loss[loss=0.246, ctc_loss=0.1612, cr_loss=0.3642, attn_decoder_loss=0.2473, over 29349.00 frames. ], tot_loss[loss=0.2744, ctc_loss=0.1868, cr_loss=0.4165, attn_decoder_loss=0.2749, over 5777793.28 frames. ], batch size: 71, lr: 1.37e-02, grad_scale: 8.0 2024-09-17 03:50:58,764 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.16 vs. limit=15.0 2024-09-17 03:51:02,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=140740.0, ans=0.1 2024-09-17 03:51:12,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=140740.0, ans=0.0 2024-09-17 03:51:14,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=140740.0, ans=0.0 2024-09-17 03:51:31,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=140820.0, ans=0.0 2024-09-17 03:51:42,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=140820.0, ans=0.125 2024-09-17 03:51:49,600 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.77 vs. limit=10.0 2024-09-17 03:52:03,853 INFO [train.py:1198] (1/2) Epoch 8, batch 3550, loss[loss=0.282, ctc_loss=0.1904, cr_loss=0.4383, attn_decoder_loss=0.2824, over 29701.00 frames. ], tot_loss[loss=0.2745, ctc_loss=0.1868, cr_loss=0.4172, attn_decoder_loss=0.275, over 5784471.55 frames. 
], batch size: 89, lr: 1.37e-02, grad_scale: 4.0 2024-09-17 03:52:05,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=140900.0, ans=0.125 2024-09-17 03:52:05,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=140900.0, ans=0.0 2024-09-17 03:52:11,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=140900.0, ans=0.015 2024-09-17 03:52:15,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=140900.0, ans=0.0 2024-09-17 03:52:20,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=140940.0, ans=0.1 2024-09-17 03:52:24,761 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:52:31,088 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.21 vs. limit=15.0 2024-09-17 03:52:36,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=140980.0, ans=0.125 2024-09-17 03:52:38,746 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.29 vs. limit=10.0 2024-09-17 03:52:46,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=141020.0, ans=0.95 2024-09-17 03:52:53,893 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.324e+01 1.017e+02 1.100e+02 1.203e+02 4.569e+02, threshold=2.200e+02, percent-clipped=1.0 2024-09-17 03:52:57,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=141020.0, ans=0.0 2024-09-17 03:53:06,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=141060.0, ans=0.05 2024-09-17 03:53:12,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=141060.0, ans=0.125 2024-09-17 03:53:17,656 INFO [train.py:1198] (1/2) Epoch 8, batch 3600, loss[loss=0.2661, ctc_loss=0.1774, cr_loss=0.4108, attn_decoder_loss=0.2668, over 29506.00 frames. ], tot_loss[loss=0.2747, ctc_loss=0.1869, cr_loss=0.4183, attn_decoder_loss=0.2751, over 5792760.23 frames. ], batch size: 77, lr: 1.37e-02, grad_scale: 8.0 2024-09-17 03:53:18,485 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.78 vs. limit=15.0 2024-09-17 03:53:32,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=141140.0, ans=0.2 2024-09-17 03:53:44,732 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:54:13,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=141220.0, ans=0.1 2024-09-17 03:54:32,136 INFO [train.py:1198] (1/2) Epoch 8, batch 3650, loss[loss=0.2905, ctc_loss=0.1987, cr_loss=0.4395, attn_decoder_loss=0.2909, over 29492.00 frames. 
], tot_loss[loss=0.2737, ctc_loss=0.1858, cr_loss=0.4169, attn_decoder_loss=0.2742, over 5794742.18 frames. ], batch size: 90, lr: 1.37e-02, grad_scale: 4.0 2024-09-17 03:54:48,978 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:55:26,039 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.265e+01 1.005e+02 1.060e+02 1.174e+02 2.245e+02, threshold=2.119e+02, percent-clipped=1.0 2024-09-17 03:55:38,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=141460.0, ans=0.125 2024-09-17 03:55:45,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=141460.0, ans=0.1 2024-09-17 03:55:46,337 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.39 vs. limit=6.0 2024-09-17 03:55:48,539 INFO [train.py:1198] (1/2) Epoch 8, batch 3700, loss[loss=0.2845, ctc_loss=0.1992, cr_loss=0.4608, attn_decoder_loss=0.2838, over 29716.00 frames. ], tot_loss[loss=0.2741, ctc_loss=0.186, cr_loss=0.4176, attn_decoder_loss=0.2746, over 5804304.62 frames. ], batch size: 84, lr: 1.37e-02, grad_scale: 8.0 2024-09-17 03:55:59,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=141500.0, ans=0.125 2024-09-17 03:56:12,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=141540.0, ans=0.0 2024-09-17 03:56:16,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=141580.0, ans=0.0 2024-09-17 03:56:28,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=141580.0, ans=0.125 2024-09-17 03:56:47,241 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0 2024-09-17 03:56:52,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=141660.0, ans=0.0 2024-09-17 03:57:02,879 INFO [train.py:1198] (1/2) Epoch 8, batch 3750, loss[loss=0.2419, ctc_loss=0.158, cr_loss=0.3621, attn_decoder_loss=0.2432, over 29358.00 frames. ], tot_loss[loss=0.2737, ctc_loss=0.1857, cr_loss=0.4173, attn_decoder_loss=0.2742, over 5807845.41 frames. ], batch size: 67, lr: 1.37e-02, grad_scale: 4.0 2024-09-17 03:57:09,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.56 vs. limit=12.0 2024-09-17 03:57:12,810 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. 
limit=6.0 2024-09-17 03:57:30,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=141740.0, ans=0.025 2024-09-17 03:57:40,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=141780.0, ans=0.125 2024-09-17 03:57:41,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=141780.0, ans=0.025 2024-09-17 03:57:53,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=141820.0, ans=0.125 2024-09-17 03:57:55,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=141820.0, ans=0.07 2024-09-17 03:57:56,594 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.547e+01 9.777e+01 1.089e+02 1.271e+02 6.127e+02, threshold=2.178e+02, percent-clipped=4.0 2024-09-17 03:58:11,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=141860.0, ans=0.1 2024-09-17 03:58:18,901 INFO [train.py:1198] (1/2) Epoch 8, batch 3800, loss[loss=0.2904, ctc_loss=0.1963, cr_loss=0.4217, attn_decoder_loss=0.2914, over 29639.00 frames. ], tot_loss[loss=0.2735, ctc_loss=0.1858, cr_loss=0.4168, attn_decoder_loss=0.274, over 5798320.54 frames. ], batch size: 86, lr: 1.37e-02, grad_scale: 8.0 2024-09-17 03:58:33,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=141940.0, ans=0.0 2024-09-17 03:58:35,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=141940.0, ans=0.125 2024-09-17 03:58:48,047 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.01 vs. limit=22.5 2024-09-17 03:58:55,442 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.60 vs. limit=15.0 2024-09-17 03:59:02,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=142020.0, ans=0.1 2024-09-17 03:59:03,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=142020.0, ans=0.1 2024-09-17 03:59:11,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=142020.0, ans=0.1 2024-09-17 03:59:14,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=142020.0, ans=0.2 2024-09-17 03:59:25,071 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.43 vs. limit=6.0 2024-09-17 03:59:33,122 INFO [train.py:1198] (1/2) Epoch 8, batch 3850, loss[loss=0.2724, ctc_loss=0.1825, cr_loss=0.4039, attn_decoder_loss=0.2735, over 29245.00 frames. ], tot_loss[loss=0.2729, ctc_loss=0.1848, cr_loss=0.4154, attn_decoder_loss=0.2735, over 5812535.53 frames. 
], batch size: 100, lr: 1.37e-02, grad_scale: 8.0 2024-09-17 04:00:07,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=142180.0, ans=0.125 2024-09-17 04:00:26,679 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.741e+01 9.765e+01 1.055e+02 1.135e+02 1.958e+02, threshold=2.110e+02, percent-clipped=1.0 2024-09-17 04:00:41,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=142260.0, ans=0.0 2024-09-17 04:00:48,852 INFO [train.py:1198] (1/2) Epoch 8, batch 3900, loss[loss=0.2766, ctc_loss=0.1762, cr_loss=0.404, attn_decoder_loss=0.2787, over 29597.00 frames. ], tot_loss[loss=0.2733, ctc_loss=0.1847, cr_loss=0.4157, attn_decoder_loss=0.2739, over 5816995.20 frames. ], batch size: 86, lr: 1.37e-02, grad_scale: 8.0 2024-09-17 04:00:53,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=142300.0, ans=0.2 2024-09-17 04:01:09,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=142340.0, ans=0.2 2024-09-17 04:01:13,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=142340.0, ans=0.125 2024-09-17 04:01:38,400 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=15.0 2024-09-17 04:02:02,694 INFO [train.py:1198] (1/2) Epoch 8, batch 3950, loss[loss=0.2851, ctc_loss=0.1875, cr_loss=0.4285, attn_decoder_loss=0.2864, over 29454.00 frames. ], tot_loss[loss=0.273, ctc_loss=0.1837, cr_loss=0.4152, attn_decoder_loss=0.2737, over 5836277.97 frames. ], batch size: 97, lr: 1.37e-02, grad_scale: 4.0 2024-09-17 04:02:07,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=142500.0, ans=0.0 2024-09-17 04:02:09,407 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.18 vs. limit=6.0 2024-09-17 04:02:14,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=142500.0, ans=0.025 2024-09-17 04:02:42,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=142580.0, ans=0.125 2024-09-17 04:02:54,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=142620.0, ans=0.0 2024-09-17 04:02:55,185 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.03 vs. limit=22.5 2024-09-17 04:02:58,359 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.061e+01 9.742e+01 1.045e+02 1.185e+02 2.599e+02, threshold=2.090e+02, percent-clipped=1.0 2024-09-17 04:03:11,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=142660.0, ans=0.0 2024-09-17 04:03:16,460 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.26 vs. 
2024-09-17 04:03:17,015 INFO [train.py:1198] (1/2) Epoch 8, batch 4000, loss[loss=0.2566, ctc_loss=0.1626, cr_loss=0.3983, attn_decoder_loss=0.2582, over 29510.00 frames. ], tot_loss[loss=0.2731, ctc_loss=0.1839, cr_loss=0.4148, attn_decoder_loss=0.2738, over 5813337.81 frames. ], batch size: 74, lr: 1.36e-02, grad_scale: 8.0 2024-09-17 04:03:46,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=142780.0, ans=0.125 2024-09-17 04:03:51,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=142780.0, ans=0.125 2024-09-17 04:04:29,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=142900.0, ans=0.0 2024-09-17 04:04:30,884 INFO [train.py:1198] (1/2) Epoch 8, batch 4050, loss[loss=0.3166, ctc_loss=0.2599, cr_loss=0.4343, attn_decoder_loss=0.3132, over 20030.00 frames. ], tot_loss[loss=0.2728, ctc_loss=0.1838, cr_loss=0.4142, attn_decoder_loss=0.2735, over 5796711.69 frames. ], batch size: 209, lr: 1.36e-02, grad_scale: 4.0 2024-09-17 04:04:45,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=142940.0, ans=0.125 2024-09-17 04:04:48,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=142940.0, ans=0.125 2024-09-17 04:05:12,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=142980.0, ans=0.0 2024-09-17 04:05:18,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=143020.0, ans=0.125 2024-09-17 04:05:29,421 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.245e+01 1.069e+02 1.234e+02 1.438e+02 3.012e+02, threshold=2.468e+02, percent-clipped=5.0 2024-09-17 04:05:31,503 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=12.0 2024-09-17 04:05:44,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=143100.0, ans=0.125 2024-09-17 04:05:45,648 INFO [train.py:1198] (1/2) Epoch 8, batch 4100, loss[loss=0.2835, ctc_loss=0.1965, cr_loss=0.439, attn_decoder_loss=0.2835, over 29503.00 frames. ], tot_loss[loss=0.2737, ctc_loss=0.1851, cr_loss=0.4167, attn_decoder_loss=0.2743, over 5792364.59 frames. ], batch size: 90, lr: 1.36e-02, grad_scale: 8.0
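
The scaling.py:214 lines trace ScheduledFloat values: module hyperparameters (dropout probabilities, skip rates, balancer and attention constants) that are not fixed but follow piecewise-linear schedules in batch_count. This deep into training (batch_count around 143000) almost all of them have reached their final flat values, which is why the same ans keeps repeating (0.1 for the dropout_p entries, 0.0 for the skip rates, and so on). A minimal sketch of such a schedule; the breakpoints below are illustrative only, the real per-module schedules live in the Zipformer model code:

    # Sketch of a ScheduledFloat-style value: piecewise linear in batch_count,
    # flat outside the outermost breakpoints. Breakpoints are illustrative.
    def scheduled_float(batch_count: float,
                        points=((0.0, 0.3), (20000.0, 0.1))) -> float:
        (x0, y0), (x1, y1) = points
        if batch_count <= x0:
            return y0
        if batch_count >= x1:
            return y1
        t = (batch_count - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

    print(scheduled_float(143000.0))  # -> 0.1, long past the last breakpoint
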
2024-09-17 04:05:50,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=143100.0, ans=0.125 2024-09-17 04:06:00,472 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:06:04,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=143140.0, ans=0.035 2024-09-17 04:06:06,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=143140.0, ans=0.0 2024-09-17 04:06:07,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=143140.0, ans=0.0 2024-09-17 04:06:09,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=143140.0, ans=0.125 2024-09-17 04:06:40,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=143220.0, ans=0.0 2024-09-17 04:06:50,940 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.61 vs. limit=15.0 2024-09-17 04:06:59,220 INFO [train.py:1198] (1/2) Epoch 8, batch 4150, loss[loss=0.2672, ctc_loss=0.1757, cr_loss=0.4083, attn_decoder_loss=0.2682, over 29498.00 frames. ], tot_loss[loss=0.2733, ctc_loss=0.1849, cr_loss=0.4168, attn_decoder_loss=0.2739, over 5797754.60 frames. ], batch size: 77, lr: 1.36e-02, grad_scale: 4.0 2024-09-17 04:07:12,013 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.86 vs. limit=6.0 2024-09-17 04:07:25,758 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.63 vs. limit=12.0 2024-09-17 04:07:32,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=143380.0, ans=0.025 2024-09-17 04:07:38,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=143380.0, ans=0.125 2024-09-17 04:07:56,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=143420.0, ans=0.0 2024-09-17 04:07:58,960 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.371e+01 9.814e+01 1.059e+02 1.146e+02 1.859e+02, threshold=2.118e+02, percent-clipped=0.0 2024-09-17 04:08:13,624 INFO [train.py:1198] (1/2) Epoch 8, batch 4200, loss[loss=0.2901, ctc_loss=0.2002, cr_loss=0.4272, attn_decoder_loss=0.2906, over 29504.00 frames. ], tot_loss[loss=0.2739, ctc_loss=0.1853, cr_loss=0.4176, attn_decoder_loss=0.2745, over 5799803.63 frames. ], batch size: 90, lr: 1.36e-02, grad_scale: 8.0 2024-09-17 04:08:16,020 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.36 vs. limit=15.0
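
The scaling.py:1024 lines come from Whiten modules, which watch how far the covariance of a layer's activations (per group of channels) is from a multiple of the identity. The logged metric is about 1.0 for perfectly "white" features and grows as the covariance concentrates in a few directions; a corrective gradient kicks in only while the metric exceeds the limit (most records here are below their limits; a few, e.g. metric=29.72 vs. limit=22.5 a little further down, are not). A sketch of one metric with exactly that property; whitening_metric is a hypothetical helper, and icefall's actual formula in scaling.py may differ in detail:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        """x: (num_frames, num_channels) activations for one group.

        Returns c * sum(cov**2) / trace(cov)**2, which is >= 1.0 and equals
        1.0 exactly when the covariance is a multiple of the identity.
        """
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]        # (C, C) sample covariance
        c = cov.shape[0]
        return (c * (cov * cov).sum() / cov.trace() ** 2).item()

    print(whitening_metric(torch.randn(10000, 512)))  # ~1.0: nearly white
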
2024-09-17 04:08:16,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=143500.0, ans=0.1 2024-09-17 04:08:18,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=143500.0, ans=0.125 2024-09-17 04:08:36,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=143540.0, ans=0.025 2024-09-17 04:08:47,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=143580.0, ans=0.1 2024-09-17 04:09:05,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=143620.0, ans=0.125 2024-09-17 04:09:05,725 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2024-09-17 04:09:22,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=143660.0, ans=0.0 2024-09-17 04:09:23,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=143660.0, ans=0.025 2024-09-17 04:09:27,756 INFO [train.py:1198] (1/2) Epoch 8, batch 4250, loss[loss=0.2451, ctc_loss=0.1505, cr_loss=0.3604, attn_decoder_loss=0.2476, over 29533.00 frames. ], tot_loss[loss=0.2739, ctc_loss=0.1848, cr_loss=0.4169, attn_decoder_loss=0.2745, over 5805954.46 frames. ], batch size: 74, lr: 1.36e-02, grad_scale: 4.0 2024-09-17 04:09:30,302 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.66 vs. limit=10.0 2024-09-17 04:10:08,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=143780.0, ans=0.0 2024-09-17 04:10:09,690 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=29.72 vs. limit=22.5 2024-09-17 04:10:27,813 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.334e+01 1.014e+02 1.108e+02 1.214e+02 2.997e+02, threshold=2.217e+02, percent-clipped=4.0 2024-09-17 04:10:28,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=143860.0, ans=0.2 2024-09-17 04:10:39,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=143900.0, ans=0.0 2024-09-17 04:10:41,043 INFO [train.py:1198] (1/2) Epoch 8, batch 4300, loss[loss=0.2957, ctc_loss=0.203, cr_loss=0.4649, attn_decoder_loss=0.2956, over 29512.00 frames. ], tot_loss[loss=0.274, ctc_loss=0.1852, cr_loss=0.4174, attn_decoder_loss=0.2746, over 5795688.51 frames. ], batch size: 87, lr: 1.36e-02, grad_scale: 8.0 2024-09-17 04:10:47,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=143900.0, ans=0.0 2024-09-17 04:10:50,243 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.50 vs.
limit=15.0 2024-09-17 04:10:58,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=143940.0, ans=0.125 2024-09-17 04:11:02,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=143940.0, ans=0.125 2024-09-17 04:11:56,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=144060.0, ans=0.1 2024-09-17 04:12:02,537 INFO [train.py:1198] (1/2) Epoch 8, batch 4350, loss[loss=0.2914, ctc_loss=0.1982, cr_loss=0.4393, attn_decoder_loss=0.292, over 29451.00 frames. ], tot_loss[loss=0.2775, ctc_loss=0.188, cr_loss=0.4222, attn_decoder_loss=0.2781, over 5798409.50 frames. ], batch size: 97, lr: 1.36e-02, grad_scale: 8.0 2024-09-17 04:12:08,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=144100.0, ans=0.1 2024-09-17 04:12:08,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=144100.0, ans=0.1 2024-09-17 04:12:16,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=144140.0, ans=0.1 2024-09-17 04:12:16,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=144140.0, ans=0.2 2024-09-17 04:12:22,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=144140.0, ans=0.125 2024-09-17 04:12:43,627 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.84 vs. limit=6.0 2024-09-17 04:12:54,076 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.21 vs. limit=15.0 2024-09-17 04:12:55,261 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.66 vs. limit=12.0 2024-09-17 04:13:03,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=144260.0, ans=0.125 2024-09-17 04:13:04,678 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.658e+01 1.032e+02 1.110e+02 1.170e+02 3.272e+02, threshold=2.221e+02, percent-clipped=1.0 2024-09-17 04:13:05,738 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.57 vs. limit=15.0 2024-09-17 04:13:16,184 INFO [train.py:1198] (1/2) Epoch 8, batch 4400, loss[loss=0.2888, ctc_loss=0.2031, cr_loss=0.4504, attn_decoder_loss=0.2883, over 27298.00 frames. ], tot_loss[loss=0.2798, ctc_loss=0.1904, cr_loss=0.4245, attn_decoder_loss=0.2803, over 5768149.71 frames. 
], batch size: 124, lr: 1.36e-02, grad_scale: 8.0 2024-09-17 04:13:19,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=144300.0, ans=0.02 2024-09-17 04:13:22,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=144300.0, ans=0.125 2024-09-17 04:13:28,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=144300.0, ans=0.0 2024-09-17 04:13:32,193 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0 2024-09-17 04:13:53,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=144380.0, ans=0.125 2024-09-17 04:13:54,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=144380.0, ans=0.2 2024-09-17 04:13:56,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=144380.0, ans=0.125 2024-09-17 04:14:10,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=144420.0, ans=0.0 2024-09-17 04:14:10,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=144420.0, ans=0.125 2024-09-17 04:14:13,855 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.70 vs. limit=15.0 2024-09-17 04:14:29,772 INFO [train.py:1198] (1/2) Epoch 8, batch 4450, loss[loss=0.3078, ctc_loss=0.2582, cr_loss=0.4239, attn_decoder_loss=0.3038, over 20039.00 frames. ], tot_loss[loss=0.2834, ctc_loss=0.1963, cr_loss=0.4279, attn_decoder_loss=0.2835, over 5572366.71 frames. ], batch size: 209, lr: 1.36e-02, grad_scale: 4.0 2024-09-17 04:14:30,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=144500.0, ans=0.0 2024-09-17 04:14:38,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=144500.0, ans=0.0 2024-09-17 04:14:38,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=144500.0, ans=0.0 2024-09-17 04:14:55,140 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.02 vs. limit=10.0 2024-09-17 04:15:15,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=144620.0, ans=0.025 2024-09-17 04:15:17,093 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:15:17,431 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.10 vs. 
limit=10.0 2024-09-17 04:15:34,822 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.417e+01 1.090e+02 1.182e+02 1.322e+02 3.138e+02, threshold=2.364e+02, percent-clipped=1.0 2024-09-17 04:15:38,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=144660.0, ans=0.125 2024-09-17 04:15:43,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=144700.0, ans=0.2 2024-09-17 04:15:45,080 INFO [train.py:1198] (1/2) Epoch 8, batch 4500, loss[loss=0.3081, ctc_loss=0.2408, cr_loss=0.4334, attn_decoder_loss=0.3059, over 20336.00 frames. ], tot_loss[loss=0.2875, ctc_loss=0.204, cr_loss=0.4298, attn_decoder_loss=0.2873, over 5232412.54 frames. ], batch size: 210, lr: 1.36e-02, grad_scale: 8.0 2024-09-17 04:15:56,484 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=15.0 2024-09-17 04:16:18,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=144780.0, ans=0.2 2024-09-17 04:17:10,885 INFO [train.py:1198] (1/2) Epoch 9, batch 0, loss[loss=0.2623, ctc_loss=0.1571, cr_loss=0.3806, attn_decoder_loss=0.2655, over 29651.00 frames. ], tot_loss[loss=0.2623, ctc_loss=0.1571, cr_loss=0.3806, attn_decoder_loss=0.2655, over 29651.00 frames. ], batch size: 73, lr: 1.28e-02, grad_scale: 8.0 2024-09-17 04:17:10,885 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 04:17:29,061 INFO [train.py:1230] (1/2) Epoch 9, validation: loss=0.2184, ctc_loss=0.05457, cr_loss=4.594e-15, attn_decoder_loss=0.2366, over 944034.00 frames. 2024-09-17 04:17:29,061 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-17 04:17:30,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=144800.0, ans=0.125 2024-09-17 04:17:47,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=144840.0, ans=0.125 2024-09-17 04:17:57,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=144840.0, ans=0.0 2024-09-17 04:17:57,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=144840.0, ans=0.1 2024-09-17 04:17:58,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=144840.0, ans=0.0 2024-09-17 04:18:09,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=144880.0, ans=0.2 2024-09-17 04:18:24,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=144920.0, ans=0.125 2024-09-17 04:18:28,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=144920.0, ans=0.125 2024-09-17 04:18:42,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=144960.0, ans=0.125 2024-09-17 04:18:44,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=144960.0, ans=0.0 2024-09-17 04:18:48,436 INFO [train.py:1198] (1/2) Epoch 9, batch 50, loss[loss=0.2575, ctc_loss=0.1799, cr_loss=0.4289, attn_decoder_loss=0.2565, over 29442.00 frames. ], tot_loss[loss=0.2781, ctc_loss=0.1915, cr_loss=0.4223, attn_decoder_loss=0.2783, over 1266815.79 frames. ], batch size: 70, lr: 1.28e-02, grad_scale: 4.0
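
At the epoch boundary train.py computes a validation loss, and the cr_loss there is 4.594e-15, zero up to floating-point noise. That is expected: the consistency-regularization (CR) term compares the model's frame-level outputs on two differently time-masked copies of each utterance (this run uses CR-CTC's own time masking, time_mask_ratio=2.5, with standard SpecAugment disabled), so without training-time masking the two branches coincide and the term vanishes. The weighted-sum relation from the earlier note holds here as well: 0.1 * 0.05457 + 0.9 * 0.2366 is about 0.2184. The "Maximum memory allocated" line reports torch.cuda.max_memory_allocated for the device. A sketch of a CR-style consistency term as a symmetric KL between two views; cr_consistency is a hypothetical helper, and the real recipe differs in detail (for instance it may stop gradients through the target branch and weight masked frames separately):

    import torch
    import torch.nn.functional as F

    def cr_consistency(log_probs_a: torch.Tensor,
                       log_probs_b: torch.Tensor) -> torch.Tensor:
        """log_probs_*: (N, T, V) log-softmax outputs for two augmented views."""
        kl_ab = F.kl_div(log_probs_a, log_probs_b, log_target=True,
                         reduction="batchmean")
        kl_ba = F.kl_div(log_probs_b, log_probs_a, log_target=True,
                         reduction="batchmean")
        return 0.5 * (kl_ab + kl_ba)

    lp = F.log_softmax(torch.randn(2, 5, 500), dim=-1)
    print(cr_consistency(lp, lp))  # identical views -> tensor(0.), matching
                                   # the ~1e-15 validation cr_loss above
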
2024-09-17 04:19:18,665 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.315e+01 1.028e+02 1.122e+02 1.290e+02 1.269e+03, threshold=2.245e+02, percent-clipped=1.0 2024-09-17 04:19:32,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=145120.0, ans=0.1 2024-09-17 04:19:37,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=145120.0, ans=0.125 2024-09-17 04:19:43,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=145120.0, ans=0.125 2024-09-17 04:19:47,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=145160.0, ans=0.125 2024-09-17 04:20:04,118 INFO [train.py:1198] (1/2) Epoch 9, batch 100, loss[loss=0.2769, ctc_loss=0.1957, cr_loss=0.4224, attn_decoder_loss=0.2766, over 29532.00 frames. ], tot_loss[loss=0.2796, ctc_loss=0.1923, cr_loss=0.4252, attn_decoder_loss=0.2799, over 2251715.09 frames. ], batch size: 76, lr: 1.28e-02, grad_scale: 8.0 2024-09-17 04:20:07,561 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:20:18,202 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:20:42,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=145280.0, ans=0.2 2024-09-17 04:20:55,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=145320.0, ans=0.125 2024-09-17 04:21:12,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=145360.0, ans=0.1 2024-09-17 04:21:19,404 INFO [train.py:1198] (1/2) Epoch 9, batch 150, loss[loss=0.2475, ctc_loss=0.1576, cr_loss=0.3967, attn_decoder_loss=0.2486, over 29436.00 frames. ], tot_loss[loss=0.276, ctc_loss=0.1871, cr_loss=0.4199, attn_decoder_loss=0.2765, over 3046311.12 frames. ], batch size: 70, lr: 1.28e-02, grad_scale: 4.0 2024-09-17 04:21:19,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=145400.0, ans=0.0 2024-09-17 04:21:20,504 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.02 vs. limit=6.0 2024-09-17 04:21:45,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=145440.0, ans=0.0 2024-09-17 04:21:55,705 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.472e+01 1.015e+02 1.087e+02 1.260e+02 1.994e+02, threshold=2.174e+02, percent-clipped=0.0 2024-09-17 04:22:08,524 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0
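
The grad_scale figure in the batch records flips between 8.0 and 4.0. That is the loss scale of mixed-precision training: with use_fp16=True, a GradScaler multiplies the loss before backprop, skips the step and halves the scale when the scaled gradients overflow, and grows it again after a streak of finite steps, so small powers of two are normal. A minimal sketch of a standard torch.cuda.amp training step (the actual train.py loop carries more bookkeeping):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=8.0)

    def train_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = compute_loss(model, batch)   # forward in float16
        scaler.scale(loss).backward()           # backprop the scaled loss
        scaler.step(optimizer)                  # unscale; skip step on inf/nan
        scaler.update()                         # grow or shrink the scale
        return loss.detach(), scaler.get_scale()
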
2024-09-17 04:22:11,554 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.08 vs. limit=22.5 2024-09-17 04:22:18,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=145520.0, ans=0.0 2024-09-17 04:22:25,375 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.10 vs. limit=15.0 2024-09-17 04:22:30,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=145560.0, ans=0.125 2024-09-17 04:22:32,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=145560.0, ans=0.125 2024-09-17 04:22:39,821 INFO [train.py:1198] (1/2) Epoch 9, batch 200, loss[loss=0.282, ctc_loss=0.2011, cr_loss=0.4278, attn_decoder_loss=0.2815, over 27638.00 frames. ], tot_loss[loss=0.2745, ctc_loss=0.1858, cr_loss=0.4189, attn_decoder_loss=0.275, over 3659240.49 frames. ], batch size: 125, lr: 1.28e-02, grad_scale: 8.0 2024-09-17 04:22:47,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=145600.0, ans=0.125 2024-09-17 04:23:07,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=145640.0, ans=0.025 2024-09-17 04:23:07,893 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.61 vs. limit=6.0 2024-09-17 04:23:14,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=145680.0, ans=0.0 2024-09-17 04:23:19,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=145680.0, ans=0.95 2024-09-17 04:23:36,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=145720.0, ans=0.125 2024-09-17 04:23:40,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=145760.0, ans=0.0 2024-09-17 04:23:55,897 INFO [train.py:1198] (1/2) Epoch 9, batch 250, loss[loss=0.2963, ctc_loss=0.2099, cr_loss=0.4516, attn_decoder_loss=0.2958, over 29193.00 frames. ], tot_loss[loss=0.2735, ctc_loss=0.1843, cr_loss=0.4173, attn_decoder_loss=0.2741, over 4141856.27 frames. ], batch size: 100, lr: 1.28e-02, grad_scale: 4.0 2024-09-17 04:24:05,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=145800.0, ans=0.125 2024-09-17 04:24:10,525 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.57 vs.
limit=15.0 2024-09-17 04:24:29,187 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.100e+01 9.608e+01 1.032e+02 1.129e+02 1.433e+02, threshold=2.064e+02, percent-clipped=0.0 2024-09-17 04:24:29,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=145880.0, ans=0.0 2024-09-17 04:25:11,661 INFO [train.py:1198] (1/2) Epoch 9, batch 300, loss[loss=0.2908, ctc_loss=0.1969, cr_loss=0.429, attn_decoder_loss=0.2917, over 29527.00 frames. ], tot_loss[loss=0.2726, ctc_loss=0.1833, cr_loss=0.4158, attn_decoder_loss=0.2733, over 4509365.65 frames. ], batch size: 92, lr: 1.28e-02, grad_scale: 8.0 2024-09-17 04:25:51,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=146080.0, ans=0.125 2024-09-17 04:26:00,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=146120.0, ans=0.0 2024-09-17 04:26:03,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=146120.0, ans=0.125 2024-09-17 04:26:16,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=146160.0, ans=0.2 2024-09-17 04:26:32,376 INFO [train.py:1198] (1/2) Epoch 9, batch 350, loss[loss=0.2406, ctc_loss=0.1529, cr_loss=0.3693, attn_decoder_loss=0.2421, over 29313.00 frames. ], tot_loss[loss=0.2727, ctc_loss=0.1834, cr_loss=0.4162, attn_decoder_loss=0.2734, over 4795198.32 frames. ], batch size: 71, lr: 1.28e-02, grad_scale: 4.0 2024-09-17 04:26:34,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=146200.0, ans=0.025 2024-09-17 04:26:44,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=146200.0, ans=0.1 2024-09-17 04:26:46,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=146240.0, ans=0.125 2024-09-17 04:26:55,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=146240.0, ans=0.125 2024-09-17 04:26:56,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=146240.0, ans=0.125 2024-09-17 04:27:06,993 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.424e+01 9.475e+01 1.008e+02 1.084e+02 2.956e+02, threshold=2.017e+02, percent-clipped=2.0 2024-09-17 04:27:27,790 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.28 vs. limit=10.0 2024-09-17 04:27:33,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=146360.0, ans=0.125 2024-09-17 04:27:41,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=146360.0, ans=0.125 2024-09-17 04:27:48,641 INFO [train.py:1198] (1/2) Epoch 9, batch 400, loss[loss=0.2775, ctc_loss=0.1845, cr_loss=0.4334, attn_decoder_loss=0.2782, over 29705.00 frames. ], tot_loss[loss=0.2721, ctc_loss=0.1825, cr_loss=0.4155, attn_decoder_loss=0.2728, over 5024805.77 frames. 
], batch size: 82, lr: 1.28e-02, grad_scale: 8.0 2024-09-17 04:28:13,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=146440.0, ans=0.125 2024-09-17 04:28:18,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=146480.0, ans=0.025 2024-09-17 04:28:21,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=146480.0, ans=0.0 2024-09-17 04:28:30,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=146480.0, ans=0.125 2024-09-17 04:28:39,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=146520.0, ans=0.125 2024-09-17 04:28:40,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=146520.0, ans=0.125 2024-09-17 04:29:03,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=146600.0, ans=0.125 2024-09-17 04:29:04,924 INFO [train.py:1198] (1/2) Epoch 9, batch 450, loss[loss=0.274, ctc_loss=0.1786, cr_loss=0.3924, attn_decoder_loss=0.2759, over 29706.00 frames. ], tot_loss[loss=0.272, ctc_loss=0.1825, cr_loss=0.4155, attn_decoder_loss=0.2727, over 5187751.78 frames. ], batch size: 83, lr: 1.28e-02, grad_scale: 4.0 2024-09-17 04:29:31,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=146640.0, ans=0.125 2024-09-17 04:29:46,240 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.248e+01 9.637e+01 1.024e+02 1.129e+02 3.219e+02, threshold=2.049e+02, percent-clipped=1.0 2024-09-17 04:29:51,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=146680.0, ans=0.0 2024-09-17 04:29:57,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=146720.0, ans=0.125 2024-09-17 04:30:17,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=146760.0, ans=0.0 2024-09-17 04:30:24,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=146800.0, ans=0.125 2024-09-17 04:30:25,865 INFO [train.py:1198] (1/2) Epoch 9, batch 500, loss[loss=0.2742, ctc_loss=0.1808, cr_loss=0.4054, attn_decoder_loss=0.2756, over 29432.00 frames. ], tot_loss[loss=0.271, ctc_loss=0.1814, cr_loss=0.4138, attn_decoder_loss=0.2717, over 5329581.16 frames. 
], batch size: 94, lr: 1.27e-02, grad_scale: 8.0 2024-09-17 04:30:26,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=146800.0, ans=0.125 2024-09-17 04:30:32,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=146800.0, ans=0.2 2024-09-17 04:30:37,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=146800.0, ans=0.125 2024-09-17 04:30:47,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=146840.0, ans=0.125 2024-09-17 04:30:49,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=146840.0, ans=0.1 2024-09-17 04:31:26,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=146960.0, ans=0.125 2024-09-17 04:31:31,219 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.74 vs. limit=15.0 2024-09-17 04:31:38,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=146960.0, ans=0.125 2024-09-17 04:31:42,467 INFO [train.py:1198] (1/2) Epoch 9, batch 550, loss[loss=0.2813, ctc_loss=0.193, cr_loss=0.4282, attn_decoder_loss=0.2816, over 28752.00 frames. ], tot_loss[loss=0.2711, ctc_loss=0.1818, cr_loss=0.4141, attn_decoder_loss=0.2718, over 5422550.17 frames. ], batch size: 104, lr: 1.27e-02, grad_scale: 8.0 2024-09-17 04:32:19,205 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.353e+01 9.413e+01 1.021e+02 1.124e+02 5.702e+02, threshold=2.041e+02, percent-clipped=1.0 2024-09-17 04:32:31,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=147120.0, ans=0.0 2024-09-17 04:32:37,081 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.67 vs. limit=15.0 2024-09-17 04:32:59,457 INFO [train.py:1198] (1/2) Epoch 9, batch 600, loss[loss=0.2822, ctc_loss=0.1858, cr_loss=0.4078, attn_decoder_loss=0.2839, over 29302.00 frames. ], tot_loss[loss=0.2714, ctc_loss=0.1819, cr_loss=0.4143, attn_decoder_loss=0.2722, over 5509270.10 frames. ], batch size: 100, lr: 1.27e-02, grad_scale: 8.0 2024-09-17 04:33:19,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=147240.0, ans=0.125 2024-09-17 04:33:19,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.max_abs, batch_count=147240.0, ans=10.0 2024-09-17 04:33:44,477 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.27 vs. limit=22.5 2024-09-17 04:33:51,914 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.71 vs. 
limit=15.0 2024-09-17 04:34:06,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=147360.0, ans=0.2 2024-09-17 04:34:09,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=147360.0, ans=0.1 2024-09-17 04:34:14,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=147360.0, ans=0.07 2024-09-17 04:34:20,310 INFO [train.py:1198] (1/2) Epoch 9, batch 650, loss[loss=0.2657, ctc_loss=0.166, cr_loss=0.3901, attn_decoder_loss=0.2681, over 29745.00 frames. ], tot_loss[loss=0.2706, ctc_loss=0.1806, cr_loss=0.4131, attn_decoder_loss=0.2714, over 5586025.90 frames. ], batch size: 81, lr: 1.27e-02, grad_scale: 4.0 2024-09-17 04:34:20,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=147400.0, ans=0.125 2024-09-17 04:34:32,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=147400.0, ans=0.1 2024-09-17 04:34:35,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=147440.0, ans=0.07 2024-09-17 04:34:40,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=147440.0, ans=0.125 2024-09-17 04:34:54,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=147480.0, ans=0.125 2024-09-17 04:34:58,647 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.437e+01 9.683e+01 1.026e+02 1.151e+02 1.521e+02, threshold=2.052e+02, percent-clipped=0.0 2024-09-17 04:35:16,060 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.42 vs. limit=15.0 2024-09-17 04:35:19,250 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.77 vs. limit=15.0 2024-09-17 04:35:29,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=147560.0, ans=0.0 2024-09-17 04:35:36,640 INFO [train.py:1198] (1/2) Epoch 9, batch 700, loss[loss=0.2691, ctc_loss=0.1824, cr_loss=0.4198, attn_decoder_loss=0.2695, over 29518.00 frames. ], tot_loss[loss=0.2714, ctc_loss=0.1816, cr_loss=0.415, attn_decoder_loss=0.2722, over 5636280.61 frames. 
], batch size: 76, lr: 1.27e-02, grad_scale: 8.0 2024-09-17 04:35:52,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=147640.0, ans=0.0 2024-09-17 04:35:56,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=147640.0, ans=0.1 2024-09-17 04:36:01,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=147640.0, ans=0.125 2024-09-17 04:36:19,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=147680.0, ans=0.0 2024-09-17 04:36:33,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=147720.0, ans=0.09899494936611666 2024-09-17 04:36:33,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=147720.0, ans=0.0 2024-09-17 04:36:52,804 INFO [train.py:1198] (1/2) Epoch 9, batch 750, loss[loss=0.2598, ctc_loss=0.1617, cr_loss=0.3821, attn_decoder_loss=0.2622, over 29716.00 frames. ], tot_loss[loss=0.2711, ctc_loss=0.1813, cr_loss=0.4146, attn_decoder_loss=0.2719, over 5674886.00 frames. ], batch size: 82, lr: 1.27e-02, grad_scale: 8.0 2024-09-17 04:36:56,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=147800.0, ans=0.025 2024-09-17 04:37:05,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=147800.0, ans=0.2 2024-09-17 04:37:23,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=147880.0, ans=0.125 2024-09-17 04:37:37,021 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.147e+01 9.690e+01 1.045e+02 1.120e+02 4.390e+02, threshold=2.090e+02, percent-clipped=1.0 2024-09-17 04:37:38,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=147880.0, ans=0.025 2024-09-17 04:37:38,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=147880.0, ans=0.125 2024-09-17 04:38:13,960 INFO [train.py:1198] (1/2) Epoch 9, batch 800, loss[loss=0.2539, ctc_loss=0.1631, cr_loss=0.3909, attn_decoder_loss=0.2553, over 29606.00 frames. ], tot_loss[loss=0.2711, ctc_loss=0.1813, cr_loss=0.4146, attn_decoder_loss=0.2719, over 5705865.30 frames. ], batch size: 73, lr: 1.27e-02, grad_scale: 8.0 2024-09-17 04:38:17,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=148000.0, ans=0.125 2024-09-17 04:38:35,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=148040.0, ans=0.125 2024-09-17 04:38:50,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=148080.0, ans=0.0 2024-09-17 04:39:18,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=148160.0, ans=0.025 2024-09-17 04:39:29,746 INFO [train.py:1198] (1/2) Epoch 9, batch 850, loss[loss=0.2838, ctc_loss=0.1899, cr_loss=0.4243, attn_decoder_loss=0.2848, over 29728.00 frames. 
], tot_loss[loss=0.2711, ctc_loss=0.1813, cr_loss=0.4147, attn_decoder_loss=0.2719, over 5735268.33 frames. ], batch size: 89, lr: 1.27e-02, grad_scale: 8.0 2024-09-17 04:39:45,484 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.39 vs. limit=15.0 2024-09-17 04:39:50,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=148240.0, ans=0.125 2024-09-17 04:40:03,837 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.79 vs. limit=6.0 2024-09-17 04:40:04,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=148280.0, ans=0.125 2024-09-17 04:40:09,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=148280.0, ans=0.05 2024-09-17 04:40:10,332 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.251e+01 9.624e+01 1.050e+02 1.134e+02 2.702e+02, threshold=2.101e+02, percent-clipped=1.0 2024-09-17 04:40:23,466 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.12 vs. limit=15.0 2024-09-17 04:40:30,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=148360.0, ans=0.025 2024-09-17 04:40:33,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=148360.0, ans=0.07 2024-09-17 04:40:36,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=148360.0, ans=0.1 2024-09-17 04:40:39,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=148360.0, ans=0.025 2024-09-17 04:40:45,748 INFO [train.py:1198] (1/2) Epoch 9, batch 900, loss[loss=0.2433, ctc_loss=0.1571, cr_loss=0.3765, attn_decoder_loss=0.2445, over 29589.00 frames. ], tot_loss[loss=0.2712, ctc_loss=0.1814, cr_loss=0.4151, attn_decoder_loss=0.272, over 5740660.39 frames. ], batch size: 73, lr: 1.27e-02, grad_scale: 8.0 2024-09-17 04:41:04,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=148440.0, ans=0.025 2024-09-17 04:41:04,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=148440.0, ans=0.125 2024-09-17 04:41:26,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=148480.0, ans=0.0 2024-09-17 04:41:47,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=148520.0, ans=0.05 2024-09-17 04:41:48,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=148520.0, ans=0.0 2024-09-17 04:42:06,875 INFO [train.py:1198] (1/2) Epoch 9, batch 950, loss[loss=0.2546, ctc_loss=0.1641, cr_loss=0.3917, attn_decoder_loss=0.256, over 29493.00 frames. ], tot_loss[loss=0.2713, ctc_loss=0.181, cr_loss=0.4142, attn_decoder_loss=0.2722, over 5742983.50 frames. 
], batch size: 74, lr: 1.27e-02, grad_scale: 4.0 2024-09-17 04:42:30,708 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.23 vs. limit=15.0 2024-09-17 04:42:34,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=148640.0, ans=0.0 2024-09-17 04:42:43,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=148680.0, ans=0.125 2024-09-17 04:42:49,651 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.631e+01 1.018e+02 1.126e+02 1.313e+02 4.383e+02, threshold=2.253e+02, percent-clipped=5.0 2024-09-17 04:42:50,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=148680.0, ans=0.125 2024-09-17 04:43:23,703 INFO [train.py:1198] (1/2) Epoch 9, batch 1000, loss[loss=0.26, ctc_loss=0.1727, cr_loss=0.4159, attn_decoder_loss=0.2604, over 29515.00 frames. ], tot_loss[loss=0.272, ctc_loss=0.182, cr_loss=0.4148, attn_decoder_loss=0.2728, over 5736410.92 frames. ], batch size: 77, lr: 1.27e-02, grad_scale: 8.0 2024-09-17 04:43:25,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=148800.0, ans=0.125 2024-09-17 04:43:40,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=148840.0, ans=0.125 2024-09-17 04:44:39,600 INFO [train.py:1198] (1/2) Epoch 9, batch 1050, loss[loss=0.2772, ctc_loss=0.1869, cr_loss=0.4333, attn_decoder_loss=0.2776, over 29671.00 frames. ], tot_loss[loss=0.2712, ctc_loss=0.1811, cr_loss=0.4128, attn_decoder_loss=0.272, over 5743933.85 frames. ], batch size: 85, lr: 1.27e-02, grad_scale: 4.0 2024-09-17 04:44:49,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=149000.0, ans=0.0 2024-09-17 04:44:50,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=149000.0, ans=0.1 2024-09-17 04:45:12,859 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.90 vs. limit=12.0 2024-09-17 04:45:26,427 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.757e+01 9.706e+01 1.051e+02 1.142e+02 2.250e+02, threshold=2.101e+02, percent-clipped=0.0 2024-09-17 04:46:00,304 INFO [train.py:1198] (1/2) Epoch 9, batch 1100, loss[loss=0.2677, ctc_loss=0.1795, cr_loss=0.4101, attn_decoder_loss=0.2684, over 29469.00 frames. ], tot_loss[loss=0.2707, ctc_loss=0.1807, cr_loss=0.4127, attn_decoder_loss=0.2715, over 5756952.76 frames. 
], batch size: 78, lr: 1.26e-02, grad_scale: 8.0 2024-09-17 04:46:06,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=149200.0, ans=0.125 2024-09-17 04:46:59,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=149360.0, ans=0.025 2024-09-17 04:47:07,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=149360.0, ans=0.125 2024-09-17 04:47:11,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=149360.0, ans=0.125 2024-09-17 04:47:13,938 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.87 vs. limit=15.0 2024-09-17 04:47:16,064 INFO [train.py:1198] (1/2) Epoch 9, batch 1150, loss[loss=0.2614, ctc_loss=0.1759, cr_loss=0.4024, attn_decoder_loss=0.262, over 29408.00 frames. ], tot_loss[loss=0.2708, ctc_loss=0.1809, cr_loss=0.4132, attn_decoder_loss=0.2716, over 5754260.11 frames. ], batch size: 78, lr: 1.26e-02, grad_scale: 4.0 2024-09-17 04:47:16,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=149400.0, ans=0.1 2024-09-17 04:47:16,994 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.69 vs. limit=15.0 2024-09-17 04:47:21,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=149400.0, ans=0.0 2024-09-17 04:48:01,928 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.308e+01 9.807e+01 1.085e+02 1.342e+02 2.441e+02, threshold=2.171e+02, percent-clipped=4.0 2024-09-17 04:48:05,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=149520.0, ans=0.125 2024-09-17 04:48:19,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=149560.0, ans=0.125 2024-09-17 04:48:33,335 INFO [train.py:1198] (1/2) Epoch 9, batch 1200, loss[loss=0.2772, ctc_loss=0.1855, cr_loss=0.4142, attn_decoder_loss=0.2782, over 29673.00 frames. ], tot_loss[loss=0.2717, ctc_loss=0.1817, cr_loss=0.4139, attn_decoder_loss=0.2725, over 5745955.91 frames. 
], batch size: 85, lr: 1.26e-02, grad_scale: 8.0 2024-09-17 04:48:35,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=149600.0, ans=0.025 2024-09-17 04:48:54,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=149640.0, ans=10.0 2024-09-17 04:48:57,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=149640.0, ans=0.125 2024-09-17 04:48:57,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=149640.0, ans=0.0 2024-09-17 04:49:27,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=149720.0, ans=0.125 2024-09-17 04:49:30,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=149720.0, ans=0.125 2024-09-17 04:49:50,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=149760.0, ans=0.0 2024-09-17 04:49:52,905 INFO [train.py:1198] (1/2) Epoch 9, batch 1250, loss[loss=0.2866, ctc_loss=0.1962, cr_loss=0.4539, attn_decoder_loss=0.2866, over 29538.00 frames. ], tot_loss[loss=0.2723, ctc_loss=0.1817, cr_loss=0.4152, attn_decoder_loss=0.2731, over 5774492.94 frames. ], batch size: 92, lr: 1.26e-02, grad_scale: 8.0 2024-09-17 04:49:56,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=149800.0, ans=0.125 2024-09-17 04:50:04,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=149800.0, ans=15.0 2024-09-17 04:50:10,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=149840.0, ans=0.125 2024-09-17 04:50:11,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=149840.0, ans=0.0 2024-09-17 04:50:22,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=149880.0, ans=0.1 2024-09-17 04:50:22,669 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.95 vs. 
limit=22.5 2024-09-17 04:50:35,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=149880.0, ans=0.125 2024-09-17 04:50:38,401 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.042e+01 9.591e+01 1.046e+02 1.160e+02 1.832e+02, threshold=2.092e+02, percent-clipped=0.0 2024-09-17 04:50:41,897 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:50:49,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=149920.0, ans=0.0 2024-09-17 04:51:00,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=149960.0, ans=0.125 2024-09-17 04:51:08,746 INFO [train.py:1198] (1/2) Epoch 9, batch 1300, loss[loss=0.2793, ctc_loss=0.1873, cr_loss=0.4153, attn_decoder_loss=0.2803, over 28076.00 frames. ], tot_loss[loss=0.2715, ctc_loss=0.1811, cr_loss=0.4141, attn_decoder_loss=0.2723, over 5779760.84 frames. ], batch size: 111, lr: 1.26e-02, grad_scale: 8.0 2024-09-17 04:51:24,588 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.83 vs. limit=6.0 2024-09-17 04:51:33,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=150040.0, ans=0.0 2024-09-17 04:51:45,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=150080.0, ans=0.125 2024-09-17 04:51:51,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=150080.0, ans=0.125 2024-09-17 04:52:05,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=150120.0, ans=0.125 2024-09-17 04:52:15,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=150160.0, ans=0.125 2024-09-17 04:52:21,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=150160.0, ans=0.05 2024-09-17 04:52:24,370 INFO [train.py:1198] (1/2) Epoch 9, batch 1350, loss[loss=0.2673, ctc_loss=0.169, cr_loss=0.4103, attn_decoder_loss=0.2691, over 29755.00 frames. ], tot_loss[loss=0.271, ctc_loss=0.1804, cr_loss=0.4138, attn_decoder_loss=0.2718, over 5798251.29 frames. ], batch size: 81, lr: 1.26e-02, grad_scale: 8.0 2024-09-17 04:52:30,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=150200.0, ans=0.0 2024-09-17 04:52:36,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=150200.0, ans=0.125 2024-09-17 04:53:01,275 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.89 vs. 
2024-09-17 04:53:12,497 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.46 vs. limit=15.0
2024-09-17 04:53:19,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=150320.0, ans=0.0
2024-09-17 04:53:19,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=150320.0, ans=0.125
2024-09-17 04:53:22,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=150320.0, ans=0.1
2024-09-17 04:53:44,221 INFO [train.py:1198] (1/2) Epoch 9, batch 1400, loss[loss=0.2333, ctc_loss=0.1533, cr_loss=0.3759, attn_decoder_loss=0.2338, over 29574.00 frames. ], tot_loss[loss=0.2704, ctc_loss=0.1798, cr_loss=0.4131, attn_decoder_loss=0.2713, over 5809140.70 frames. ], batch size: 69, lr: 1.26e-02, grad_scale: 8.0
2024-09-17 04:54:12,932 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-17 04:54:18,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=150480.0, ans=0.0
2024-09-17 04:54:46,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=150560.0, ans=0.0
2024-09-17 04:54:59,194 INFO [train.py:1198] (1/2) Epoch 9, batch 1450, loss[loss=0.2836, ctc_loss=0.181, cr_loss=0.4167, attn_decoder_loss=0.2857, over 29392.00 frames. ], tot_loss[loss=0.271, ctc_loss=0.1801, cr_loss=0.4139, attn_decoder_loss=0.2719, over 5805107.72 frames. ], batch size: 94, lr: 1.26e-02, grad_scale: 4.0
2024-09-17 04:54:59,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=150600.0, ans=0.125
2024-09-17 04:55:13,484 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.99 vs. limit=15.0
2024-09-17 04:55:16,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=150640.0, ans=0.125
2024-09-17 04:55:32,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=150680.0, ans=0.125
2024-09-17 04:55:45,590 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.347e+01 1.006e+02 1.117e+02 1.243e+02 2.760e+02, threshold=2.234e+02, percent-clipped=2.0
2024-09-17 04:56:06,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.28 vs. limit=15.0
2024-09-17 04:56:10,428 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0
2024-09-17 04:56:14,388 INFO [train.py:1198] (1/2) Epoch 9, batch 1500, loss[loss=0.2809, ctc_loss=0.1874, cr_loss=0.439, attn_decoder_loss=0.2816, over 29624.00 frames. ], tot_loss[loss=0.2715, ctc_loss=0.1805, cr_loss=0.4143, attn_decoder_loss=0.2724, over 5805090.56 frames. ], batch size: 86, lr: 1.26e-02, grad_scale: 8.0
2024-09-17 04:56:20,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=150800.0, ans=0.125
2024-09-17 04:56:28,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=150840.0, ans=0.025
2024-09-17 04:56:42,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=150840.0, ans=0.125
2024-09-17 04:56:43,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.55 vs. limit=10.0
2024-09-17 04:56:47,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=150880.0, ans=0.125
2024-09-17 04:57:03,057 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.57 vs. limit=10.0
2024-09-17 04:57:06,050 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.35 vs. limit=15.0
2024-09-17 04:57:09,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=150920.0, ans=0.125
2024-09-17 04:57:27,787 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.58 vs. limit=15.0
2024-09-17 04:57:34,242 INFO [train.py:1198] (1/2) Epoch 9, batch 1550, loss[loss=0.2838, ctc_loss=0.1886, cr_loss=0.4046, attn_decoder_loss=0.2854, over 29502.00 frames. ], tot_loss[loss=0.2714, ctc_loss=0.1807, cr_loss=0.4142, attn_decoder_loss=0.2723, over 5781370.29 frames. ], batch size: 90, lr: 1.26e-02, grad_scale: 4.0
2024-09-17 04:57:45,825 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.80 vs. limit=22.5
2024-09-17 04:57:58,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=151040.0, ans=0.125
2024-09-17 04:58:16,967 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.47 vs. limit=15.0
2024-09-17 04:58:19,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=151120.0, ans=0.0
2024-09-17 04:58:22,181 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.049e+01 9.832e+01 1.106e+02 1.253e+02 2.763e+02, threshold=2.212e+02, percent-clipped=1.0
2024-09-17 04:58:40,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=151160.0, ans=0.125
2024-09-17 04:58:47,178 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.54 vs. limit=22.5
2024-09-17 04:58:49,790 INFO [train.py:1198] (1/2) Epoch 9, batch 1600, loss[loss=0.2718, ctc_loss=0.1773, cr_loss=0.4217, attn_decoder_loss=0.2729, over 29676.00 frames. ], tot_loss[loss=0.2717, ctc_loss=0.181, cr_loss=0.4147, attn_decoder_loss=0.2725, over 5764414.90 frames. ], batch size: 85, lr: 1.26e-02, grad_scale: 8.0
2024-09-17 04:58:50,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=151200.0, ans=0.125
2024-09-17 04:58:57,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=151200.0, ans=0.0
2024-09-17 04:59:02,662 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.62 vs. limit=15.0
2024-09-17 04:59:08,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=151240.0, ans=0.125
2024-09-17 04:59:18,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=151280.0, ans=0.1
2024-09-17 04:59:40,253 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.20 vs. limit=15.0
2024-09-17 04:59:53,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=151360.0, ans=0.125
2024-09-17 05:00:00,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.27 vs. limit=22.5
2024-09-17 05:00:00,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=151360.0, ans=0.0
2024-09-17 05:00:05,104 INFO [train.py:1198] (1/2) Epoch 9, batch 1650, loss[loss=0.2741, ctc_loss=0.1829, cr_loss=0.4031, attn_decoder_loss=0.2753, over 29700.00 frames. ], tot_loss[loss=0.2713, ctc_loss=0.1807, cr_loss=0.414, attn_decoder_loss=0.2722, over 5759747.46 frames. ], batch size: 89, lr: 1.26e-02, grad_scale: 4.0
2024-09-17 05:00:05,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=151400.0, ans=0.0
2024-09-17 05:00:12,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=151400.0, ans=0.0
2024-09-17 05:00:15,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=151400.0, ans=10.0
2024-09-17 05:00:57,020 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.805e+01 9.589e+01 1.020e+02 1.089e+02 1.544e+02, threshold=2.040e+02, percent-clipped=0.0
2024-09-17 05:01:15,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=151560.0, ans=0.125
2024-09-17 05:01:16,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=151560.0, ans=0.125
2024-09-17 05:01:16,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=151560.0, ans=0.0
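
The ScheduledFloat entries from scaling.py record hyperparameters (dropout rates, skip rates, balancer probabilities) whose value is a function of batch_count rather than a constant; "ans" is the value in effect at that point in training. A piecewise-linear schedule keyed on batch count, which is one simple way to reproduce this behaviour, is sketched below; the function name and the example schedule are assumptions for illustration, not the actual scaling.py code.

def scheduled_float(batch_count, points):
    """Piecewise-linear schedule: `points` is a list of (batch_count, value)
    pairs sorted by batch_count; values are clamped at both ends."""
    x0, y0 = points[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in points[1:]:
        if batch_count <= x1:
            # Linear interpolation between the two surrounding breakpoints.
            frac = (batch_count - x0) / (x1 - x0)
            return y0 + frac * (y1 - y0)
        x0, y0 = x1, y1
    return y0  # past the last breakpoint

# Hypothetical example: a skip rate that decays from 0.1 to 0.0 over the
# first 20000 batches, then stays at 0.0:
# scheduled_float(151200.0, [(0.0, 0.1), (20000.0, 0.0)]) -> 0.0
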
2024-09-17 05:01:23,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.61 vs. limit=10.0
2024-09-17 05:01:24,480 INFO [train.py:1198] (1/2) Epoch 9, batch 1700, loss[loss=0.2429, ctc_loss=0.1514, cr_loss=0.3807, attn_decoder_loss=0.2446, over 29550.00 frames. ], tot_loss[loss=0.271, ctc_loss=0.1802, cr_loss=0.4142, attn_decoder_loss=0.2719, over 5780599.69 frames. ], batch size: 69, lr: 1.26e-02, grad_scale: 8.0
2024-09-17 05:01:25,426 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.32 vs. limit=15.0
2024-09-17 05:01:53,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=151680.0, ans=10.0
2024-09-17 05:01:54,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=151680.0, ans=0.0
2024-09-17 05:02:06,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=151680.0, ans=0.5
2024-09-17 05:02:08,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=151720.0, ans=0.2
2024-09-17 05:02:36,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=151760.0, ans=0.1
2024-09-17 05:02:39,563 INFO [train.py:1198] (1/2) Epoch 9, batch 1750, loss[loss=0.2325, ctc_loss=0.1542, cr_loss=0.3765, attn_decoder_loss=0.2329, over 29360.00 frames. ], tot_loss[loss=0.2701, ctc_loss=0.1793, cr_loss=0.4126, attn_decoder_loss=0.271, over 5788504.33 frames. ], batch size: 67, lr: 1.25e-02, grad_scale: 8.0
2024-09-17 05:02:44,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=151800.0, ans=0.125
2024-09-17 05:03:11,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=151880.0, ans=0.125
2024-09-17 05:03:27,151 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.90 vs. limit=15.0
2024-09-17 05:03:30,571 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.963e+01 9.433e+01 1.015e+02 1.120e+02 2.449e+02, threshold=2.030e+02, percent-clipped=1.0
2024-09-17 05:03:33,172 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=15.0
2024-09-17 05:03:35,650 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.85 vs. limit=12.0
2024-09-17 05:03:41,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=151960.0, ans=0.05
2024-09-17 05:03:54,781 INFO [train.py:1198] (1/2) Epoch 9, batch 1800, loss[loss=0.2845, ctc_loss=0.1942, cr_loss=0.423, attn_decoder_loss=0.2851, over 29707.00 frames. ], tot_loss[loss=0.2706, ctc_loss=0.18, cr_loss=0.4131, attn_decoder_loss=0.2715, over 5790897.89 frames. ], batch size: 83, lr: 1.25e-02, grad_scale: 8.0
2024-09-17 05:03:59,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=152000.0, ans=0.125
2024-09-17 05:04:10,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=152040.0, ans=0.0
2024-09-17 05:04:23,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=152080.0, ans=0.0
2024-09-17 05:04:27,326 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.89 vs. limit=15.0
2024-09-17 05:04:52,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=152120.0, ans=0.1
2024-09-17 05:04:54,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=152120.0, ans=0.0
2024-09-17 05:04:58,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=152160.0, ans=0.1
2024-09-17 05:05:03,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=152160.0, ans=0.125
2024-09-17 05:05:12,096 INFO [train.py:1198] (1/2) Epoch 9, batch 1850, loss[loss=0.2863, ctc_loss=0.1997, cr_loss=0.4287, attn_decoder_loss=0.2864, over 29619.00 frames. ], tot_loss[loss=0.2705, ctc_loss=0.1798, cr_loss=0.4131, attn_decoder_loss=0.2714, over 5796565.53 frames. ], batch size: 86, lr: 1.25e-02, grad_scale: 4.0
2024-09-17 05:05:30,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=152240.0, ans=0.125
2024-09-17 05:05:34,188 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.43 vs. limit=6.0
2024-09-17 05:05:45,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=152280.0, ans=0.1
2024-09-17 05:05:46,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=152280.0, ans=0.0
2024-09-17 05:05:55,641 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.52 vs. limit=22.5
2024-09-17 05:06:06,697 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.561e+01 1.000e+02 1.112e+02 1.269e+02 1.875e+02, threshold=2.225e+02, percent-clipped=0.0
2024-09-17 05:06:29,014 INFO [train.py:1198] (1/2) Epoch 9, batch 1900, loss[loss=0.286, ctc_loss=0.1871, cr_loss=0.4318, attn_decoder_loss=0.2874, over 29722.00 frames. ], tot_loss[loss=0.2713, ctc_loss=0.1805, cr_loss=0.4145, attn_decoder_loss=0.2722, over 5804574.49 frames. ], batch size: 89, lr: 1.25e-02, grad_scale: 8.0
2024-09-17 05:06:29,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=152400.0, ans=0.125
2024-09-17 05:06:40,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.30 vs. limit=22.5
2024-09-17 05:06:44,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=152440.0, ans=0.125
2024-09-17 05:07:01,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=152480.0, ans=0.125
2024-09-17 05:07:04,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=152480.0, ans=0.2
2024-09-17 05:07:34,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=152560.0, ans=0.125
2024-09-17 05:07:38,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=152560.0, ans=0.0
2024-09-17 05:07:44,262 INFO [train.py:1198] (1/2) Epoch 9, batch 1950, loss[loss=0.2618, ctc_loss=0.1774, cr_loss=0.4065, attn_decoder_loss=0.2622, over 29482.00 frames. ], tot_loss[loss=0.2724, ctc_loss=0.1813, cr_loss=0.4154, attn_decoder_loss=0.2733, over 5818749.34 frames. ], batch size: 78, lr: 1.25e-02, grad_scale: 8.0
2024-09-17 05:07:46,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=152600.0, ans=0.125
2024-09-17 05:08:10,869 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.76 vs. limit=15.0
2024-09-17 05:08:38,292 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.49 vs. limit=15.0
2024-09-17 05:08:40,099 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.369e+01 9.742e+01 1.027e+02 1.111e+02 1.388e+02, threshold=2.054e+02, percent-clipped=0.0
2024-09-17 05:08:49,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=152760.0, ans=0.125
2024-09-17 05:08:57,947 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.48 vs. limit=15.0
2024-09-17 05:09:01,813 INFO [train.py:1198] (1/2) Epoch 9, batch 2000, loss[loss=0.2393, ctc_loss=0.1553, cr_loss=0.4053, attn_decoder_loss=0.2396, over 29345.00 frames. ], tot_loss[loss=0.2729, ctc_loss=0.1819, cr_loss=0.4167, attn_decoder_loss=0.2738, over 5795500.62 frames. ], batch size: 67, lr: 1.25e-02, grad_scale: 8.0
2024-09-17 05:09:23,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=152840.0, ans=0.125
2024-09-17 05:09:31,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=152840.0, ans=0.125
2024-09-17 05:09:31,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=152840.0, ans=0.2
2024-09-17 05:09:43,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=152880.0, ans=0.09899494936611666
2024-09-17 05:10:04,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=152960.0, ans=0.0
2024-09-17 05:10:11,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=152960.0, ans=0.2
2024-09-17 05:10:19,048 INFO [train.py:1198] (1/2) Epoch 9, batch 2050, loss[loss=0.2381, ctc_loss=0.1538, cr_loss=0.3855, attn_decoder_loss=0.2389, over 29441.00 frames. ], tot_loss[loss=0.2717, ctc_loss=0.1808, cr_loss=0.4146, attn_decoder_loss=0.2726, over 5787993.73 frames. ], batch size: 70, lr: 1.25e-02, grad_scale: 4.0
2024-09-17 05:10:20,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=153000.0, ans=0.125
2024-09-17 05:10:29,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=153000.0, ans=0.1
2024-09-17 05:10:51,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=153080.0, ans=0.125
2024-09-17 05:10:51,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=153080.0, ans=0.5
2024-09-17 05:11:14,184 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.89 vs. limit=15.0
2024-09-17 05:11:15,050 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.385e+01 9.413e+01 1.004e+02 1.102e+02 4.512e+02, threshold=2.009e+02, percent-clipped=3.0
2024-09-17 05:11:18,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=153160.0, ans=0.125
2024-09-17 05:11:33,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=153200.0, ans=0.125
2024-09-17 05:11:34,699 INFO [train.py:1198] (1/2) Epoch 9, batch 2100, loss[loss=0.2813, ctc_loss=0.1921, cr_loss=0.4389, attn_decoder_loss=0.2814, over 29747.00 frames. ], tot_loss[loss=0.271, ctc_loss=0.1801, cr_loss=0.4133, attn_decoder_loss=0.2719, over 5799087.36 frames. ], batch size: 81, lr: 1.25e-02, grad_scale: 8.0
2024-09-17 05:11:46,142 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.56 vs. limit=10.0
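
The Whitening entries compare a per-module statistic against a limit (e.g. "metric=7.56 vs. limit=10.0"); the metric measures how far a module's output covariance is from being white (isotropic), and a corrective term is applied only when it exceeds the limit. One generic way to compute such a statistic is the ratio of the largest to the mean covariance eigenvalue, sketched below; this particular formula is an assumption for illustration and is not necessarily the metric computed in scaling.py.

import torch

def whitening_metric(feats: torch.Tensor, num_groups: int = 1) -> float:
    """feats: (num_frames, num_channels). Returns the mean, over channel
    groups, of max-eigenvalue / mean-eigenvalue of the feature covariance;
    1.0 means perfectly white, larger means less white."""
    n = feats.shape[0]
    metrics = []
    for g in feats.chunk(num_groups, dim=1):
        g = g - g.mean(dim=0, keepdim=True)   # center each channel
        cov = (g.T @ g) / n                   # covariance estimate
        eigs = torch.linalg.eigvalsh(cov)     # real eigenvalues, ascending
        metrics.append((eigs[-1] / eigs.mean().clamp(min=1e-20)).item())
    return sum(metrics) / len(metrics)
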
2024-09-17 05:11:55,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=153240.0, ans=0.125
2024-09-17 05:12:02,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=153240.0, ans=0.0
2024-09-17 05:12:09,508 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 05:12:39,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=153360.0, ans=0.125
2024-09-17 05:12:39,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=153360.0, ans=0.0
2024-09-17 05:12:39,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=153360.0, ans=0.125
2024-09-17 05:12:48,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=153360.0, ans=0.2
2024-09-17 05:12:51,521 INFO [train.py:1198] (1/2) Epoch 9, batch 2150, loss[loss=0.2638, ctc_loss=0.1751, cr_loss=0.4112, attn_decoder_loss=0.2645, over 29443.00 frames. ], tot_loss[loss=0.2703, ctc_loss=0.1793, cr_loss=0.4124, attn_decoder_loss=0.2712, over 5813603.10 frames. ], batch size: 78, lr: 1.25e-02, grad_scale: 8.0
2024-09-17 05:13:17,445 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.18 vs. limit=15.0
2024-09-17 05:13:21,719 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.25 vs. limit=15.0
2024-09-17 05:13:38,197 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.95 vs. limit=22.5
2024-09-17 05:13:51,014 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.919e+01 9.836e+01 1.055e+02 1.144e+02 2.218e+02, threshold=2.111e+02, percent-clipped=2.0
2024-09-17 05:13:54,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=153560.0, ans=0.125
2024-09-17 05:14:09,669 INFO [train.py:1198] (1/2) Epoch 9, batch 2200, loss[loss=0.2889, ctc_loss=0.1884, cr_loss=0.425, attn_decoder_loss=0.2906, over 29606.00 frames. ], tot_loss[loss=0.2705, ctc_loss=0.1794, cr_loss=0.4125, attn_decoder_loss=0.2715, over 5810278.91 frames. ], batch size: 86, lr: 1.25e-02, grad_scale: 8.0
2024-09-17 05:14:20,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=153600.0, ans=0.125
2024-09-17 05:14:29,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=153640.0, ans=0.125
2024-09-17 05:14:30,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=153640.0, ans=0.1
2024-09-17 05:14:30,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=153640.0, ans=0.125
2024-09-17 05:14:41,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=153680.0, ans=0.125
2024-09-17 05:14:43,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=153680.0, ans=0.0
2024-09-17 05:15:02,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=153720.0, ans=0.0
2024-09-17 05:15:25,270 INFO [train.py:1198] (1/2) Epoch 9, batch 2250, loss[loss=0.2639, ctc_loss=0.1696, cr_loss=0.3806, attn_decoder_loss=0.2659, over 29700.00 frames. ], tot_loss[loss=0.2703, ctc_loss=0.1789, cr_loss=0.4123, attn_decoder_loss=0.2713, over 5809331.57 frames. ], batch size: 82, lr: 1.25e-02, grad_scale: 8.0
2024-09-17 05:15:30,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=153800.0, ans=0.125
2024-09-17 05:15:31,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=153800.0, ans=0.125
2024-09-17 05:15:40,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=153840.0, ans=0.125
2024-09-17 05:15:54,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=153880.0, ans=12.0
2024-09-17 05:16:24,410 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.430e+01 9.555e+01 1.015e+02 1.096e+02 3.730e+02, threshold=2.031e+02, percent-clipped=3.0
2024-09-17 05:16:26,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=153960.0, ans=0.1
2024-09-17 05:16:30,014 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.05 vs. limit=15.0
2024-09-17 05:16:39,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=153960.0, ans=0.125
2024-09-17 05:16:42,434 INFO [train.py:1198] (1/2) Epoch 9, batch 2300, loss[loss=0.2432, ctc_loss=0.1636, cr_loss=0.4104, attn_decoder_loss=0.2429, over 29339.00 frames. ], tot_loss[loss=0.2694, ctc_loss=0.1783, cr_loss=0.4115, attn_decoder_loss=0.2704, over 5797439.34 frames. ], batch size: 71, lr: 1.25e-02, grad_scale: 8.0
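
Each train.py summary above reports a per-batch loss "over N frames" and a running tot_loss "over M frames": losses are averaged with the number of acoustic frames as the weight, so long utterances count for more than short ones. A small frame-weighted accumulator of the kind that could produce those numbers is sketched below; the class and method names are assumptions, not the actual metrics tracker used by icefall.

class FrameWeightedLoss:
    """Accumulates frame-weighted sums so averages reflect utterance length."""
    def __init__(self):
        self.sums = {}
        self.frames = 0.0

    def update(self, frames, **losses):
        self.frames += frames
        for name, value in losses.items():
            # Store value * frames so the final division gives a weighted mean.
            self.sums[name] = self.sums.get(name, 0.0) + value * frames

    def averages(self):
        return {name: s / self.frames for name, s in self.sums.items()}

tracker = FrameWeightedLoss()
tracker.update(28076.0, loss=0.2793, ctc_loss=0.1873, cr_loss=0.4153)
tracker.update(29755.0, loss=0.2673, ctc_loss=0.1690, cr_loss=0.4103)
# tracker.averages() -> frame-weighted means, analogous to tot_loss[...] above
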
2024-09-17 05:16:42,745 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 05:16:51,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=154000.0, ans=0.125
2024-09-17 05:17:12,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=154040.0, ans=0.025
2024-09-17 05:17:50,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=154160.0, ans=0.025
2024-09-17 05:18:02,036 INFO [train.py:1198] (1/2) Epoch 9, batch 2350, loss[loss=0.2782, ctc_loss=0.1811, cr_loss=0.4441, attn_decoder_loss=0.2791, over 29691.00 frames. ], tot_loss[loss=0.2692, ctc_loss=0.1779, cr_loss=0.411, attn_decoder_loss=0.2702, over 5802564.28 frames. ], batch size: 83, lr: 1.24e-02, grad_scale: 8.0
2024-09-17 05:18:11,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=154200.0, ans=0.125
2024-09-17 05:18:12,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=154200.0, ans=0.125
2024-09-17 05:18:20,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=154240.0, ans=0.125
2024-09-17 05:18:26,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=154240.0, ans=0.025
2024-09-17 05:18:31,018 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 05:18:40,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=154280.0, ans=0.125
2024-09-17 05:18:55,399 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 05:18:59,585 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.337e+01 9.484e+01 1.020e+02 1.101e+02 1.845e+02, threshold=2.040e+02, percent-clipped=0.0
2024-09-17 05:19:17,272 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.17 vs. limit=15.0
2024-09-17 05:19:18,463 INFO [train.py:1198] (1/2) Epoch 9, batch 2400, loss[loss=0.259, ctc_loss=0.17, cr_loss=0.4139, attn_decoder_loss=0.2597, over 29535.00 frames. ], tot_loss[loss=0.2699, ctc_loss=0.1783, cr_loss=0.4119, attn_decoder_loss=0.271, over 5806691.18 frames. ], batch size: 76, lr: 1.24e-02, grad_scale: 16.0
2024-09-17 05:19:24,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=154400.0, ans=0.125
2024-09-17 05:19:52,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=154480.0, ans=0.5
2024-09-17 05:20:36,508 INFO [train.py:1198] (1/2) Epoch 9, batch 2450, loss[loss=0.2766, ctc_loss=0.1961, cr_loss=0.4404, attn_decoder_loss=0.2758, over 29699.00 frames. ], tot_loss[loss=0.271, ctc_loss=0.1795, cr_loss=0.4122, attn_decoder_loss=0.272, over 5785022.21 frames. ], batch size: 82, lr: 1.24e-02, grad_scale: 4.0
2024-09-17 05:20:50,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=154640.0, ans=0.125
2024-09-17 05:20:57,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=154640.0, ans=0.125
2024-09-17 05:20:57,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=154640.0, ans=0.2
2024-09-17 05:21:34,904 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=9.10 vs. limit=15.0
2024-09-17 05:21:38,516 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.373e+01 9.786e+01 1.038e+02 1.229e+02 2.658e+02, threshold=2.076e+02, percent-clipped=2.0
2024-09-17 05:21:41,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.25 vs. limit=12.0
2024-09-17 05:21:49,946 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.97 vs. limit=22.5
2024-09-17 05:21:50,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.50 vs. limit=15.0
2024-09-17 05:21:53,783 INFO [train.py:1198] (1/2) Epoch 9, batch 2500, loss[loss=0.2711, ctc_loss=0.1724, cr_loss=0.399, attn_decoder_loss=0.2732, over 29636.00 frames. ], tot_loss[loss=0.2706, ctc_loss=0.1793, cr_loss=0.4127, attn_decoder_loss=0.2716, over 5794998.65 frames. ], batch size: 86, lr: 1.24e-02, grad_scale: 8.0
2024-09-17 05:21:57,780 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.78 vs. limit=10.0
2024-09-17 05:22:00,990 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.37 vs. limit=12.0
2024-09-17 05:22:30,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=154880.0, ans=0.2
2024-09-17 05:22:47,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=154920.0, ans=0.2
2024-09-17 05:23:09,598 INFO [train.py:1198] (1/2) Epoch 9, batch 2550, loss[loss=0.2477, ctc_loss=0.1591, cr_loss=0.3961, attn_decoder_loss=0.2488, over 29340.00 frames. ], tot_loss[loss=0.2705, ctc_loss=0.1792, cr_loss=0.4123, attn_decoder_loss=0.2715, over 5798202.33 frames. ], batch size: 67, lr: 1.24e-02, grad_scale: 8.0
2024-09-17 05:23:12,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=155000.0, ans=0.125
2024-09-17 05:23:15,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=155000.0, ans=0.1
2024-09-17 05:23:40,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=155080.0, ans=0.0
2024-09-17 05:23:43,642 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.68 vs. limit=12.0
2024-09-17 05:24:09,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=155120.0, ans=0.125
2024-09-17 05:24:12,471 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.204e+01 9.986e+01 1.053e+02 1.251e+02 2.083e+02, threshold=2.107e+02, percent-clipped=1.0
2024-09-17 05:24:22,371 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.69 vs. limit=15.0
2024-09-17 05:24:28,037 INFO [train.py:1198] (1/2) Epoch 9, batch 2600, loss[loss=0.2656, ctc_loss=0.1742, cr_loss=0.4019, attn_decoder_loss=0.2668, over 29411.00 frames. ], tot_loss[loss=0.2708, ctc_loss=0.1792, cr_loss=0.4128, attn_decoder_loss=0.2718, over 5793612.82 frames. ], batch size: 78, lr: 1.24e-02, grad_scale: 8.0
2024-09-17 05:24:31,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=155200.0, ans=0.1
2024-09-17 05:24:42,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=155240.0, ans=0.1
2024-09-17 05:24:46,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=155240.0, ans=0.0
2024-09-17 05:24:53,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=155240.0, ans=0.125
2024-09-17 05:25:00,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=155280.0, ans=0.05
2024-09-17 05:25:19,404 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.72 vs. limit=15.0
2024-09-17 05:25:26,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=155320.0, ans=0.0
2024-09-17 05:25:32,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=155360.0, ans=0.125
2024-09-17 05:25:35,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=155360.0, ans=0.0
2024-09-17 05:25:45,384 INFO [train.py:1198] (1/2) Epoch 9, batch 2650, loss[loss=0.2923, ctc_loss=0.197, cr_loss=0.4406, attn_decoder_loss=0.2931, over 29225.00 frames. ], tot_loss[loss=0.2713, ctc_loss=0.1795, cr_loss=0.413, attn_decoder_loss=0.2724, over 5800223.01 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 4.0
2024-09-17 05:25:59,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=155440.0, ans=0.125
2024-09-17 05:26:11,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=155440.0, ans=0.0
2024-09-17 05:26:32,912 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=15.0
2024-09-17 05:26:37,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=155520.0, ans=0.09899494936611666
2024-09-17 05:26:47,529 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.070e+01 9.715e+01 1.022e+02 1.111e+02 3.079e+02, threshold=2.044e+02, percent-clipped=1.0
2024-09-17 05:26:49,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=155560.0, ans=0.0
2024-09-17 05:27:01,161 INFO [train.py:1198] (1/2) Epoch 9, batch 2700, loss[loss=0.2758, ctc_loss=0.1798, cr_loss=0.4073, attn_decoder_loss=0.2774, over 29538.00 frames. ], tot_loss[loss=0.2714, ctc_loss=0.1793, cr_loss=0.4131, attn_decoder_loss=0.2724, over 5796703.11 frames. ], batch size: 87, lr: 1.24e-02, grad_scale: 8.0
2024-09-17 05:27:13,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=155600.0, ans=0.125
2024-09-17 05:27:14,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=155640.0, ans=0.125
2024-09-17 05:27:32,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys.whitening_limit, batch_count=155680.0, ans=6.0
2024-09-17 05:27:49,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=155720.0, ans=0.125
2024-09-17 05:27:50,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=155720.0, ans=0.2
2024-09-17 05:28:14,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=155760.0, ans=0.2
2024-09-17 05:28:15,770 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.19 vs. limit=15.0
2024-09-17 05:28:19,242 INFO [train.py:1198] (1/2) Epoch 9, batch 2750, loss[loss=0.2538, ctc_loss=0.1698, cr_loss=0.404, attn_decoder_loss=0.2542, over 29503.00 frames. ], tot_loss[loss=0.2702, ctc_loss=0.1788, cr_loss=0.4123, attn_decoder_loss=0.2712, over 5795249.60 frames. ], batch size: 75, lr: 1.24e-02, grad_scale: 4.0
2024-09-17 05:28:43,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=155840.0, ans=0.0
2024-09-17 05:29:02,893 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.79 vs. limit=15.0
2024-09-17 05:29:03,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=155880.0, ans=0.09899494936611666
2024-09-17 05:29:05,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=155920.0, ans=0.07
2024-09-17 05:29:08,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=155920.0, ans=0.1
2024-09-17 05:29:25,256 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.872e+01 9.518e+01 1.047e+02 1.158e+02 3.298e+02, threshold=2.093e+02, percent-clipped=1.0
2024-09-17 05:29:38,021 INFO [train.py:1198] (1/2) Epoch 9, batch 2800, loss[loss=0.3132, ctc_loss=0.2523, cr_loss=0.4335, attn_decoder_loss=0.3103, over 20293.00 frames. ], tot_loss[loss=0.2703, ctc_loss=0.1792, cr_loss=0.4127, attn_decoder_loss=0.2713, over 5776309.31 frames. ], batch size: 209, lr: 1.24e-02, grad_scale: 8.0
2024-09-17 05:30:08,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=156080.0, ans=0.125
2024-09-17 05:30:35,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=156120.0, ans=0.125
2024-09-17 05:30:44,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=156160.0, ans=0.2
2024-09-17 05:30:48,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=156160.0, ans=0.0
2024-09-17 05:30:48,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=156160.0, ans=0.1
2024-09-17 05:30:52,939 INFO [train.py:1198] (1/2) Epoch 9, batch 2850, loss[loss=0.2561, ctc_loss=0.1695, cr_loss=0.3884, attn_decoder_loss=0.2571, over 29471.00 frames. ], tot_loss[loss=0.2712, ctc_loss=0.1807, cr_loss=0.4146, attn_decoder_loss=0.2721, over 5762599.73 frames. ], batch size: 77, lr: 1.24e-02, grad_scale: 4.0
2024-09-17 05:31:17,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=156240.0, ans=0.04949747468305833
2024-09-17 05:31:18,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=156240.0, ans=0.0
2024-09-17 05:31:25,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=156280.0, ans=0.0
2024-09-17 05:32:00,095 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.550e+01 9.773e+01 1.033e+02 1.202e+02 1.627e+02, threshold=2.066e+02, percent-clipped=0.0
2024-09-17 05:32:10,733 INFO [train.py:1198] (1/2) Epoch 9, batch 2900, loss[loss=0.2662, ctc_loss=0.17, cr_loss=0.4256, attn_decoder_loss=0.2675, over 29437.00 frames. ], tot_loss[loss=0.2721, ctc_loss=0.1809, cr_loss=0.4163, attn_decoder_loss=0.273, over 5787951.73 frames. ], batch size: 79, lr: 1.24e-02, grad_scale: 8.0
2024-09-17 05:32:13,305 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.72 vs. limit=15.0
2024-09-17 05:32:33,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.43 vs. limit=15.0
2024-09-17 05:32:51,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=156480.0, ans=0.125
2024-09-17 05:32:52,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=156480.0, ans=0.2
2024-09-17 05:33:18,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=156560.0, ans=0.2
2024-09-17 05:33:19,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=156560.0, ans=0.0
2024-09-17 05:33:28,637 INFO [train.py:1198] (1/2) Epoch 9, batch 2950, loss[loss=0.2556, ctc_loss=0.1654, cr_loss=0.3892, attn_decoder_loss=0.257, over 29511.00 frames. ], tot_loss[loss=0.2708, ctc_loss=0.1801, cr_loss=0.4137, attn_decoder_loss=0.2717, over 5782118.60 frames. ], batch size: 75, lr: 1.24e-02, grad_scale: 8.0
2024-09-17 05:33:53,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=156640.0, ans=0.2
2024-09-17 05:34:09,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=156680.0, ans=0.0
2024-09-17 05:34:28,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=156760.0, ans=0.125
2024-09-17 05:34:33,740 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.085e+01 9.535e+01 1.020e+02 1.127e+02 2.521e+02, threshold=2.039e+02, percent-clipped=1.0
2024-09-17 05:34:43,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=156800.0, ans=0.125
2024-09-17 05:34:44,917 INFO [train.py:1198] (1/2) Epoch 9, batch 3000, loss[loss=0.2614, ctc_loss=0.1679, cr_loss=0.4178, attn_decoder_loss=0.2625, over 29739.00 frames. ], tot_loss[loss=0.2706, ctc_loss=0.1797, cr_loss=0.4139, attn_decoder_loss=0.2715, over 5782508.88 frames. ], batch size: 81, lr: 1.23e-02, grad_scale: 8.0
2024-09-17 05:34:44,917 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-17 05:35:03,247 INFO [train.py:1230] (1/2) Epoch 9, validation: loss=0.2139, ctc_loss=0.05057, cr_loss=4.328e-15, attn_decoder_loss=0.232, over 944034.00 frames.
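
The validation entries above (train.py:1221/1230) show training pausing at batch 3000 to compute a loss over the whole validation set, again frame-weighted (944034 frames). A minimal validation loop of that shape is sketched below; the batch keys and the compute_loss method are hypothetical placeholders, not icefall's actual train.py API.

import torch

def compute_validation_loss(model, valid_loader, device):
    model.eval()                      # disable dropout etc. for evaluation
    total_loss, total_frames = 0.0, 0.0
    with torch.no_grad():             # no gradients needed for validation
        for batch in valid_loader:
            feats = batch["features"].to(device)    # hypothetical key
            targets = batch["targets"].to(device)   # hypothetical key
            # Hypothetical criterion returning (summed loss, num frames).
            loss, num_frames = model.compute_loss(feats, targets)
            total_loss += loss.item()
            total_frames += num_frames
    model.train()                     # restore training mode
    return total_loss / total_frames  # frame-weighted validation loss
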
2024-09-17 05:35:03,247 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB
2024-09-17 05:35:15,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=156800.0, ans=0.0
2024-09-17 05:35:36,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=156880.0, ans=0.125
2024-09-17 05:35:40,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=156880.0, ans=0.09899494936611666
2024-09-17 05:35:46,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=156880.0, ans=0.125
2024-09-17 05:36:08,288 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 05:36:12,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=156960.0, ans=0.125
2024-09-17 05:36:21,549 INFO [train.py:1198] (1/2) Epoch 9, batch 3050, loss[loss=0.2645, ctc_loss=0.1753, cr_loss=0.4479, attn_decoder_loss=0.2645, over 29522.00 frames. ], tot_loss[loss=0.2713, ctc_loss=0.1802, cr_loss=0.4148, attn_decoder_loss=0.2722, over 5776332.22 frames. ], batch size: 76, lr: 1.23e-02, grad_scale: 4.0
2024-09-17 05:36:49,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=157040.0, ans=0.125
2024-09-17 05:36:49,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=157040.0, ans=0.0
2024-09-17 05:36:51,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=157040.0, ans=0.125
2024-09-17 05:36:55,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=157080.0, ans=0.125
2024-09-17 05:37:00,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=157080.0, ans=0.0
2024-09-17 05:37:06,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=157080.0, ans=0.125
2024-09-17 05:37:10,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=157120.0, ans=0.125
2024-09-17 05:37:13,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=157120.0, ans=0.0
2024-09-17 05:37:28,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=157160.0, ans=0.125
2024-09-17 05:37:29,759 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.561e+01 1.002e+02 1.065e+02 1.234e+02 3.157e+02, threshold=2.130e+02, percent-clipped=3.0
2024-09-17 05:37:38,248 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.09 vs. limit=15.0
2024-09-17 05:37:38,808 INFO [train.py:1198] (1/2) Epoch 9, batch 3100, loss[loss=0.2927, ctc_loss=0.2036, cr_loss=0.4529, attn_decoder_loss=0.2925, over 29313.00 frames. ], tot_loss[loss=0.2708, ctc_loss=0.1799, cr_loss=0.4136, attn_decoder_loss=0.2717, over 5776307.37 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 8.0
2024-09-17 05:37:43,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=157200.0, ans=0.07
2024-09-17 05:37:57,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=157240.0, ans=0.0
2024-09-17 05:38:13,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=157280.0, ans=0.0
2024-09-17 05:38:21,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=157280.0, ans=0.0
2024-09-17 05:38:27,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=157320.0, ans=0.07
2024-09-17 05:38:42,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=157360.0, ans=0.125
2024-09-17 05:38:46,327 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.12 vs. limit=15.0
2024-09-17 05:38:54,755 INFO [train.py:1198] (1/2) Epoch 9, batch 3150, loss[loss=0.2782, ctc_loss=0.1759, cr_loss=0.4189, attn_decoder_loss=0.2803, over 28799.00 frames. ], tot_loss[loss=0.2706, ctc_loss=0.1793, cr_loss=0.4131, attn_decoder_loss=0.2715, over 5782820.29 frames. ], batch size: 104, lr: 1.23e-02, grad_scale: 8.0
2024-09-17 05:38:58,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=157400.0, ans=0.125
2024-09-17 05:39:16,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=157440.0, ans=0.125
2024-09-17 05:39:23,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=157440.0, ans=0.2
2024-09-17 05:39:29,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=157480.0, ans=0.125
2024-09-17 05:39:44,126 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 05:39:44,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=157520.0, ans=0.125
2024-09-17 05:39:53,695 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.61 vs. limit=15.0
2024-09-17 05:40:04,838 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.427e+01 1.013e+02 1.077e+02 1.205e+02 2.021e+02, threshold=2.154e+02, percent-clipped=0.0
2024-09-17 05:40:12,973 INFO [train.py:1198] (1/2) Epoch 9, batch 3200, loss[loss=0.2629, ctc_loss=0.1697, cr_loss=0.4179, attn_decoder_loss=0.264, over 29399.00 frames. ], tot_loss[loss=0.2703, ctc_loss=0.1791, cr_loss=0.4133, attn_decoder_loss=0.2712, over 5792998.23 frames. ], batch size: 79, lr: 1.23e-02, grad_scale: 8.0
2024-09-17 05:40:27,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=157640.0, ans=0.125
2024-09-17 05:40:58,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=157680.0, ans=0.0
2024-09-17 05:41:15,784 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0
2024-09-17 05:41:25,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=157760.0, ans=0.125
2024-09-17 05:41:25,953 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=15.0
2024-09-17 05:41:31,249 INFO [train.py:1198] (1/2) Epoch 9, batch 3250, loss[loss=0.2817, ctc_loss=0.1869, cr_loss=0.4156, attn_decoder_loss=0.283, over 29728.00 frames. ], tot_loss[loss=0.2701, ctc_loss=0.1788, cr_loss=0.4132, attn_decoder_loss=0.2711, over 5800529.63 frames. ], batch size: 84, lr: 1.23e-02, grad_scale: 8.0
2024-09-17 05:41:36,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=157800.0, ans=0.1
2024-09-17 05:42:12,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.15 vs. limit=6.0
2024-09-17 05:42:13,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=157880.0, ans=0.125
2024-09-17 05:42:13,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=157880.0, ans=0.2
2024-09-17 05:42:21,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=157920.0, ans=0.2
2024-09-17 05:42:32,807 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. limit=15.0
2024-09-17 05:42:39,124 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.206e+01 9.558e+01 1.067e+02 1.153e+02 2.320e+02, threshold=2.135e+02, percent-clipped=2.0
2024-09-17 05:42:45,765 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-17 05:42:46,851 INFO [train.py:1198] (1/2) Epoch 9, batch 3300, loss[loss=0.287, ctc_loss=0.1869, cr_loss=0.3941, attn_decoder_loss=0.2893, over 28306.00 frames. ], tot_loss[loss=0.269, ctc_loss=0.1781, cr_loss=0.4126, attn_decoder_loss=0.27, over 5796820.07 frames. ], batch size: 111, lr: 1.23e-02, grad_scale: 8.0
2024-09-17 05:43:01,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=158000.0, ans=0.05
2024-09-17 05:43:03,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=158040.0, ans=0.025
2024-09-17 05:43:06,483 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=12.0
2024-09-17 05:43:25,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=158080.0, ans=0.0
2024-09-17 05:43:42,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=158120.0, ans=0.0
2024-09-17 05:44:04,438 INFO [train.py:1198] (1/2) Epoch 9, batch 3350, loss[loss=0.2912, ctc_loss=0.1954, cr_loss=0.4306, attn_decoder_loss=0.2923, over 28831.00 frames. ], tot_loss[loss=0.27, ctc_loss=0.1792, cr_loss=0.4134, attn_decoder_loss=0.2709, over 5774259.92 frames. ], batch size: 104, lr: 1.23e-02, grad_scale: 4.0
2024-09-17 05:44:29,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=158240.0, ans=0.035
2024-09-17 05:44:30,405 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.02 vs. limit=22.5
2024-09-17 05:44:38,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=158280.0, ans=0.1
2024-09-17 05:44:46,995 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.51 vs. limit=10.0
2024-09-17 05:44:47,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=158280.0, ans=0.125
2024-09-17 05:45:02,096 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.96 vs. limit=22.5
2024-09-17 05:45:16,186 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.103e+01 9.844e+01 1.079e+02 1.203e+02 3.746e+02, threshold=2.158e+02, percent-clipped=3.0
2024-09-17 05:45:22,612 INFO [train.py:1198] (1/2) Epoch 9, batch 3400, loss[loss=0.2343, ctc_loss=0.139, cr_loss=0.3264, attn_decoder_loss=0.2377, over 29339.00 frames. ], tot_loss[loss=0.2702, ctc_loss=0.1794, cr_loss=0.4133, attn_decoder_loss=0.2711, over 5766551.40 frames. ], batch size: 67, lr: 1.23e-02, grad_scale: 8.0
2024-09-17 05:45:30,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=158400.0, ans=0.125
2024-09-17 05:45:33,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=158400.0, ans=0.05
2024-09-17 05:45:36,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=158440.0, ans=0.125
2024-09-17 05:45:53,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=158480.0, ans=0.1
2024-09-17 05:46:11,762 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 05:46:26,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=158560.0, ans=0.0
2024-09-17 05:46:33,622 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.29 vs. limit=15.0
2024-09-17 05:46:38,411 INFO [train.py:1198] (1/2) Epoch 9, batch 3450, loss[loss=0.2821, ctc_loss=0.1876, cr_loss=0.4198, attn_decoder_loss=0.2833, over 28310.00 frames. ], tot_loss[loss=0.2702, ctc_loss=0.1792, cr_loss=0.4135, attn_decoder_loss=0.2712, over 5774876.43 frames. ], batch size: 111, lr: 1.23e-02, grad_scale: 8.0
2024-09-17 05:46:48,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=158600.0, ans=0.025
2024-09-17 05:46:53,595 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.24 vs. limit=15.0
2024-09-17 05:47:18,926 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.29 vs. limit=22.5
2024-09-17 05:47:19,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=158680.0, ans=0.0
2024-09-17 05:47:33,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=158720.0, ans=0.0
2024-09-17 05:47:39,069 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.15 vs. limit=10.0
2024-09-17 05:47:39,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=158760.0, ans=0.07
2024-09-17 05:47:50,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=158760.0, ans=0.125
2024-09-17 05:47:50,855 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0
2024-09-17 05:47:51,397 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.572e+01 9.332e+01 9.969e+01 1.060e+02 1.614e+02, threshold=1.994e+02, percent-clipped=0.0
2024-09-17 05:47:55,998 INFO [train.py:1198] (1/2) Epoch 9, batch 3500, loss[loss=0.245, ctc_loss=0.1604, cr_loss=0.3691, attn_decoder_loss=0.2461, over 29330.00 frames. ], tot_loss[loss=0.2698, ctc_loss=0.1784, cr_loss=0.4119, attn_decoder_loss=0.2708, over 5777532.32 frames. ], batch size: 71, lr: 1.23e-02, grad_scale: 8.0
2024-09-17 05:48:11,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=158840.0, ans=0.0
2024-09-17 05:48:25,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=158880.0, ans=0.125
2024-09-17 05:48:26,984 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0
2024-09-17 05:48:37,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=158880.0, ans=0.0
2024-09-17 05:48:37,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=158880.0, ans=0.04949747468305833
2024-09-17 05:48:38,377 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.82 vs. limit=22.5
2024-09-17 05:48:47,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=158920.0, ans=0.04949747468305833
2024-09-17 05:48:53,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=158920.0, ans=0.125
2024-09-17 05:48:58,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=158960.0, ans=0.2
2024-09-17 05:48:59,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=158960.0, ans=0.0
2024-09-17 05:49:08,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=158960.0, ans=0.125
2024-09-17 05:49:10,520 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.11 vs. limit=10.0
2024-09-17 05:49:12,737 INFO [train.py:1198] (1/2) Epoch 9, batch 3550, loss[loss=0.2831, ctc_loss=0.1888, cr_loss=0.4179, attn_decoder_loss=0.2843, over 29681.00 frames. ], tot_loss[loss=0.2697, ctc_loss=0.1782, cr_loss=0.4121, attn_decoder_loss=0.2707, over 5783305.50 frames. ], batch size: 89, lr: 1.23e-02, grad_scale: 8.0
2024-09-17 05:49:18,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=159000.0, ans=0.125
2024-09-17 05:49:29,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=159040.0, ans=0.125
2024-09-17 05:49:35,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=159040.0, ans=0.0
2024-09-17 05:49:36,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=159040.0, ans=0.125
2024-09-17 05:49:42,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=159080.0, ans=0.95
2024-09-17 05:49:44,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=159080.0, ans=0.125
2024-09-17 05:49:57,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=159120.0, ans=0.125
2024-09-17 05:50:23,962 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.321e+01 9.598e+01 1.057e+02 1.166e+02 3.699e+02, threshold=2.113e+02, percent-clipped=1.0
2024-09-17 05:50:27,356 INFO [train.py:1198] (1/2) Epoch 9, batch 3600, loss[loss=0.277, ctc_loss=0.1958, cr_loss=0.4564, attn_decoder_loss=0.2759, over 29509.00 frames. ], tot_loss[loss=0.2695, ctc_loss=0.1779, cr_loss=0.4125, attn_decoder_loss=0.2705, over 5791969.85 frames. ], batch size: 77, lr: 1.23e-02, grad_scale: 8.0
2024-09-17 05:51:04,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=159280.0, ans=0.2
2024-09-17 05:51:25,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=159360.0, ans=0.0
2024-09-17 05:51:41,631 INFO [train.py:1198] (1/2) Epoch 9, batch 3650, loss[loss=0.2811, ctc_loss=0.1886, cr_loss=0.4403, attn_decoder_loss=0.2816, over 29501.00 frames.
], tot_loss[loss=0.2687, ctc_loss=0.1772, cr_loss=0.4113, attn_decoder_loss=0.2698, over 5794466.57 frames. ], batch size: 90, lr: 1.23e-02, grad_scale: 8.0 2024-09-17 05:51:52,759 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=7.30 vs. limit=12.0 2024-09-17 05:52:00,151 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.10 vs. limit=22.5 2024-09-17 05:52:21,711 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.50 vs. limit=10.0 2024-09-17 05:52:54,983 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.190e+01 9.468e+01 1.034e+02 1.121e+02 1.943e+02, threshold=2.068e+02, percent-clipped=0.0 2024-09-17 05:52:57,917 INFO [train.py:1198] (1/2) Epoch 9, batch 3700, loss[loss=0.2863, ctc_loss=0.1945, cr_loss=0.437, attn_decoder_loss=0.2868, over 29711.00 frames. ], tot_loss[loss=0.269, ctc_loss=0.1774, cr_loss=0.4119, attn_decoder_loss=0.27, over 5803296.78 frames. ], batch size: 84, lr: 1.22e-02, grad_scale: 8.0 2024-09-17 05:52:58,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=159600.0, ans=0.125 2024-09-17 05:52:58,865 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.66 vs. limit=15.0 2024-09-17 05:53:24,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=159640.0, ans=0.125 2024-09-17 05:53:41,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=159720.0, ans=0.125 2024-09-17 05:53:41,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=159720.0, ans=0.125 2024-09-17 05:53:51,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=159720.0, ans=0.025 2024-09-17 05:53:56,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=159760.0, ans=0.125 2024-09-17 05:54:09,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=159760.0, ans=0.05 2024-09-17 05:54:12,624 INFO [train.py:1198] (1/2) Epoch 9, batch 3750, loss[loss=0.2435, ctc_loss=0.1556, cr_loss=0.3634, attn_decoder_loss=0.2452, over 29369.00 frames. ], tot_loss[loss=0.2683, ctc_loss=0.1766, cr_loss=0.4105, attn_decoder_loss=0.2694, over 5808720.17 frames. ], batch size: 67, lr: 1.22e-02, grad_scale: 8.0 2024-09-17 05:54:55,368 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.60 vs. 
limit=15.0 2024-09-17 05:55:05,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=159920.0, ans=0.125 2024-09-17 05:55:21,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=159960.0, ans=0.0 2024-09-17 05:55:25,523 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.200e+01 9.758e+01 1.061e+02 1.222e+02 3.852e+02, threshold=2.121e+02, percent-clipped=3.0 2024-09-17 05:55:35,959 INFO [train.py:1198] (1/2) Epoch 9, batch 3800, loss[loss=0.2887, ctc_loss=0.1981, cr_loss=0.4636, attn_decoder_loss=0.2884, over 29628.00 frames. ], tot_loss[loss=0.2683, ctc_loss=0.1767, cr_loss=0.411, attn_decoder_loss=0.2693, over 5799315.21 frames. ], batch size: 86, lr: 1.22e-02, grad_scale: 8.0 2024-09-17 05:55:39,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=160000.0, ans=0.2 2024-09-17 05:55:40,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=160000.0, ans=0.1 2024-09-17 05:55:46,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=160000.0, ans=0.025 2024-09-17 05:55:53,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=160040.0, ans=0.2 2024-09-17 05:56:50,300 INFO [train.py:1198] (1/2) Epoch 9, batch 3850, loss[loss=0.3002, ctc_loss=0.2137, cr_loss=0.4627, attn_decoder_loss=0.2996, over 29242.00 frames. ], tot_loss[loss=0.2686, ctc_loss=0.1772, cr_loss=0.4114, attn_decoder_loss=0.2696, over 5813350.66 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 8.0 2024-09-17 05:57:08,731 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0 2024-09-17 05:57:32,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=160280.0, ans=0.125 2024-09-17 05:57:44,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=160320.0, ans=0.2 2024-09-17 05:57:44,542 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.23 vs. limit=22.5 2024-09-17 05:57:59,606 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.15 vs. limit=10.0 2024-09-17 05:58:03,424 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.356e+01 9.561e+01 1.016e+02 1.083e+02 1.844e+02, threshold=2.033e+02, percent-clipped=0.0 2024-09-17 05:58:06,430 INFO [train.py:1198] (1/2) Epoch 9, batch 3900, loss[loss=0.2653, ctc_loss=0.1676, cr_loss=0.3952, attn_decoder_loss=0.2673, over 29624.00 frames. ], tot_loss[loss=0.2693, ctc_loss=0.1777, cr_loss=0.4128, attn_decoder_loss=0.2703, over 5817825.96 frames. 
], batch size: 86, lr: 1.22e-02, grad_scale: 8.0 2024-09-17 05:58:12,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=160400.0, ans=0.125 2024-09-17 05:58:21,028 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.40 vs. limit=15.0 2024-09-17 05:58:24,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=160440.0, ans=0.2 2024-09-17 05:58:52,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=160520.0, ans=0.125 2024-09-17 05:58:52,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=160520.0, ans=0.125 2024-09-17 05:58:55,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=160520.0, ans=0.125 2024-09-17 05:59:20,918 INFO [train.py:1198] (1/2) Epoch 9, batch 3950, loss[loss=0.282, ctc_loss=0.1865, cr_loss=0.4413, attn_decoder_loss=0.2828, over 29451.00 frames. ], tot_loss[loss=0.2696, ctc_loss=0.1776, cr_loss=0.4131, attn_decoder_loss=0.2707, over 5836915.41 frames. ], batch size: 97, lr: 1.22e-02, grad_scale: 4.0 2024-09-17 05:59:26,192 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.13 vs. limit=15.0 2024-09-17 05:59:34,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=160640.0, ans=0.0 2024-09-17 05:59:44,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=160640.0, ans=0.0 2024-09-17 06:00:11,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=160720.0, ans=0.2 2024-09-17 06:00:34,493 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.022e+01 9.640e+01 1.057e+02 1.201e+02 4.208e+02, threshold=2.114e+02, percent-clipped=4.0 2024-09-17 06:00:36,380 INFO [train.py:1198] (1/2) Epoch 9, batch 4000, loss[loss=0.2425, ctc_loss=0.1437, cr_loss=0.3717, attn_decoder_loss=0.2452, over 29494.00 frames. ], tot_loss[loss=0.2691, ctc_loss=0.1773, cr_loss=0.412, attn_decoder_loss=0.2701, over 5813203.84 frames. 
], batch size: 74, lr: 1.22e-02, grad_scale: 8.0 2024-09-17 06:00:38,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=160800.0, ans=0.0 2024-09-17 06:01:15,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=160880.0, ans=0.0 2024-09-17 06:01:42,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=160960.0, ans=0.125 2024-09-17 06:01:43,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=160960.0, ans=0.125 2024-09-17 06:01:49,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=161000.0, ans=0.0 2024-09-17 06:01:50,827 INFO [train.py:1198] (1/2) Epoch 9, batch 4050, loss[loss=0.2954, ctc_loss=0.2212, cr_loss=0.4145, attn_decoder_loss=0.2945, over 19973.00 frames. ], tot_loss[loss=0.2691, ctc_loss=0.1775, cr_loss=0.412, attn_decoder_loss=0.2702, over 5797521.00 frames. ], batch size: 209, lr: 1.22e-02, grad_scale: 4.0 2024-09-17 06:01:56,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=161000.0, ans=0.125 2024-09-17 06:01:57,138 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:02:14,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=161040.0, ans=0.025 2024-09-17 06:02:17,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=161040.0, ans=0.125 2024-09-17 06:02:31,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=161080.0, ans=0.125 2024-09-17 06:02:31,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=161080.0, ans=0.0 2024-09-17 06:02:48,305 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.03 vs. limit=6.0 2024-09-17 06:03:01,513 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.35 vs. limit=15.0 2024-09-17 06:03:05,378 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.114e+01 9.617e+01 1.028e+02 1.240e+02 2.479e+02, threshold=2.055e+02, percent-clipped=2.0 2024-09-17 06:03:05,401 INFO [train.py:1198] (1/2) Epoch 9, batch 4100, loss[loss=0.2846, ctc_loss=0.1823, cr_loss=0.4174, attn_decoder_loss=0.2867, over 29504.00 frames. ], tot_loss[loss=0.2696, ctc_loss=0.1782, cr_loss=0.4131, attn_decoder_loss=0.2706, over 5793886.04 frames. 
], batch size: 90, lr: 1.22e-02, grad_scale: 8.0 2024-09-17 06:03:21,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=161240.0, ans=0.1 2024-09-17 06:03:29,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=161240.0, ans=0.125 2024-09-17 06:03:40,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=161280.0, ans=0.2 2024-09-17 06:03:51,764 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.53 vs. limit=15.0 2024-09-17 06:04:18,867 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.50 vs. limit=15.0 2024-09-17 06:04:19,362 INFO [train.py:1198] (1/2) Epoch 9, batch 4150, loss[loss=0.2584, ctc_loss=0.1649, cr_loss=0.4021, attn_decoder_loss=0.2598, over 29526.00 frames. ], tot_loss[loss=0.2696, ctc_loss=0.1782, cr_loss=0.4132, attn_decoder_loss=0.2706, over 5798258.49 frames. ], batch size: 77, lr: 1.22e-02, grad_scale: 4.0 2024-09-17 06:04:25,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.min_positive, batch_count=161400.0, ans=0.05 2024-09-17 06:04:41,753 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:04:51,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=161480.0, ans=10.0 2024-09-17 06:04:54,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=161480.0, ans=0.125 2024-09-17 06:04:58,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=161480.0, ans=0.1 2024-09-17 06:04:58,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=161480.0, ans=0.0 2024-09-17 06:05:02,258 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.22 vs. limit=15.0 2024-09-17 06:05:06,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=161520.0, ans=0.125 2024-09-17 06:05:18,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=161560.0, ans=0.0 2024-09-17 06:05:19,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=161560.0, ans=0.0 2024-09-17 06:05:34,475 INFO [train.py:1198] (1/2) Epoch 9, batch 4200, loss[loss=0.2858, ctc_loss=0.195, cr_loss=0.4501, attn_decoder_loss=0.2859, over 29497.00 frames. ], tot_loss[loss=0.2697, ctc_loss=0.1779, cr_loss=0.4129, attn_decoder_loss=0.2707, over 5799943.49 frames. 
], batch size: 90, lr: 1.22e-02, grad_scale: 8.0 2024-09-17 06:05:35,869 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.845e+01 9.556e+01 1.027e+02 1.111e+02 2.120e+02, threshold=2.054e+02, percent-clipped=2.0 2024-09-17 06:05:36,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=161600.0, ans=0.125 2024-09-17 06:05:39,523 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.22 vs. limit=22.5 2024-09-17 06:05:44,206 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.32 vs. limit=15.0 2024-09-17 06:06:27,972 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.04 vs. limit=15.0 2024-09-17 06:06:48,812 INFO [train.py:1198] (1/2) Epoch 9, batch 4250, loss[loss=0.2436, ctc_loss=0.1516, cr_loss=0.3747, attn_decoder_loss=0.2455, over 29520.00 frames. ], tot_loss[loss=0.2695, ctc_loss=0.1774, cr_loss=0.4123, attn_decoder_loss=0.2706, over 5805325.68 frames. ], batch size: 74, lr: 1.22e-02, grad_scale: 4.0 2024-09-17 06:07:12,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=161840.0, ans=0.1 2024-09-17 06:07:20,726 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.51 vs. limit=22.5 2024-09-17 06:07:59,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=161960.0, ans=0.125 2024-09-17 06:08:02,493 INFO [train.py:1198] (1/2) Epoch 9, batch 4300, loss[loss=0.2843, ctc_loss=0.1871, cr_loss=0.4432, attn_decoder_loss=0.2852, over 29525.00 frames. ], tot_loss[loss=0.2703, ctc_loss=0.1784, cr_loss=0.4137, attn_decoder_loss=0.2713, over 5795087.41 frames. ], batch size: 87, lr: 1.22e-02, grad_scale: 8.0 2024-09-17 06:08:05,467 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.660e+01 9.959e+01 1.074e+02 1.170e+02 2.141e+02, threshold=2.147e+02, percent-clipped=1.0 2024-09-17 06:08:16,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=162040.0, ans=0.0 2024-09-17 06:08:46,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=162120.0, ans=0.0 2024-09-17 06:09:04,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=162160.0, ans=0.1 2024-09-17 06:09:17,152 INFO [train.py:1198] (1/2) Epoch 9, batch 4350, loss[loss=0.2927, ctc_loss=0.1947, cr_loss=0.4392, attn_decoder_loss=0.2938, over 29449.00 frames. ], tot_loss[loss=0.274, ctc_loss=0.1815, cr_loss=0.4191, attn_decoder_loss=0.2749, over 5797376.53 frames. 
], batch size: 97, lr: 1.21e-02, grad_scale: 8.0 2024-09-17 06:09:55,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=162280.0, ans=0.0 2024-09-17 06:09:55,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=162280.0, ans=0.125 2024-09-17 06:09:58,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=162280.0, ans=0.025 2024-09-17 06:10:07,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=162320.0, ans=0.1 2024-09-17 06:10:23,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=162360.0, ans=0.09899494936611666 2024-09-17 06:10:25,957 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.59 vs. limit=15.0 2024-09-17 06:10:31,074 INFO [train.py:1198] (1/2) Epoch 9, batch 4400, loss[loss=0.2877, ctc_loss=0.2012, cr_loss=0.4359, attn_decoder_loss=0.2877, over 27496.00 frames. ], tot_loss[loss=0.2763, ctc_loss=0.1837, cr_loss=0.4218, attn_decoder_loss=0.2772, over 5765498.70 frames. ], batch size: 124, lr: 1.21e-02, grad_scale: 8.0 2024-09-17 06:10:35,517 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.675e+01 9.860e+01 1.034e+02 1.169e+02 1.757e+02, threshold=2.069e+02, percent-clipped=0.0 2024-09-17 06:10:43,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=162400.0, ans=0.1 2024-09-17 06:10:43,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=162400.0, ans=0.125 2024-09-17 06:10:50,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.min_positive, batch_count=162440.0, ans=0.05 2024-09-17 06:10:53,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=162440.0, ans=0.0 2024-09-17 06:11:05,953 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.52 vs. limit=10.0 2024-09-17 06:11:20,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=162520.0, ans=0.125 2024-09-17 06:11:46,092 INFO [train.py:1198] (1/2) Epoch 9, batch 4450, loss[loss=0.3124, ctc_loss=0.2492, cr_loss=0.4708, attn_decoder_loss=0.309, over 20072.00 frames. ], tot_loss[loss=0.2797, ctc_loss=0.1892, cr_loss=0.4256, attn_decoder_loss=0.2803, over 5570202.92 frames. ], batch size: 210, lr: 1.21e-02, grad_scale: 4.0 2024-09-17 06:11:46,863 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.67 vs. 
limit=15.0 2024-09-17 06:11:51,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=162600.0, ans=10.0 2024-09-17 06:12:00,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=162640.0, ans=0.09899494936611666 2024-09-17 06:12:07,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=162640.0, ans=0.125 2024-09-17 06:12:12,370 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.08 vs. limit=15.0 2024-09-17 06:12:17,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=162680.0, ans=0.0 2024-09-17 06:12:28,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=162680.0, ans=0.1 2024-09-17 06:12:37,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=162720.0, ans=0.0 2024-09-17 06:13:01,685 INFO [train.py:1198] (1/2) Epoch 9, batch 4500, loss[loss=0.2971, ctc_loss=0.2238, cr_loss=0.4398, attn_decoder_loss=0.2955, over 20067.00 frames. ], tot_loss[loss=0.2836, ctc_loss=0.1968, cr_loss=0.4278, attn_decoder_loss=0.2838, over 5228674.98 frames. ], batch size: 209, lr: 1.21e-02, grad_scale: 8.0 2024-09-17 06:13:07,509 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.470e+01 1.060e+02 1.171e+02 1.308e+02 2.646e+02, threshold=2.342e+02, percent-clipped=3.0 2024-09-17 06:13:12,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=162800.0, ans=0.2 2024-09-17 06:13:13,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=162800.0, ans=0.025 2024-09-17 06:13:22,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=162840.0, ans=0.0 2024-09-17 06:13:28,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=162840.0, ans=0.125 2024-09-17 06:13:33,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=162880.0, ans=0.05 2024-09-17 06:14:29,147 INFO [train.py:1198] (1/2) Epoch 10, batch 0, loss[loss=0.2581, ctc_loss=0.1605, cr_loss=0.3872, attn_decoder_loss=0.2603, over 29606.00 frames. ], tot_loss[loss=0.2581, ctc_loss=0.1605, cr_loss=0.3872, attn_decoder_loss=0.2603, over 29606.00 frames. ], batch size: 73, lr: 1.15e-02, grad_scale: 8.0 2024-09-17 06:14:29,148 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 06:14:47,510 INFO [train.py:1230] (1/2) Epoch 10, validation: loss=0.2171, ctc_loss=0.05118, cr_loss=4.759e-15, attn_decoder_loss=0.2355, over 944034.00 frames. 
2024-09-17 06:14:47,511 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-17 06:14:50,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=162900.0, ans=0.125 2024-09-17 06:15:02,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=162940.0, ans=0.125 2024-09-17 06:15:43,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=163020.0, ans=10.0 2024-09-17 06:15:49,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=163060.0, ans=0.025 2024-09-17 06:15:52,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=163060.0, ans=0.125 2024-09-17 06:15:55,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=163060.0, ans=0.2 2024-09-17 06:16:02,853 INFO [train.py:1198] (1/2) Epoch 10, batch 50, loss[loss=0.2324, ctc_loss=0.1473, cr_loss=0.3674, attn_decoder_loss=0.2337, over 29423.00 frames. ], tot_loss[loss=0.2717, ctc_loss=0.1817, cr_loss=0.4143, attn_decoder_loss=0.2725, over 1267008.14 frames. ], batch size: 70, lr: 1.15e-02, grad_scale: 4.0 2024-09-17 06:16:07,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=163100.0, ans=0.1 2024-09-17 06:16:19,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=163140.0, ans=0.125 2024-09-17 06:16:25,247 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. limit=6.0 2024-09-17 06:16:52,270 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.150e+01 9.660e+01 1.078e+02 1.244e+02 7.750e+02, threshold=2.155e+02, percent-clipped=3.0 2024-09-17 06:17:02,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=163220.0, ans=0.1 2024-09-17 06:17:05,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=163220.0, ans=0.0 2024-09-17 06:17:13,619 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.15 vs. limit=22.5 2024-09-17 06:17:17,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=163260.0, ans=0.0 2024-09-17 06:17:22,992 INFO [train.py:1198] (1/2) Epoch 10, batch 100, loss[loss=0.2542, ctc_loss=0.1576, cr_loss=0.3854, attn_decoder_loss=0.2564, over 29526.00 frames. ], tot_loss[loss=0.2732, ctc_loss=0.1813, cr_loss=0.4168, attn_decoder_loss=0.2742, over 2251338.93 frames. 
], batch size: 76, lr: 1.15e-02, grad_scale: 8.0 2024-09-17 06:17:27,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=163300.0, ans=0.0 2024-09-17 06:17:50,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=163340.0, ans=15.0 2024-09-17 06:18:08,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=163420.0, ans=0.0 2024-09-17 06:18:08,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.max_abs, batch_count=163420.0, ans=10.0 2024-09-17 06:18:15,959 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.94 vs. limit=15.0 2024-09-17 06:18:21,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=163460.0, ans=0.125 2024-09-17 06:18:23,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.00 vs. limit=15.0 2024-09-17 06:18:33,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=163460.0, ans=0.0 2024-09-17 06:18:37,442 INFO [train.py:1198] (1/2) Epoch 10, batch 150, loss[loss=0.2493, ctc_loss=0.1673, cr_loss=0.3956, attn_decoder_loss=0.2496, over 29467.00 frames. ], tot_loss[loss=0.2702, ctc_loss=0.1777, cr_loss=0.412, attn_decoder_loss=0.2713, over 3047478.20 frames. ], batch size: 70, lr: 1.15e-02, grad_scale: 4.0 2024-09-17 06:18:51,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=163540.0, ans=0.125 2024-09-17 06:19:00,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=163540.0, ans=0.1 2024-09-17 06:19:02,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=163540.0, ans=0.5 2024-09-17 06:19:04,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=163540.0, ans=0.125 2024-09-17 06:19:07,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.83 vs. limit=22.5 2024-09-17 06:19:08,668 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2024-09-17 06:19:13,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=163580.0, ans=0.125 2024-09-17 06:19:15,580 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.70 vs. limit=15.0 2024-09-17 06:19:24,811 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.84 vs. 
limit=15.0 2024-09-17 06:19:25,268 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 9.231e+01 9.712e+01 1.046e+02 1.496e+02, threshold=1.942e+02, percent-clipped=0.0 2024-09-17 06:19:30,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=163620.0, ans=0.1 2024-09-17 06:19:39,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=163660.0, ans=0.025 2024-09-17 06:19:52,311 INFO [train.py:1198] (1/2) Epoch 10, batch 200, loss[loss=0.2921, ctc_loss=0.1955, cr_loss=0.4316, attn_decoder_loss=0.2933, over 27343.00 frames. ], tot_loss[loss=0.2684, ctc_loss=0.1761, cr_loss=0.4097, attn_decoder_loss=0.2696, over 3659428.76 frames. ], batch size: 125, lr: 1.15e-02, grad_scale: 8.0 2024-09-17 06:20:00,453 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.51 vs. limit=22.5 2024-09-17 06:20:03,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=163700.0, ans=0.0 2024-09-17 06:20:15,080 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:20:21,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=163780.0, ans=0.09899494936611666 2024-09-17 06:20:27,882 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.91 vs. limit=6.0 2024-09-17 06:20:41,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=163820.0, ans=0.125 2024-09-17 06:20:53,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=163820.0, ans=0.2 2024-09-17 06:20:55,251 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.10 vs. limit=12.0 2024-09-17 06:21:12,288 INFO [train.py:1198] (1/2) Epoch 10, batch 250, loss[loss=0.2812, ctc_loss=0.1893, cr_loss=0.4253, attn_decoder_loss=0.282, over 29289.00 frames. ], tot_loss[loss=0.2678, ctc_loss=0.1755, cr_loss=0.4096, attn_decoder_loss=0.269, over 4140730.77 frames. ], batch size: 100, lr: 1.15e-02, grad_scale: 4.0 2024-09-17 06:21:17,608 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.17 vs. limit=22.5 2024-09-17 06:21:45,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=163980.0, ans=0.1 2024-09-17 06:21:52,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=163980.0, ans=0.09899494936611666 2024-09-17 06:21:55,708 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.36 vs. limit=22.5 2024-09-17 06:21:57,237 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.89 vs. 
limit=15.0 2024-09-17 06:22:02,582 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.245e+01 9.493e+01 1.020e+02 1.129e+02 1.613e+02, threshold=2.040e+02, percent-clipped=0.0 2024-09-17 06:22:23,109 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.49 vs. limit=15.0 2024-09-17 06:22:28,435 INFO [train.py:1198] (1/2) Epoch 10, batch 300, loss[loss=0.2803, ctc_loss=0.1812, cr_loss=0.4367, attn_decoder_loss=0.2816, over 29529.00 frames. ], tot_loss[loss=0.2676, ctc_loss=0.175, cr_loss=0.4096, attn_decoder_loss=0.2688, over 4509830.03 frames. ], batch size: 92, lr: 1.15e-02, grad_scale: 8.0 2024-09-17 06:22:41,478 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.47 vs. limit=15.0 2024-09-17 06:22:51,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=164140.0, ans=0.125 2024-09-17 06:22:57,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=164180.0, ans=0.125 2024-09-17 06:23:05,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=164180.0, ans=0.1 2024-09-17 06:23:28,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=164260.0, ans=0.0 2024-09-17 06:23:28,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=164260.0, ans=0.1 2024-09-17 06:23:30,515 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.10 vs. limit=10.0 2024-09-17 06:23:34,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=164260.0, ans=0.025 2024-09-17 06:23:43,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=164300.0, ans=0.125 2024-09-17 06:23:44,699 INFO [train.py:1198] (1/2) Epoch 10, batch 350, loss[loss=0.2408, ctc_loss=0.153, cr_loss=0.3857, attn_decoder_loss=0.242, over 29328.00 frames. ], tot_loss[loss=0.2687, ctc_loss=0.1756, cr_loss=0.4103, attn_decoder_loss=0.2699, over 4795205.45 frames. 
], batch size: 71, lr: 1.15e-02, grad_scale: 8.0 2024-09-17 06:23:45,079 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:23:57,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=164300.0, ans=0.0 2024-09-17 06:24:07,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=164340.0, ans=0.125 2024-09-17 06:24:07,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=164340.0, ans=0.125 2024-09-17 06:24:12,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=164340.0, ans=0.0 2024-09-17 06:24:19,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=164380.0, ans=0.1 2024-09-17 06:24:34,751 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.056e+01 9.667e+01 1.045e+02 1.260e+02 3.351e+02, threshold=2.090e+02, percent-clipped=2.0 2024-09-17 06:24:36,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=164420.0, ans=0.1 2024-09-17 06:25:05,712 INFO [train.py:1198] (1/2) Epoch 10, batch 400, loss[loss=0.2679, ctc_loss=0.1691, cr_loss=0.3869, attn_decoder_loss=0.2703, over 29711.00 frames. ], tot_loss[loss=0.2681, ctc_loss=0.1752, cr_loss=0.4101, attn_decoder_loss=0.2693, over 5024427.14 frames. ], batch size: 82, lr: 1.15e-02, grad_scale: 16.0 2024-09-17 06:25:13,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=164500.0, ans=0.125 2024-09-17 06:25:31,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=164540.0, ans=0.125 2024-09-17 06:25:51,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=164620.0, ans=0.035 2024-09-17 06:25:58,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=164620.0, ans=0.0 2024-09-17 06:26:21,193 INFO [train.py:1198] (1/2) Epoch 10, batch 450, loss[loss=0.2831, ctc_loss=0.1904, cr_loss=0.4665, attn_decoder_loss=0.283, over 29707.00 frames. ], tot_loss[loss=0.2685, ctc_loss=0.1755, cr_loss=0.4105, attn_decoder_loss=0.2697, over 5185772.01 frames. ], batch size: 83, lr: 1.15e-02, grad_scale: 8.0 2024-09-17 06:26:47,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.46 vs. limit=22.5 2024-09-17 06:26:48,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=164740.0, ans=0.125 2024-09-17 06:26:50,558 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.15 vs. 
limit=15.0 2024-09-17 06:27:01,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=164780.0, ans=0.125 2024-09-17 06:27:01,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=164780.0, ans=0.0 2024-09-17 06:27:13,192 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.450e+01 9.295e+01 1.006e+02 1.063e+02 1.826e+02, threshold=2.013e+02, percent-clipped=0.0 2024-09-17 06:27:16,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=164820.0, ans=0.025 2024-09-17 06:27:29,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=164860.0, ans=0.1 2024-09-17 06:27:37,053 INFO [train.py:1198] (1/2) Epoch 10, batch 500, loss[loss=0.2842, ctc_loss=0.1806, cr_loss=0.4118, attn_decoder_loss=0.2866, over 29438.00 frames. ], tot_loss[loss=0.2675, ctc_loss=0.1746, cr_loss=0.4096, attn_decoder_loss=0.2688, over 5328350.72 frames. ], batch size: 94, lr: 1.15e-02, grad_scale: 8.0 2024-09-17 06:27:52,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=164940.0, ans=0.2 2024-09-17 06:28:22,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=165020.0, ans=0.125 2024-09-17 06:28:57,305 INFO [train.py:1198] (1/2) Epoch 10, batch 550, loss[loss=0.2837, ctc_loss=0.1922, cr_loss=0.4345, attn_decoder_loss=0.2842, over 28781.00 frames. ], tot_loss[loss=0.2675, ctc_loss=0.1745, cr_loss=0.4092, attn_decoder_loss=0.2687, over 5420333.13 frames. ], batch size: 104, lr: 1.15e-02, grad_scale: 8.0 2024-09-17 06:29:00,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=165100.0, ans=0.5 2024-09-17 06:29:02,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=165100.0, ans=0.2 2024-09-17 06:29:05,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=165100.0, ans=0.2 2024-09-17 06:29:15,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=165140.0, ans=0.2 2024-09-17 06:29:46,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=165220.0, ans=0.0 2024-09-17 06:29:51,813 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.131e+01 9.579e+01 1.029e+02 1.127e+02 2.367e+02, threshold=2.058e+02, percent-clipped=2.0 2024-09-17 06:30:10,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=165260.0, ans=0.125 2024-09-17 06:30:13,110 INFO [train.py:1198] (1/2) Epoch 10, batch 600, loss[loss=0.2901, ctc_loss=0.1909, cr_loss=0.4439, attn_decoder_loss=0.2912, over 29257.00 frames. ], tot_loss[loss=0.2675, ctc_loss=0.1742, cr_loss=0.4093, attn_decoder_loss=0.2688, over 5507077.33 frames. 
], batch size: 100, lr: 1.14e-02, grad_scale: 8.0 2024-09-17 06:30:26,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=165340.0, ans=0.125 2024-09-17 06:30:31,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=165340.0, ans=0.0 2024-09-17 06:30:35,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer_ff2.min_abs, batch_count=165340.0, ans=0.1 2024-09-17 06:30:45,583 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=12.0 2024-09-17 06:30:55,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=165380.0, ans=0.125 2024-09-17 06:30:58,772 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.72 vs. limit=12.0 2024-09-17 06:31:19,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=165460.0, ans=10.0 2024-09-17 06:31:27,706 INFO [train.py:1198] (1/2) Epoch 10, batch 650, loss[loss=0.2614, ctc_loss=0.1665, cr_loss=0.3955, attn_decoder_loss=0.2632, over 29758.00 frames. ], tot_loss[loss=0.2666, ctc_loss=0.1732, cr_loss=0.4078, attn_decoder_loss=0.2679, over 5584589.73 frames. ], batch size: 81, lr: 1.14e-02, grad_scale: 4.0 2024-09-17 06:31:53,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer_ff3.min_abs, batch_count=165540.0, ans=0.2 2024-09-17 06:32:07,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=165580.0, ans=0.125 2024-09-17 06:32:19,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=165620.0, ans=0.0 2024-09-17 06:32:23,906 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.175e+01 9.258e+01 9.852e+01 1.047e+02 1.585e+02, threshold=1.970e+02, percent-clipped=0.0 2024-09-17 06:32:24,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=165620.0, ans=0.2 2024-09-17 06:32:25,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=165620.0, ans=0.2 2024-09-17 06:32:26,506 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.87 vs. limit=12.0 2024-09-17 06:32:47,994 INFO [train.py:1198] (1/2) Epoch 10, batch 700, loss[loss=0.2619, ctc_loss=0.1697, cr_loss=0.4002, attn_decoder_loss=0.2632, over 29542.00 frames. ], tot_loss[loss=0.2674, ctc_loss=0.1741, cr_loss=0.4094, attn_decoder_loss=0.2687, over 5636214.23 frames. ], batch size: 76, lr: 1.14e-02, grad_scale: 8.0 2024-09-17 06:32:50,222 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.55 vs. 
limit=15.0 2024-09-17 06:33:00,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=165700.0, ans=0.125 2024-09-17 06:33:04,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=165740.0, ans=0.125 2024-09-17 06:33:14,402 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=15.0 2024-09-17 06:33:15,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=165740.0, ans=0.2 2024-09-17 06:33:43,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=165820.0, ans=0.025 2024-09-17 06:33:50,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=165860.0, ans=0.0 2024-09-17 06:33:51,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=165860.0, ans=0.1 2024-09-17 06:34:03,338 INFO [train.py:1198] (1/2) Epoch 10, batch 750, loss[loss=0.2682, ctc_loss=0.1672, cr_loss=0.4169, attn_decoder_loss=0.2701, over 29714.00 frames. ], tot_loss[loss=0.267, ctc_loss=0.1736, cr_loss=0.4089, attn_decoder_loss=0.2683, over 5674319.71 frames. ], batch size: 82, lr: 1.14e-02, grad_scale: 8.0 2024-09-17 06:34:14,433 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.70 vs. limit=15.0 2024-09-17 06:34:39,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=165980.0, ans=0.0 2024-09-17 06:35:00,788 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 9.819e+01 1.062e+02 1.153e+02 3.541e+02, threshold=2.124e+02, percent-clipped=2.0 2024-09-17 06:35:18,939 INFO [train.py:1198] (1/2) Epoch 10, batch 800, loss[loss=0.2377, ctc_loss=0.146, cr_loss=0.3668, attn_decoder_loss=0.2397, over 29599.00 frames. ], tot_loss[loss=0.2672, ctc_loss=0.1737, cr_loss=0.4097, attn_decoder_loss=0.2685, over 5705656.55 frames. ], batch size: 73, lr: 1.14e-02, grad_scale: 8.0 2024-09-17 06:35:27,592 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.75 vs. limit=15.0 2024-09-17 06:35:35,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=166140.0, ans=0.1 2024-09-17 06:35:53,949 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:35:56,228 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.86 vs. limit=15.0 2024-09-17 06:35:58,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=166180.0, ans=0.125 2024-09-17 06:36:09,487 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.84 vs. 
limit=22.5 2024-09-17 06:36:12,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=166220.0, ans=0.07 2024-09-17 06:36:34,094 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.87 vs. limit=15.0 2024-09-17 06:36:36,135 INFO [train.py:1198] (1/2) Epoch 10, batch 850, loss[loss=0.2831, ctc_loss=0.1873, cr_loss=0.4067, attn_decoder_loss=0.2847, over 29705.00 frames. ], tot_loss[loss=0.2667, ctc_loss=0.1733, cr_loss=0.4085, attn_decoder_loss=0.268, over 5736269.02 frames. ], batch size: 89, lr: 1.14e-02, grad_scale: 8.0 2024-09-17 06:37:03,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=166340.0, ans=0.125 2024-09-17 06:37:12,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=166380.0, ans=0.1 2024-09-17 06:37:27,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=166420.0, ans=0.125 2024-09-17 06:37:37,422 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.461e+01 9.790e+01 1.072e+02 1.196e+02 1.464e+02, threshold=2.145e+02, percent-clipped=0.0 2024-09-17 06:37:37,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=166460.0, ans=0.125 2024-09-17 06:37:41,398 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=17.12 vs. limit=15.0 2024-09-17 06:37:43,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=166460.0, ans=0.0 2024-09-17 06:37:48,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=166460.0, ans=0.125 2024-09-17 06:37:50,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=166460.0, ans=0.0 2024-09-17 06:37:54,164 INFO [train.py:1198] (1/2) Epoch 10, batch 900, loss[loss=0.2549, ctc_loss=0.1708, cr_loss=0.4023, attn_decoder_loss=0.2553, over 29599.00 frames. ], tot_loss[loss=0.2669, ctc_loss=0.1736, cr_loss=0.4092, attn_decoder_loss=0.2681, over 5742819.18 frames. ], batch size: 73, lr: 1.14e-02, grad_scale: 8.0 2024-09-17 06:38:07,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=166540.0, ans=0.025 2024-09-17 06:38:14,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.28 vs. 
limit=15.0 2024-09-17 06:38:33,773 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:38:35,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=166580.0, ans=0.0 2024-09-17 06:38:38,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=166620.0, ans=0.0 2024-09-17 06:38:47,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=166620.0, ans=0.125 2024-09-17 06:38:47,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=166620.0, ans=0.125 2024-09-17 06:38:56,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=166660.0, ans=0.07 2024-09-17 06:39:09,489 INFO [train.py:1198] (1/2) Epoch 10, batch 950, loss[loss=0.2504, ctc_loss=0.1658, cr_loss=0.3933, attn_decoder_loss=0.251, over 29492.00 frames. ], tot_loss[loss=0.2673, ctc_loss=0.174, cr_loss=0.4098, attn_decoder_loss=0.2686, over 5744469.84 frames. ], batch size: 74, lr: 1.14e-02, grad_scale: 8.0 2024-09-17 06:39:11,650 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.50 vs. limit=15.0 2024-09-17 06:39:18,203 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.19 vs. limit=10.0 2024-09-17 06:39:18,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=166700.0, ans=0.0 2024-09-17 06:39:40,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.66 vs. limit=15.0 2024-09-17 06:39:48,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2024-09-17 06:39:59,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=166820.0, ans=0.2 2024-09-17 06:40:06,250 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.30 vs. limit=15.0 2024-09-17 06:40:09,833 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.298e+01 9.770e+01 1.085e+02 1.240e+02 2.634e+02, threshold=2.170e+02, percent-clipped=2.0 2024-09-17 06:40:26,945 INFO [train.py:1198] (1/2) Epoch 10, batch 1000, loss[loss=0.2591, ctc_loss=0.1714, cr_loss=0.3871, attn_decoder_loss=0.2602, over 29508.00 frames. ], tot_loss[loss=0.2684, ctc_loss=0.1752, cr_loss=0.41, attn_decoder_loss=0.2696, over 5739304.80 frames. ], batch size: 77, lr: 1.14e-02, grad_scale: 8.0 2024-09-17 06:40:59,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=166980.0, ans=0.125 2024-09-17 06:41:10,908 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.45 vs. 
limit=15.0 2024-09-17 06:41:17,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=167020.0, ans=0.1 2024-09-17 06:41:26,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=167020.0, ans=0.2 2024-09-17 06:41:34,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=167060.0, ans=0.0 2024-09-17 06:41:35,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=167060.0, ans=0.125 2024-09-17 06:41:38,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=167060.0, ans=0.1 2024-09-17 06:41:40,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=167060.0, ans=0.1 2024-09-17 06:41:44,651 INFO [train.py:1198] (1/2) Epoch 10, batch 1050, loss[loss=0.2678, ctc_loss=0.175, cr_loss=0.4168, attn_decoder_loss=0.2688, over 29669.00 frames. ], tot_loss[loss=0.2673, ctc_loss=0.174, cr_loss=0.4088, attn_decoder_loss=0.2686, over 5745736.36 frames. ], batch size: 85, lr: 1.14e-02, grad_scale: 8.0 2024-09-17 06:42:16,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=167180.0, ans=0.1 2024-09-17 06:42:18,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=167180.0, ans=0.0 2024-09-17 06:42:28,835 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=15.0 2024-09-17 06:42:45,686 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.318e+01 9.401e+01 9.855e+01 1.069e+02 2.033e+02, threshold=1.971e+02, percent-clipped=0.0 2024-09-17 06:42:47,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=167260.0, ans=0.0 2024-09-17 06:43:00,924 INFO [train.py:1198] (1/2) Epoch 10, batch 1100, loss[loss=0.2518, ctc_loss=0.1531, cr_loss=0.3733, attn_decoder_loss=0.2544, over 29457.00 frames. ], tot_loss[loss=0.2669, ctc_loss=0.1736, cr_loss=0.4082, attn_decoder_loss=0.2682, over 5758028.42 frames. ], batch size: 78, lr: 1.14e-02, grad_scale: 8.0 2024-09-17 06:43:07,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=167300.0, ans=0.125 2024-09-17 06:43:08,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=167300.0, ans=0.0 2024-09-17 06:43:11,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=167300.0, ans=0.0 2024-09-17 06:43:36,633 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.12 vs. limit=15.0 2024-09-17 06:43:39,135 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.06 vs. 
limit=22.5 2024-09-17 06:43:50,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=167420.0, ans=0.0 2024-09-17 06:43:52,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=167420.0, ans=0.2 2024-09-17 06:43:54,542 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=22.5 2024-09-17 06:43:59,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=167460.0, ans=0.125 2024-09-17 06:44:08,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=167460.0, ans=0.1 2024-09-17 06:44:18,491 INFO [train.py:1198] (1/2) Epoch 10, batch 1150, loss[loss=0.2645, ctc_loss=0.174, cr_loss=0.3965, attn_decoder_loss=0.2657, over 29441.00 frames. ], tot_loss[loss=0.267, ctc_loss=0.1736, cr_loss=0.4085, attn_decoder_loss=0.2683, over 5755343.74 frames. ], batch size: 78, lr: 1.14e-02, grad_scale: 4.0 2024-09-17 06:44:31,549 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.57 vs. limit=22.5 2024-09-17 06:44:45,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=167540.0, ans=0.0 2024-09-17 06:44:50,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=167580.0, ans=0.0 2024-09-17 06:45:12,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=167620.0, ans=0.0 2024-09-17 06:45:15,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=167620.0, ans=0.0 2024-09-17 06:45:22,896 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.006e+01 9.612e+01 1.039e+02 1.179e+02 2.688e+02, threshold=2.078e+02, percent-clipped=2.0 2024-09-17 06:45:33,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=167660.0, ans=0.125 2024-09-17 06:45:36,427 INFO [train.py:1198] (1/2) Epoch 10, batch 1200, loss[loss=0.2676, ctc_loss=0.1612, cr_loss=0.4091, attn_decoder_loss=0.2703, over 29671.00 frames. ], tot_loss[loss=0.2677, ctc_loss=0.1743, cr_loss=0.4088, attn_decoder_loss=0.269, over 5748178.95 frames. ], batch size: 85, lr: 1.14e-02, grad_scale: 8.0 2024-09-17 06:45:38,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=167700.0, ans=0.1 2024-09-17 06:45:39,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=167700.0, ans=0.2 2024-09-17 06:46:02,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=167740.0, ans=0.0 2024-09-17 06:46:09,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.37 vs. 
limit=15.0 2024-09-17 06:46:10,039 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:46:16,042 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:46:31,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=167820.0, ans=0.0 2024-09-17 06:46:47,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=167860.0, ans=0.1 2024-09-17 06:46:51,948 INFO [train.py:1198] (1/2) Epoch 10, batch 1250, loss[loss=0.2913, ctc_loss=0.1998, cr_loss=0.4382, attn_decoder_loss=0.2918, over 29544.00 frames. ], tot_loss[loss=0.268, ctc_loss=0.1743, cr_loss=0.4094, attn_decoder_loss=0.2693, over 5774584.13 frames. ], batch size: 92, lr: 1.14e-02, grad_scale: 8.0 2024-09-17 06:46:58,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=167900.0, ans=0.125 2024-09-17 06:47:11,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=167940.0, ans=0.125 2024-09-17 06:47:23,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=167980.0, ans=0.125 2024-09-17 06:47:48,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=168020.0, ans=0.1 2024-09-17 06:47:49,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=168020.0, ans=0.125 2024-09-17 06:47:56,413 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.293e+01 9.363e+01 1.028e+02 1.124e+02 2.251e+02, threshold=2.057e+02, percent-clipped=1.0 2024-09-17 06:47:58,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=168060.0, ans=0.125 2024-09-17 06:48:05,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=168060.0, ans=0.1 2024-09-17 06:48:08,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=168100.0, ans=0.125 2024-09-17 06:48:09,976 INFO [train.py:1198] (1/2) Epoch 10, batch 1300, loss[loss=0.2849, ctc_loss=0.1813, cr_loss=0.4279, attn_decoder_loss=0.2869, over 28411.00 frames. ], tot_loss[loss=0.2671, ctc_loss=0.1732, cr_loss=0.4089, attn_decoder_loss=0.2684, over 5779487.77 frames. 
], batch size: 111, lr: 1.14e-02, grad_scale: 8.0 2024-09-17 06:48:22,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=168100.0, ans=0.125 2024-09-17 06:48:35,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=168140.0, ans=0.2 2024-09-17 06:48:42,806 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:49:20,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=168260.0, ans=0.2 2024-09-17 06:49:27,981 INFO [train.py:1198] (1/2) Epoch 10, batch 1350, loss[loss=0.2618, ctc_loss=0.1621, cr_loss=0.3744, attn_decoder_loss=0.2646, over 29752.00 frames. ], tot_loss[loss=0.2671, ctc_loss=0.1732, cr_loss=0.4087, attn_decoder_loss=0.2684, over 5795018.21 frames. ], batch size: 81, lr: 1.13e-02, grad_scale: 8.0 2024-09-17 06:49:35,096 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.31 vs. limit=12.0 2024-09-17 06:49:36,220 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.37 vs. limit=12.0 2024-09-17 06:49:44,748 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:49:49,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=168340.0, ans=0.125 2024-09-17 06:49:59,931 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.91 vs. limit=22.5 2024-09-17 06:50:02,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=168380.0, ans=0.125 2024-09-17 06:50:10,511 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.97 vs. limit=15.0 2024-09-17 06:50:29,343 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.801e+01 9.605e+01 1.036e+02 1.132e+02 1.597e+02, threshold=2.072e+02, percent-clipped=0.0 2024-09-17 06:50:42,759 INFO [train.py:1198] (1/2) Epoch 10, batch 1400, loss[loss=0.2304, ctc_loss=0.1448, cr_loss=0.3508, attn_decoder_loss=0.2321, over 29565.00 frames. ], tot_loss[loss=0.267, ctc_loss=0.1731, cr_loss=0.4087, attn_decoder_loss=0.2684, over 5806263.47 frames. ], batch size: 69, lr: 1.13e-02, grad_scale: 8.0 2024-09-17 06:51:05,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=168540.0, ans=0.0 2024-09-17 06:51:20,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=168580.0, ans=0.125 2024-09-17 06:51:40,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=168620.0, ans=0.0 2024-09-17 06:52:00,292 INFO [train.py:1198] (1/2) Epoch 10, batch 1450, loss[loss=0.2803, ctc_loss=0.1865, cr_loss=0.4393, attn_decoder_loss=0.2809, over 29461.00 frames. ], tot_loss[loss=0.2678, ctc_loss=0.1739, cr_loss=0.4099, attn_decoder_loss=0.2691, over 5803037.05 frames. 
], batch size: 94, lr: 1.13e-02, grad_scale: 8.0 2024-09-17 06:52:03,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=168700.0, ans=0.0 2024-09-17 06:52:12,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=168700.0, ans=0.125 2024-09-17 06:52:26,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=168740.0, ans=0.125 2024-09-17 06:52:26,493 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.32 vs. limit=22.5 2024-09-17 06:52:27,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=168740.0, ans=0.2 2024-09-17 06:52:52,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=168820.0, ans=0.125 2024-09-17 06:52:56,588 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.11 vs. limit=6.0 2024-09-17 06:53:02,189 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.31 vs. limit=6.0 2024-09-17 06:53:06,096 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.081e+01 9.586e+01 1.053e+02 1.129e+02 3.740e+02, threshold=2.106e+02, percent-clipped=3.0 2024-09-17 06:53:18,281 INFO [train.py:1198] (1/2) Epoch 10, batch 1500, loss[loss=0.2741, ctc_loss=0.1791, cr_loss=0.4094, attn_decoder_loss=0.2755, over 29630.00 frames. ], tot_loss[loss=0.2684, ctc_loss=0.1742, cr_loss=0.4102, attn_decoder_loss=0.2698, over 5804479.03 frames. ], batch size: 86, lr: 1.13e-02, grad_scale: 8.0 2024-09-17 06:53:34,540 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.63 vs. limit=22.5 2024-09-17 06:53:37,765 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.18 vs. limit=22.5 2024-09-17 06:53:45,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=168940.0, ans=0.0 2024-09-17 06:53:45,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=168940.0, ans=0.0 2024-09-17 06:53:49,638 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.98 vs. limit=12.0 2024-09-17 06:53:55,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=168980.0, ans=0.1 2024-09-17 06:53:56,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=168980.0, ans=0.125 2024-09-17 06:53:58,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=168980.0, ans=0.0 2024-09-17 06:54:00,130 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.39 vs. 
limit=22.5 2024-09-17 06:54:02,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=169020.0, ans=0.1 2024-09-17 06:54:34,393 INFO [train.py:1198] (1/2) Epoch 10, batch 1550, loss[loss=0.2788, ctc_loss=0.1858, cr_loss=0.4233, attn_decoder_loss=0.2798, over 29521.00 frames. ], tot_loss[loss=0.2686, ctc_loss=0.1749, cr_loss=0.4106, attn_decoder_loss=0.2699, over 5780267.15 frames. ], batch size: 90, lr: 1.13e-02, grad_scale: 4.0 2024-09-17 06:54:39,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=169100.0, ans=0.125 2024-09-17 06:54:43,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=169100.0, ans=0.1 2024-09-17 06:55:00,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=169140.0, ans=0.2 2024-09-17 06:55:11,406 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0 2024-09-17 06:55:17,048 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.63 vs. limit=22.5 2024-09-17 06:55:18,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=169220.0, ans=0.125 2024-09-17 06:55:22,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=169220.0, ans=0.125 2024-09-17 06:55:25,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=169220.0, ans=0.125 2024-09-17 06:55:26,619 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.58 vs. limit=15.0 2024-09-17 06:55:41,103 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.001e+01 9.536e+01 1.067e+02 1.173e+02 2.612e+02, threshold=2.133e+02, percent-clipped=1.0 2024-09-17 06:55:44,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=169260.0, ans=0.2 2024-09-17 06:55:49,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=169260.0, ans=0.07 2024-09-17 06:55:51,703 INFO [train.py:1198] (1/2) Epoch 10, batch 1600, loss[loss=0.2663, ctc_loss=0.1693, cr_loss=0.403, attn_decoder_loss=0.2681, over 29666.00 frames. ], tot_loss[loss=0.2677, ctc_loss=0.1742, cr_loss=0.4089, attn_decoder_loss=0.269, over 5763615.60 frames. ], batch size: 85, lr: 1.13e-02, grad_scale: 8.0 2024-09-17 06:55:53,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=169300.0, ans=0.0 2024-09-17 06:55:55,888 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.86 vs. 
limit=12.0 2024-09-17 06:56:01,079 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:56:02,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=169300.0, ans=0.125 2024-09-17 06:56:05,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=169340.0, ans=0.04949747468305833 2024-09-17 06:56:58,269 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.22 vs. limit=15.0 2024-09-17 06:57:09,468 INFO [train.py:1198] (1/2) Epoch 10, batch 1650, loss[loss=0.2692, ctc_loss=0.1667, cr_loss=0.3886, attn_decoder_loss=0.272, over 29709.00 frames. ], tot_loss[loss=0.2673, ctc_loss=0.1737, cr_loss=0.4076, attn_decoder_loss=0.2687, over 5757712.77 frames. ], batch size: 89, lr: 1.13e-02, grad_scale: 8.0 2024-09-17 06:57:13,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=169500.0, ans=0.04949747468305833 2024-09-17 06:57:20,894 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.33 vs. limit=15.0 2024-09-17 06:57:36,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=169540.0, ans=0.125 2024-09-17 06:57:53,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=169620.0, ans=0.125 2024-09-17 06:58:12,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=169660.0, ans=0.125 2024-09-17 06:58:13,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=169660.0, ans=0.1 2024-09-17 06:58:14,706 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.189e+01 9.368e+01 9.821e+01 1.048e+02 1.434e+02, threshold=1.964e+02, percent-clipped=0.0 2024-09-17 06:58:25,137 INFO [train.py:1198] (1/2) Epoch 10, batch 1700, loss[loss=0.2348, ctc_loss=0.1403, cr_loss=0.3623, attn_decoder_loss=0.2373, over 29525.00 frames. ], tot_loss[loss=0.2667, ctc_loss=0.173, cr_loss=0.4073, attn_decoder_loss=0.2681, over 5778835.17 frames. 
], batch size: 69, lr: 1.13e-02, grad_scale: 8.0 2024-09-17 06:58:32,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=169700.0, ans=0.125 2024-09-17 06:59:14,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=169820.0, ans=0.125 2024-09-17 06:59:22,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=169820.0, ans=0.0 2024-09-17 06:59:26,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=169860.0, ans=0.1 2024-09-17 06:59:32,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=169860.0, ans=0.125 2024-09-17 06:59:42,516 INFO [train.py:1198] (1/2) Epoch 10, batch 1750, loss[loss=0.2327, ctc_loss=0.1461, cr_loss=0.3728, attn_decoder_loss=0.2341, over 29320.00 frames. ], tot_loss[loss=0.2664, ctc_loss=0.1726, cr_loss=0.407, attn_decoder_loss=0.2677, over 5786972.94 frames. ], batch size: 67, lr: 1.13e-02, grad_scale: 4.0 2024-09-17 07:00:04,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=169940.0, ans=0.95 2024-09-17 07:00:08,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=169940.0, ans=0.2 2024-09-17 07:00:10,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=169940.0, ans=0.125 2024-09-17 07:00:16,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=169980.0, ans=0.0 2024-09-17 07:00:25,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=169980.0, ans=0.0 2024-09-17 07:00:42,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=170020.0, ans=0.125 2024-09-17 07:00:43,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=170060.0, ans=0.1 2024-09-17 07:00:51,161 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.948e+01 9.325e+01 9.999e+01 1.093e+02 1.950e+02, threshold=2.000e+02, percent-clipped=0.0 2024-09-17 07:01:00,147 INFO [train.py:1198] (1/2) Epoch 10, batch 1800, loss[loss=0.2683, ctc_loss=0.1698, cr_loss=0.4222, attn_decoder_loss=0.2699, over 29694.00 frames. ], tot_loss[loss=0.2665, ctc_loss=0.1728, cr_loss=0.4072, attn_decoder_loss=0.2679, over 5791119.93 frames. ], batch size: 83, lr: 1.13e-02, grad_scale: 8.0 2024-09-17 07:01:00,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=170100.0, ans=0.95 2024-09-17 07:01:02,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=170100.0, ans=0.125 2024-09-17 07:01:19,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.44 vs. 
limit=12.0 2024-09-17 07:01:21,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=170140.0, ans=0.1 2024-09-17 07:02:09,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=170260.0, ans=0.125 2024-09-17 07:02:15,931 INFO [train.py:1198] (1/2) Epoch 10, batch 1850, loss[loss=0.2858, ctc_loss=0.1942, cr_loss=0.439, attn_decoder_loss=0.2862, over 29614.00 frames. ], tot_loss[loss=0.2661, ctc_loss=0.1723, cr_loss=0.4062, attn_decoder_loss=0.2675, over 5798564.45 frames. ], batch size: 86, lr: 1.13e-02, grad_scale: 4.0 2024-09-17 07:02:31,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.29 vs. limit=6.0 2024-09-17 07:02:40,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=170340.0, ans=0.125 2024-09-17 07:02:43,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=170340.0, ans=0.1 2024-09-17 07:03:25,450 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.88 vs. limit=10.0 2024-09-17 07:03:26,157 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.871e+01 9.382e+01 1.051e+02 1.159e+02 3.606e+02, threshold=2.101e+02, percent-clipped=3.0 2024-09-17 07:03:33,539 INFO [train.py:1198] (1/2) Epoch 10, batch 1900, loss[loss=0.2847, ctc_loss=0.1928, cr_loss=0.4402, attn_decoder_loss=0.2851, over 29712.00 frames. ], tot_loss[loss=0.267, ctc_loss=0.1729, cr_loss=0.4074, attn_decoder_loss=0.2684, over 5805886.54 frames. ], batch size: 89, lr: 1.13e-02, grad_scale: 8.0 2024-09-17 07:03:39,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=170500.0, ans=0.1 2024-09-17 07:03:41,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=170500.0, ans=0.025 2024-09-17 07:03:45,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=170500.0, ans=0.025 2024-09-17 07:03:45,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=170500.0, ans=0.125 2024-09-17 07:03:51,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=170540.0, ans=0.125 2024-09-17 07:04:15,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=170580.0, ans=0.125 2024-09-17 07:04:15,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=170580.0, ans=0.025 2024-09-17 07:04:16,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=170580.0, ans=0.0 2024-09-17 07:04:19,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=170620.0, ans=0.125 2024-09-17 07:04:20,170 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.42 vs. 
limit=15.0 2024-09-17 07:04:38,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=170660.0, ans=0.125 2024-09-17 07:04:41,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=170660.0, ans=0.1 2024-09-17 07:04:49,240 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.93 vs. limit=15.0 2024-09-17 07:04:51,552 INFO [train.py:1198] (1/2) Epoch 10, batch 1950, loss[loss=0.2549, ctc_loss=0.1658, cr_loss=0.4143, attn_decoder_loss=0.2556, over 29461.00 frames. ], tot_loss[loss=0.2684, ctc_loss=0.1736, cr_loss=0.4096, attn_decoder_loss=0.2698, over 5820919.56 frames. ], batch size: 78, lr: 1.13e-02, grad_scale: 4.0 2024-09-17 07:04:59,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=170700.0, ans=0.1 2024-09-17 07:05:14,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=170740.0, ans=0.0 2024-09-17 07:05:41,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=170820.0, ans=0.125 2024-09-17 07:05:44,940 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.15 vs. limit=15.0 2024-09-17 07:05:46,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.09 vs. limit=15.0 2024-09-17 07:05:50,712 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.15 vs. limit=6.0 2024-09-17 07:06:00,805 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.428e+01 1.003e+02 1.077e+02 1.161e+02 3.833e+02, threshold=2.155e+02, percent-clipped=2.0 2024-09-17 07:06:06,881 INFO [train.py:1198] (1/2) Epoch 10, batch 2000, loss[loss=0.2412, ctc_loss=0.1553, cr_loss=0.3638, attn_decoder_loss=0.2427, over 29332.00 frames. ], tot_loss[loss=0.2691, ctc_loss=0.1746, cr_loss=0.4105, attn_decoder_loss=0.2704, over 5797978.47 frames. ], batch size: 67, lr: 1.13e-02, grad_scale: 8.0 2024-09-17 07:06:22,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=170940.0, ans=0.125 2024-09-17 07:06:25,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=170940.0, ans=0.125 2024-09-17 07:06:50,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=171020.0, ans=0.0 2024-09-17 07:07:14,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=171060.0, ans=0.0 2024-09-17 07:07:24,544 INFO [train.py:1198] (1/2) Epoch 10, batch 2050, loss[loss=0.2321, ctc_loss=0.1378, cr_loss=0.3595, attn_decoder_loss=0.2346, over 29416.00 frames. ], tot_loss[loss=0.2678, ctc_loss=0.1736, cr_loss=0.409, attn_decoder_loss=0.2691, over 5790717.10 frames. 
], batch size: 70, lr: 1.13e-02, grad_scale: 8.0 2024-09-17 07:07:26,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=171100.0, ans=0.0 2024-09-17 07:07:26,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=171100.0, ans=10.0 2024-09-17 07:07:35,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=171100.0, ans=0.2 2024-09-17 07:07:42,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=171140.0, ans=0.0 2024-09-17 07:07:53,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=171180.0, ans=0.125 2024-09-17 07:08:04,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=171180.0, ans=0.07 2024-09-17 07:08:16,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=171220.0, ans=0.05 2024-09-17 07:08:37,667 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.251e+01 9.516e+01 1.015e+02 1.092e+02 1.956e+02, threshold=2.031e+02, percent-clipped=0.0 2024-09-17 07:08:42,375 INFO [train.py:1198] (1/2) Epoch 10, batch 2100, loss[loss=0.2699, ctc_loss=0.1696, cr_loss=0.4255, attn_decoder_loss=0.2716, over 29742.00 frames. ], tot_loss[loss=0.2672, ctc_loss=0.173, cr_loss=0.4091, attn_decoder_loss=0.2686, over 5801797.66 frames. ], batch size: 81, lr: 1.13e-02, grad_scale: 8.0 2024-09-17 07:08:50,518 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.14 vs. limit=10.0 2024-09-17 07:09:07,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=171340.0, ans=0.125 2024-09-17 07:09:09,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=171340.0, ans=0.2 2024-09-17 07:09:15,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=171380.0, ans=0.0 2024-09-17 07:09:20,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=171380.0, ans=0.125 2024-09-17 07:09:26,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=171420.0, ans=0.0 2024-09-17 07:09:38,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=171420.0, ans=0.125 2024-09-17 07:09:46,319 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.89 vs. limit=15.0 2024-09-17 07:09:56,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=171500.0, ans=0.125 2024-09-17 07:09:57,458 INFO [train.py:1198] (1/2) Epoch 10, batch 2150, loss[loss=0.2614, ctc_loss=0.1813, cr_loss=0.4212, attn_decoder_loss=0.261, over 29453.00 frames. ], tot_loss[loss=0.2663, ctc_loss=0.172, cr_loss=0.407, attn_decoder_loss=0.2677, over 5815709.88 frames. 
], batch size: 78, lr: 1.12e-02, grad_scale: 8.0 2024-09-17 07:09:59,720 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.72 vs. limit=15.0 2024-09-17 07:10:00,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=171500.0, ans=0.0 2024-09-17 07:10:03,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=171500.0, ans=0.1 2024-09-17 07:10:46,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=171620.0, ans=0.125 2024-09-17 07:11:11,937 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.141e+01 9.514e+01 1.011e+02 1.070e+02 3.193e+02, threshold=2.022e+02, percent-clipped=1.0 2024-09-17 07:11:15,098 INFO [train.py:1198] (1/2) Epoch 10, batch 2200, loss[loss=0.256, ctc_loss=0.1526, cr_loss=0.3716, attn_decoder_loss=0.2592, over 29641.00 frames. ], tot_loss[loss=0.2662, ctc_loss=0.172, cr_loss=0.4077, attn_decoder_loss=0.2676, over 5812581.93 frames. ], batch size: 86, lr: 1.12e-02, grad_scale: 8.0 2024-09-17 07:11:33,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=171740.0, ans=0.125 2024-09-17 07:11:36,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=171740.0, ans=0.125 2024-09-17 07:11:48,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=171780.0, ans=0.025 2024-09-17 07:12:01,359 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.33 vs. limit=22.5 2024-09-17 07:12:08,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=171820.0, ans=0.1 2024-09-17 07:12:31,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=171900.0, ans=0.025 2024-09-17 07:12:32,671 INFO [train.py:1198] (1/2) Epoch 10, batch 2250, loss[loss=0.2848, ctc_loss=0.1871, cr_loss=0.4341, attn_decoder_loss=0.286, over 29720.00 frames. ], tot_loss[loss=0.2663, ctc_loss=0.1717, cr_loss=0.4074, attn_decoder_loss=0.2677, over 5811595.00 frames. ], batch size: 82, lr: 1.12e-02, grad_scale: 4.0 2024-09-17 07:12:40,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=171900.0, ans=0.125 2024-09-17 07:12:50,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.57 vs. 
limit=15.0 2024-09-17 07:13:03,107 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 07:13:21,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=172020.0, ans=0.025 2024-09-17 07:13:24,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=172020.0, ans=0.2 2024-09-17 07:13:33,752 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 07:13:45,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=172060.0, ans=0.1 2024-09-17 07:13:46,988 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.071e+01 9.786e+01 1.069e+02 1.181e+02 2.871e+02, threshold=2.139e+02, percent-clipped=1.0 2024-09-17 07:13:48,472 INFO [train.py:1198] (1/2) Epoch 10, batch 2300, loss[loss=0.2506, ctc_loss=0.1583, cr_loss=0.3916, attn_decoder_loss=0.2521, over 29337.00 frames. ], tot_loss[loss=0.2651, ctc_loss=0.1709, cr_loss=0.4057, attn_decoder_loss=0.2666, over 5799718.72 frames. ], batch size: 71, lr: 1.12e-02, grad_scale: 8.0 2024-09-17 07:13:59,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=172100.0, ans=0.0 2024-09-17 07:14:18,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=172180.0, ans=0.125 2024-09-17 07:15:01,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=172260.0, ans=0.125 2024-09-17 07:15:05,738 INFO [train.py:1198] (1/2) Epoch 10, batch 2350, loss[loss=0.2767, ctc_loss=0.1782, cr_loss=0.4176, attn_decoder_loss=0.2784, over 29697.00 frames. ], tot_loss[loss=0.2654, ctc_loss=0.1713, cr_loss=0.4068, attn_decoder_loss=0.2669, over 5805206.50 frames. ], batch size: 83, lr: 1.12e-02, grad_scale: 8.0 2024-09-17 07:15:14,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=172300.0, ans=0.0 2024-09-17 07:15:28,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=172340.0, ans=0.0 2024-09-17 07:16:00,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=172420.0, ans=0.1 2024-09-17 07:16:04,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=172460.0, ans=0.0 2024-09-17 07:16:07,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=172460.0, ans=0.125 2024-09-17 07:16:21,796 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.951e+01 9.247e+01 1.012e+02 1.078e+02 1.616e+02, threshold=2.023e+02, percent-clipped=0.0 2024-09-17 07:16:23,395 INFO [train.py:1198] (1/2) Epoch 10, batch 2400, loss[loss=0.2446, ctc_loss=0.1478, cr_loss=0.3746, attn_decoder_loss=0.2471, over 29539.00 frames. ], tot_loss[loss=0.2656, ctc_loss=0.1713, cr_loss=0.4073, attn_decoder_loss=0.2671, over 5808922.33 frames. 
], batch size: 76, lr: 1.12e-02, grad_scale: 16.0 2024-09-17 07:16:46,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=172540.0, ans=0.125 2024-09-17 07:16:51,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=172540.0, ans=0.125 2024-09-17 07:16:55,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=172580.0, ans=0.125 2024-09-17 07:17:10,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=172620.0, ans=0.1 2024-09-17 07:17:18,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=172620.0, ans=12.0 2024-09-17 07:17:35,820 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.16 vs. limit=15.0 2024-09-17 07:17:39,182 INFO [train.py:1198] (1/2) Epoch 10, batch 2450, loss[loss=0.2767, ctc_loss=0.1812, cr_loss=0.4299, attn_decoder_loss=0.2778, over 29695.00 frames. ], tot_loss[loss=0.2666, ctc_loss=0.1724, cr_loss=0.4085, attn_decoder_loss=0.268, over 5787679.01 frames. ], batch size: 82, lr: 1.12e-02, grad_scale: 8.0 2024-09-17 07:17:43,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=172700.0, ans=0.2 2024-09-17 07:17:46,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=172700.0, ans=0.1 2024-09-17 07:17:51,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=172700.0, ans=0.95 2024-09-17 07:18:06,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=172740.0, ans=0.125 2024-09-17 07:18:12,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=172780.0, ans=0.125 2024-09-17 07:18:32,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=172820.0, ans=0.125 2024-09-17 07:18:33,615 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=12.0 2024-09-17 07:18:34,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=172820.0, ans=0.0 2024-09-17 07:18:44,942 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=14.02 vs. limit=15.0 2024-09-17 07:18:57,666 INFO [train.py:1198] (1/2) Epoch 10, batch 2500, loss[loss=0.2811, ctc_loss=0.1799, cr_loss=0.4183, attn_decoder_loss=0.2831, over 29643.00 frames. ], tot_loss[loss=0.2662, ctc_loss=0.1718, cr_loss=0.4082, attn_decoder_loss=0.2677, over 5796913.22 frames. 
], batch size: 86, lr: 1.12e-02, grad_scale: 8.0 2024-09-17 07:18:59,189 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.164e+01 9.501e+01 9.966e+01 1.113e+02 2.388e+02, threshold=1.993e+02, percent-clipped=1.0 2024-09-17 07:18:59,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=172900.0, ans=0.0 2024-09-17 07:19:05,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=172900.0, ans=0.0 2024-09-17 07:19:16,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=172940.0, ans=0.125 2024-09-17 07:19:20,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=172940.0, ans=0.0 2024-09-17 07:19:26,827 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.21 vs. limit=22.5 2024-09-17 07:19:31,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=172980.0, ans=0.2 2024-09-17 07:19:40,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=172980.0, ans=0.125 2024-09-17 07:20:15,466 INFO [train.py:1198] (1/2) Epoch 10, batch 2550, loss[loss=0.235, ctc_loss=0.1428, cr_loss=0.3513, attn_decoder_loss=0.2375, over 29349.00 frames. ], tot_loss[loss=0.2661, ctc_loss=0.1717, cr_loss=0.4081, attn_decoder_loss=0.2676, over 5800012.72 frames. ], batch size: 67, lr: 1.12e-02, grad_scale: 4.0 2024-09-17 07:20:24,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.max_abs, batch_count=173100.0, ans=10.0 2024-09-17 07:20:36,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=173140.0, ans=0.1 2024-09-17 07:20:42,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=173140.0, ans=0.125 2024-09-17 07:20:56,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=173180.0, ans=0.125 2024-09-17 07:21:14,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=173260.0, ans=0.125 2024-09-17 07:21:20,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=173260.0, ans=0.2 2024-09-17 07:21:24,317 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.43 vs. limit=10.0 2024-09-17 07:21:27,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=173260.0, ans=0.0 2024-09-17 07:21:30,616 INFO [train.py:1198] (1/2) Epoch 10, batch 2600, loss[loss=0.2563, ctc_loss=0.1608, cr_loss=0.3915, attn_decoder_loss=0.2582, over 29425.00 frames. ], tot_loss[loss=0.2667, ctc_loss=0.172, cr_loss=0.4085, attn_decoder_loss=0.2682, over 5796088.88 frames. 
], batch size: 78, lr: 1.12e-02, grad_scale: 8.0 2024-09-17 07:21:33,520 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.476e+01 9.594e+01 1.032e+02 1.139e+02 3.672e+02, threshold=2.065e+02, percent-clipped=4.0 2024-09-17 07:21:36,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=173300.0, ans=0.0 2024-09-17 07:21:45,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=173340.0, ans=0.125 2024-09-17 07:22:02,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=173380.0, ans=0.05 2024-09-17 07:22:08,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=173380.0, ans=0.0 2024-09-17 07:22:14,811 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.02 vs. limit=10.0 2024-09-17 07:22:17,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=173420.0, ans=0.125 2024-09-17 07:22:33,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=173460.0, ans=0.125 2024-09-17 07:22:45,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=173460.0, ans=0.0 2024-09-17 07:22:47,843 INFO [train.py:1198] (1/2) Epoch 10, batch 2650, loss[loss=0.2842, ctc_loss=0.1848, cr_loss=0.4075, attn_decoder_loss=0.2861, over 29328.00 frames. ], tot_loss[loss=0.2673, ctc_loss=0.1726, cr_loss=0.409, attn_decoder_loss=0.2687, over 5802258.91 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 8.0 2024-09-17 07:22:51,859 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.02 vs. limit=15.0 2024-09-17 07:22:52,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=173500.0, ans=0.125 2024-09-17 07:22:58,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=173500.0, ans=0.125 2024-09-17 07:22:58,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=173500.0, ans=0.025 2024-09-17 07:23:00,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=173500.0, ans=0.2 2024-09-17 07:23:00,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=173500.0, ans=0.0 2024-09-17 07:23:06,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=173540.0, ans=0.125 2024-09-17 07:23:11,545 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.49 vs. 
limit=15.0 2024-09-17 07:23:28,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=173580.0, ans=0.2 2024-09-17 07:24:03,243 INFO [train.py:1198] (1/2) Epoch 10, batch 2700, loss[loss=0.2721, ctc_loss=0.1707, cr_loss=0.4097, attn_decoder_loss=0.2743, over 29527.00 frames. ], tot_loss[loss=0.2671, ctc_loss=0.1726, cr_loss=0.4086, attn_decoder_loss=0.2685, over 5798586.31 frames. ], batch size: 87, lr: 1.12e-02, grad_scale: 8.0 2024-09-17 07:24:08,375 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.951e+01 9.630e+01 1.023e+02 1.091e+02 1.557e+02, threshold=2.045e+02, percent-clipped=0.0 2024-09-17 07:24:20,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=173740.0, ans=0.0 2024-09-17 07:24:33,318 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.01 vs. limit=22.5 2024-09-17 07:24:40,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=173780.0, ans=0.0 2024-09-17 07:24:51,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=173820.0, ans=0.125 2024-09-17 07:25:21,531 INFO [train.py:1198] (1/2) Epoch 10, batch 2750, loss[loss=0.2648, ctc_loss=0.1701, cr_loss=0.4362, attn_decoder_loss=0.2657, over 29519.00 frames. ], tot_loss[loss=0.266, ctc_loss=0.1717, cr_loss=0.4077, attn_decoder_loss=0.2674, over 5796601.82 frames. ], batch size: 75, lr: 1.12e-02, grad_scale: 8.0 2024-09-17 07:25:26,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=173900.0, ans=0.125 2024-09-17 07:25:28,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=173900.0, ans=0.07 2024-09-17 07:25:37,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=173940.0, ans=0.1 2024-09-17 07:25:40,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=173940.0, ans=0.5 2024-09-17 07:25:47,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=173940.0, ans=0.09899494936611666 2024-09-17 07:25:49,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=173940.0, ans=0.125 2024-09-17 07:25:52,478 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.17 vs. 
limit=15.0 2024-09-17 07:25:57,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=173980.0, ans=0.0 2024-09-17 07:25:57,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=173980.0, ans=0.125 2024-09-17 07:25:59,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=173980.0, ans=0.0 2024-09-17 07:26:03,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=173980.0, ans=0.07 2024-09-17 07:26:19,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=174020.0, ans=0.125 2024-09-17 07:26:19,332 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.90 vs. limit=15.0 2024-09-17 07:26:31,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=174060.0, ans=0.1 2024-09-17 07:26:39,175 INFO [train.py:1198] (1/2) Epoch 10, batch 2800, loss[loss=0.308, ctc_loss=0.2337, cr_loss=0.4241, attn_decoder_loss=0.3068, over 20430.00 frames. ], tot_loss[loss=0.2664, ctc_loss=0.1723, cr_loss=0.4083, attn_decoder_loss=0.2678, over 5777796.80 frames. ], batch size: 211, lr: 1.12e-02, grad_scale: 16.0 2024-09-17 07:26:42,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=174100.0, ans=0.2 2024-09-17 07:26:43,471 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.595e+01 1.029e+02 1.148e+02 1.291e+02 2.335e+02, threshold=2.295e+02, percent-clipped=2.0 2024-09-17 07:27:02,804 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.51 vs. limit=15.0 2024-09-17 07:27:32,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=174220.0, ans=0.125 2024-09-17 07:27:47,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=174260.0, ans=0.125 2024-09-17 07:27:54,278 INFO [train.py:1198] (1/2) Epoch 10, batch 2850, loss[loss=0.2553, ctc_loss=0.173, cr_loss=0.3978, attn_decoder_loss=0.2556, over 29521.00 frames. ], tot_loss[loss=0.2672, ctc_loss=0.1733, cr_loss=0.4086, attn_decoder_loss=0.2686, over 5762432.17 frames. ], batch size: 77, lr: 1.12e-02, grad_scale: 8.0 2024-09-17 07:27:58,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=174300.0, ans=0.125 2024-09-17 07:28:17,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=174340.0, ans=0.125 2024-09-17 07:28:23,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=174340.0, ans=0.1 2024-09-17 07:28:24,517 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.50 vs. 
limit=22.5 2024-09-17 07:28:33,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=174380.0, ans=10.0 2024-09-17 07:29:03,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=174460.0, ans=0.1 2024-09-17 07:29:12,373 INFO [train.py:1198] (1/2) Epoch 10, batch 2900, loss[loss=0.2511, ctc_loss=0.151, cr_loss=0.3818, attn_decoder_loss=0.2537, over 29429.00 frames. ], tot_loss[loss=0.2679, ctc_loss=0.1731, cr_loss=0.4097, attn_decoder_loss=0.2693, over 5788283.80 frames. ], batch size: 79, lr: 1.12e-02, grad_scale: 8.0 2024-09-17 07:29:14,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=174500.0, ans=0.125 2024-09-17 07:29:15,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=174500.0, ans=0.0 2024-09-17 07:29:18,303 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.890e+01 9.394e+01 1.011e+02 1.079e+02 3.902e+02, threshold=2.022e+02, percent-clipped=2.0 2024-09-17 07:29:31,636 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=30.26 vs. limit=22.5 2024-09-17 07:29:33,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=174540.0, ans=0.2 2024-09-17 07:29:40,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=174540.0, ans=0.025 2024-09-17 07:29:43,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=174580.0, ans=0.0 2024-09-17 07:29:44,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=174580.0, ans=0.1 2024-09-17 07:30:19,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=174660.0, ans=0.1 2024-09-17 07:30:21,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=174660.0, ans=0.1 2024-09-17 07:30:30,072 INFO [train.py:1198] (1/2) Epoch 10, batch 2950, loss[loss=0.2522, ctc_loss=0.1587, cr_loss=0.3832, attn_decoder_loss=0.2541, over 29513.00 frames. ], tot_loss[loss=0.2665, ctc_loss=0.172, cr_loss=0.4075, attn_decoder_loss=0.268, over 5782025.69 frames. ], batch size: 75, lr: 1.11e-02, grad_scale: 4.0 2024-09-17 07:30:31,045 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.37 vs. limit=15.0 2024-09-17 07:30:52,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.20 vs. limit=10.0 2024-09-17 07:31:19,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=174820.0, ans=0.025 2024-09-17 07:31:44,199 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.36 vs. 
limit=15.0 2024-09-17 07:31:46,170 INFO [train.py:1198] (1/2) Epoch 10, batch 3000, loss[loss=0.2674, ctc_loss=0.1736, cr_loss=0.4124, attn_decoder_loss=0.2687, over 29751.00 frames. ], tot_loss[loss=0.2665, ctc_loss=0.172, cr_loss=0.4076, attn_decoder_loss=0.268, over 5783398.69 frames. ], batch size: 81, lr: 1.11e-02, grad_scale: 8.0 2024-09-17 07:31:46,171 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 07:32:05,337 INFO [train.py:1230] (1/2) Epoch 10, validation: loss=0.2137, ctc_loss=0.04855, cr_loss=4.713e-15, attn_decoder_loss=0.232, over 944034.00 frames. 2024-09-17 07:32:05,338 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-17 07:32:14,593 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.331e+01 9.561e+01 1.037e+02 1.121e+02 2.530e+02, threshold=2.075e+02, percent-clipped=2.0 2024-09-17 07:32:31,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=174940.0, ans=0.0 2024-09-17 07:32:38,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=174980.0, ans=0.125 2024-09-17 07:32:51,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=175020.0, ans=0.0 2024-09-17 07:32:54,642 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.49 vs. limit=12.0 2024-09-17 07:32:58,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=175020.0, ans=0.0 2024-09-17 07:33:07,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=175060.0, ans=0.0 2024-09-17 07:33:11,974 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 07:33:20,909 INFO [train.py:1198] (1/2) Epoch 10, batch 3050, loss[loss=0.2583, ctc_loss=0.1677, cr_loss=0.4123, attn_decoder_loss=0.2592, over 29516.00 frames. ], tot_loss[loss=0.2675, ctc_loss=0.1732, cr_loss=0.4101, attn_decoder_loss=0.2688, over 5776939.34 frames. ], batch size: 76, lr: 1.11e-02, grad_scale: 4.0 2024-09-17 07:33:21,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=175100.0, ans=0.2 2024-09-17 07:33:35,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=175140.0, ans=0.1 2024-09-17 07:33:58,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=175180.0, ans=0.04949747468305833 2024-09-17 07:34:16,067 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.45 vs. 
limit=15.0 2024-09-17 07:34:27,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=175260.0, ans=0.2 2024-09-17 07:34:31,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=175260.0, ans=0.09899494936611666 2024-09-17 07:34:36,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=175260.0, ans=0.07 2024-09-17 07:34:39,151 INFO [train.py:1198] (1/2) Epoch 10, batch 3100, loss[loss=0.2922, ctc_loss=0.2042, cr_loss=0.4705, attn_decoder_loss=0.2915, over 29249.00 frames. ], tot_loss[loss=0.267, ctc_loss=0.1727, cr_loss=0.4091, attn_decoder_loss=0.2684, over 5776896.17 frames. ], batch size: 100, lr: 1.11e-02, grad_scale: 8.0 2024-09-17 07:34:48,268 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.394e+01 9.542e+01 1.021e+02 1.174e+02 1.946e+02, threshold=2.041e+02, percent-clipped=0.0 2024-09-17 07:34:51,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=175300.0, ans=0.1 2024-09-17 07:34:52,110 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.14 vs. limit=15.0 2024-09-17 07:35:44,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=175460.0, ans=0.2 2024-09-17 07:35:52,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=175460.0, ans=0.125 2024-09-17 07:35:53,326 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.89 vs. limit=10.0 2024-09-17 07:35:57,191 INFO [train.py:1198] (1/2) Epoch 10, batch 3150, loss[loss=0.2865, ctc_loss=0.1849, cr_loss=0.4372, attn_decoder_loss=0.288, over 28886.00 frames. ], tot_loss[loss=0.2669, ctc_loss=0.1724, cr_loss=0.4084, attn_decoder_loss=0.2683, over 5782204.40 frames. ], batch size: 104, lr: 1.11e-02, grad_scale: 8.0 2024-09-17 07:36:15,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=175540.0, ans=0.0 2024-09-17 07:36:21,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=175540.0, ans=0.0 2024-09-17 07:36:24,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=175540.0, ans=0.1 2024-09-17 07:36:29,759 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.24 vs. limit=15.0 2024-09-17 07:36:41,704 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0 2024-09-17 07:37:06,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=175660.0, ans=0.125 2024-09-17 07:37:12,407 INFO [train.py:1198] (1/2) Epoch 10, batch 3200, loss[loss=0.2718, ctc_loss=0.1811, cr_loss=0.4361, attn_decoder_loss=0.2722, over 29420.00 frames. ], tot_loss[loss=0.2662, ctc_loss=0.1717, cr_loss=0.4079, attn_decoder_loss=0.2676, over 5791796.29 frames. 
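A note on reading the train.py:1198 records in this log: the headline loss is the weighted sum of its components under this run's configured scales (ctc 0.1, attention-decoder 0.9, cr 0.02). A minimal sketch verifying this against the "Epoch 10, batch 3200" totals above; the helper function is illustrative, not icefall code:

    # Illustrative check that the logged loss equals the scaled sum of its parts.
    # The scales are this run's configured values; the helper is not icefall code.
    def combined_loss(ctc_loss, attn_decoder_loss, cr_loss,
                      ctc_scale=0.1, attn_scale=0.9, cr_scale=0.02):
        return (ctc_scale * ctc_loss
                + attn_scale * attn_decoder_loss
                + cr_scale * cr_loss)

    # tot_loss fields from the 'Epoch 10, batch 3200' record above:
    tot = combined_loss(ctc_loss=0.1717, attn_decoder_loss=0.2676, cr_loss=0.4079)
    assert abs(tot - 0.2662) < 5e-4  # 0.1*0.1717 + 0.9*0.2676 + 0.02*0.4079

The same arithmetic reproduces the other tot_loss records in this stretch to within the logged rounding (e.g. batch 2650: 0.1*0.1726 + 0.9*0.2687 + 0.02*0.409 ≈ 0.2673).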
], batch size: 79, lr: 1.11e-02, grad_scale: 8.0 2024-09-17 07:37:24,409 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.118e+01 9.421e+01 9.970e+01 1.120e+02 1.872e+02, threshold=1.994e+02, percent-clipped=0.0 2024-09-17 07:37:37,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=175740.0, ans=0.025 2024-09-17 07:37:43,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=175780.0, ans=0.0 2024-09-17 07:37:52,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=175780.0, ans=10.0 2024-09-17 07:37:54,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=175780.0, ans=0.125 2024-09-17 07:37:55,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=175780.0, ans=0.0 2024-09-17 07:38:17,285 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.54 vs. limit=22.5 2024-09-17 07:38:30,165 INFO [train.py:1198] (1/2) Epoch 10, batch 3250, loss[loss=0.2771, ctc_loss=0.1855, cr_loss=0.4442, attn_decoder_loss=0.2774, over 29689.00 frames. ], tot_loss[loss=0.2668, ctc_loss=0.1721, cr_loss=0.4089, attn_decoder_loss=0.2682, over 5798572.66 frames. ], batch size: 84, lr: 1.11e-02, grad_scale: 4.0 2024-09-17 07:38:33,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=175900.0, ans=0.2 2024-09-17 07:38:39,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=175900.0, ans=0.125 2024-09-17 07:38:48,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=175940.0, ans=0.125 2024-09-17 07:38:50,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=175940.0, ans=0.0 2024-09-17 07:38:57,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=175940.0, ans=0.1 2024-09-17 07:39:48,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=176060.0, ans=0.0 2024-09-17 07:39:54,684 INFO [train.py:1198] (1/2) Epoch 10, batch 3300, loss[loss=0.2743, ctc_loss=0.1732, cr_loss=0.4103, attn_decoder_loss=0.2764, over 28224.00 frames. ], tot_loss[loss=0.2652, ctc_loss=0.1707, cr_loss=0.4067, attn_decoder_loss=0.2666, over 5795791.41 frames. 
], batch size: 111, lr: 1.11e-02, grad_scale: 8.0 2024-09-17 07:40:06,780 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.780e+01 9.364e+01 1.005e+02 1.120e+02 3.139e+02, threshold=2.009e+02, percent-clipped=4.0 2024-09-17 07:40:26,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=176180.0, ans=0.125 2024-09-17 07:40:29,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=176180.0, ans=0.125 2024-09-17 07:40:53,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=176260.0, ans=0.1 2024-09-17 07:41:04,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=176260.0, ans=0.95 2024-09-17 07:41:09,879 INFO [train.py:1198] (1/2) Epoch 10, batch 3350, loss[loss=0.2804, ctc_loss=0.1829, cr_loss=0.4227, attn_decoder_loss=0.2819, over 28885.00 frames. ], tot_loss[loss=0.266, ctc_loss=0.1718, cr_loss=0.4078, attn_decoder_loss=0.2674, over 5772120.76 frames. ], batch size: 104, lr: 1.11e-02, grad_scale: 8.0 2024-09-17 07:41:14,873 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 07:41:23,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=176340.0, ans=0.0 2024-09-17 07:41:28,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=176340.0, ans=0.125 2024-09-17 07:41:28,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=176340.0, ans=0.125 2024-09-17 07:41:31,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=176340.0, ans=0.0 2024-09-17 07:41:45,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=176380.0, ans=0.0 2024-09-17 07:41:46,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=176380.0, ans=0.0 2024-09-17 07:42:00,568 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.32 vs. limit=15.0 2024-09-17 07:42:06,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=176420.0, ans=0.125 2024-09-17 07:42:19,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=176460.0, ans=0.125 2024-09-17 07:42:27,210 INFO [train.py:1198] (1/2) Epoch 10, batch 3400, loss[loss=0.2318, ctc_loss=0.1392, cr_loss=0.3345, attn_decoder_loss=0.2346, over 29337.00 frames. ], tot_loss[loss=0.2661, ctc_loss=0.1721, cr_loss=0.4083, attn_decoder_loss=0.2675, over 5764395.54 frames. ], batch size: 67, lr: 1.11e-02, grad_scale: 8.0 2024-09-17 07:42:30,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=176500.0, ans=0.07 2024-09-17 07:42:33,947 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.89 vs. 
limit=10.0 2024-09-17 07:42:34,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=176500.0, ans=0.0 2024-09-17 07:42:39,316 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.353e+01 9.300e+01 1.006e+02 1.112e+02 2.316e+02, threshold=2.013e+02, percent-clipped=1.0 2024-09-17 07:42:48,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=176540.0, ans=0.025 2024-09-17 07:42:54,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=176540.0, ans=0.07 2024-09-17 07:42:59,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=176580.0, ans=0.0 2024-09-17 07:43:02,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=176580.0, ans=0.125 2024-09-17 07:43:14,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=176620.0, ans=0.125 2024-09-17 07:43:29,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=176660.0, ans=0.125 2024-09-17 07:43:35,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=176660.0, ans=0.125 2024-09-17 07:43:44,828 INFO [train.py:1198] (1/2) Epoch 10, batch 3450, loss[loss=0.2731, ctc_loss=0.1741, cr_loss=0.3789, attn_decoder_loss=0.2756, over 28332.00 frames. ], tot_loss[loss=0.2664, ctc_loss=0.1722, cr_loss=0.4084, attn_decoder_loss=0.2678, over 5773051.05 frames. ], batch size: 111, lr: 1.11e-02, grad_scale: 8.0 2024-09-17 07:43:51,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=176700.0, ans=0.025 2024-09-17 07:43:51,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=176700.0, ans=0.0 2024-09-17 07:44:04,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=176740.0, ans=0.025 2024-09-17 07:44:07,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=176740.0, ans=0.125 2024-09-17 07:44:32,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=176820.0, ans=0.125 2024-09-17 07:44:40,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=176820.0, ans=15.0 2024-09-17 07:44:49,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=176860.0, ans=0.025 2024-09-17 07:45:00,784 INFO [train.py:1198] (1/2) Epoch 10, batch 3500, loss[loss=0.2388, ctc_loss=0.1473, cr_loss=0.3769, attn_decoder_loss=0.2406, over 29325.00 frames. ], tot_loss[loss=0.2659, ctc_loss=0.1719, cr_loss=0.4074, attn_decoder_loss=0.2673, over 5774958.14 frames. 
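The scaling.py:214 lines each report a ScheduledFloat: a hyperparameter (skip rate, balancer probability, attention rate, ...) whose current value is looked up from a schedule keyed on batch_count, which is why so many of them have settled at their final values (ans=0.0, 0.125, ...) this deep into training. A toy piecewise-linear schedule in the same spirit; this is an assumed simplification, not icefall's actual ScheduledFloat class:

    import bisect

    class ToyScheduledFloat:
        """Piecewise-linear value schedule keyed on batch count (illustrative)."""
        def __init__(self, *points):
            # points: (batch_count, value) pairs, sorted by batch_count
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def value_at(self, batch_count):
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a skip rate that decays from 0.1 to 0.0 over the first 20k batches:
    skip_rate = ToyScheduledFloat((0.0, 0.1), (20000.0, 0.0))
    print(skip_rate.value_at(173980.0))  # -> 0.0, like the late-training ans=0.0 lines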
], batch size: 71, lr: 1.11e-02, grad_scale: 8.0 2024-09-17 07:45:07,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=176900.0, ans=0.125 2024-09-17 07:45:11,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=176900.0, ans=0.09899494936611666 2024-09-17 07:45:12,849 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.066e+01 9.560e+01 1.051e+02 1.170e+02 3.242e+02, threshold=2.102e+02, percent-clipped=4.0 2024-09-17 07:45:23,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=176940.0, ans=0.025 2024-09-17 07:45:26,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=176940.0, ans=0.025 2024-09-17 07:45:29,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=176980.0, ans=0.05 2024-09-17 07:45:30,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=176980.0, ans=0.1 2024-09-17 07:45:32,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=176980.0, ans=0.125 2024-09-17 07:46:08,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=177060.0, ans=0.1 2024-09-17 07:46:17,481 INFO [train.py:1198] (1/2) Epoch 10, batch 3550, loss[loss=0.2724, ctc_loss=0.1705, cr_loss=0.4193, attn_decoder_loss=0.2744, over 29701.00 frames. ], tot_loss[loss=0.2654, ctc_loss=0.1711, cr_loss=0.4067, attn_decoder_loss=0.2669, over 5782597.60 frames. ], batch size: 89, lr: 1.11e-02, grad_scale: 8.0 2024-09-17 07:46:24,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.28 vs. limit=10.0 2024-09-17 07:46:26,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=177100.0, ans=0.125 2024-09-17 07:46:31,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=177140.0, ans=0.125 2024-09-17 07:46:39,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=177140.0, ans=0.1 2024-09-17 07:46:44,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=177140.0, ans=0.125 2024-09-17 07:47:08,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=177220.0, ans=0.125 2024-09-17 07:47:23,532 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.93 vs. limit=15.0 2024-09-17 07:47:31,489 INFO [train.py:1198] (1/2) Epoch 10, batch 3600, loss[loss=0.2544, ctc_loss=0.1602, cr_loss=0.3855, attn_decoder_loss=0.2563, over 29501.00 frames. ], tot_loss[loss=0.2657, ctc_loss=0.1714, cr_loss=0.4077, attn_decoder_loss=0.2671, over 5791413.26 frames. 
], batch size: 77, lr: 1.11e-02, grad_scale: 16.0 2024-09-17 07:47:31,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=177300.0, ans=0.125 2024-09-17 07:47:38,328 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.24 vs. limit=15.0 2024-09-17 07:47:44,996 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 9.296e+01 9.828e+01 1.086e+02 1.804e+02, threshold=1.966e+02, percent-clipped=0.0 2024-09-17 07:47:53,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=177340.0, ans=0.125 2024-09-17 07:47:54,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=177340.0, ans=0.0 2024-09-17 07:47:56,663 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.56 vs. limit=15.0 2024-09-17 07:48:01,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=177380.0, ans=0.0 2024-09-17 07:48:31,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=177460.0, ans=0.125 2024-09-17 07:48:37,319 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 07:48:41,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=177460.0, ans=0.125 2024-09-17 07:48:41,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=177460.0, ans=0.0 2024-09-17 07:48:45,317 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.85 vs. limit=6.0 2024-09-17 07:48:45,924 INFO [train.py:1198] (1/2) Epoch 10, batch 3650, loss[loss=0.2825, ctc_loss=0.1866, cr_loss=0.436, attn_decoder_loss=0.2835, over 29519.00 frames. ], tot_loss[loss=0.2652, ctc_loss=0.1708, cr_loss=0.4069, attn_decoder_loss=0.2667, over 5794887.10 frames. ], batch size: 90, lr: 1.11e-02, grad_scale: 8.0 2024-09-17 07:49:15,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=177540.0, ans=0.1 2024-09-17 07:49:27,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=177580.0, ans=0.0 2024-09-17 07:49:48,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=177660.0, ans=0.2 2024-09-17 07:49:51,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=177660.0, ans=0.09899494936611666 2024-09-17 07:50:03,437 INFO [train.py:1198] (1/2) Epoch 10, batch 3700, loss[loss=0.2774, ctc_loss=0.1794, cr_loss=0.4303, attn_decoder_loss=0.2788, over 29708.00 frames. ], tot_loss[loss=0.2657, ctc_loss=0.171, cr_loss=0.4076, attn_decoder_loss=0.2672, over 5805087.84 frames. 
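The grad_scale field in these records moves between powers of two (8.0, 16.0, then 4.0 a little further down): dynamic loss scaling for float16 AMP, which grows the scale while gradients stay finite and halves it on overflow. icefall wraps this in its own training loop; the generic PyTorch pattern looks like the following, with a toy model and optimizer standing in for the real ones:

    import torch

    model = torch.nn.Linear(80, 500).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)
    scaler = torch.cuda.amp.GradScaler()  # maintains the dynamic 'grad_scale'

    def train_step(features, targets):
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = torch.nn.functional.cross_entropy(model(features), targets)
        scaler.scale(loss).backward()  # backprop at the current grad_scale
        scaler.step(optimizer)         # unscales grads; skips the step on inf/nan
        scaler.update()                # halves the scale on overflow, grows it otherwise
        return loss.detach(), scaler.get_scale()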
], batch size: 84, lr: 1.11e-02, grad_scale: 8.0 2024-09-17 07:50:05,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=177700.0, ans=0.125 2024-09-17 07:50:11,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=177700.0, ans=0.0 2024-09-17 07:50:16,866 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 9.275e+01 9.841e+01 1.076e+02 3.002e+02, threshold=1.968e+02, percent-clipped=1.0 2024-09-17 07:50:18,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=177740.0, ans=0.2 2024-09-17 07:50:36,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=177780.0, ans=0.0 2024-09-17 07:50:37,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=177780.0, ans=0.125 2024-09-17 07:50:41,322 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.95 vs. limit=15.0 2024-09-17 07:50:45,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=177780.0, ans=0.2 2024-09-17 07:50:49,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=177820.0, ans=0.125 2024-09-17 07:50:52,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=177820.0, ans=0.0 2024-09-17 07:51:17,406 INFO [train.py:1198] (1/2) Epoch 10, batch 3750, loss[loss=0.2383, ctc_loss=0.1473, cr_loss=0.3606, attn_decoder_loss=0.2404, over 29313.00 frames. ], tot_loss[loss=0.2655, ctc_loss=0.1705, cr_loss=0.4064, attn_decoder_loss=0.267, over 5808613.38 frames. 
], batch size: 67, lr: 1.10e-02, grad_scale: 4.0 2024-09-17 07:51:20,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=177900.0, ans=0.0 2024-09-17 07:51:28,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=177900.0, ans=0.125 2024-09-17 07:51:38,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=177940.0, ans=0.0 2024-09-17 07:51:40,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=177940.0, ans=0.0 2024-09-17 07:51:41,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=177940.0, ans=0.1 2024-09-17 07:51:46,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=177980.0, ans=0.125 2024-09-17 07:51:56,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=177980.0, ans=0.0 2024-09-17 07:52:02,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=178020.0, ans=0.125 2024-09-17 07:52:10,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=178020.0, ans=0.125 2024-09-17 07:52:26,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=178060.0, ans=0.0 2024-09-17 07:52:33,644 INFO [train.py:1198] (1/2) Epoch 10, batch 3800, loss[loss=0.2776, ctc_loss=0.1788, cr_loss=0.4378, attn_decoder_loss=0.2788, over 29618.00 frames. ], tot_loss[loss=0.2656, ctc_loss=0.1708, cr_loss=0.4069, attn_decoder_loss=0.267, over 5798302.90 frames. ], batch size: 86, lr: 1.10e-02, grad_scale: 8.0 2024-09-17 07:52:48,523 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.369e+01 9.594e+01 1.015e+02 1.096e+02 4.461e+02, threshold=2.030e+02, percent-clipped=1.0 2024-09-17 07:53:09,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=178180.0, ans=0.95 2024-09-17 07:53:48,160 INFO [train.py:1198] (1/2) Epoch 10, batch 3850, loss[loss=0.2811, ctc_loss=0.1847, cr_loss=0.4204, attn_decoder_loss=0.2824, over 29267.00 frames. ], tot_loss[loss=0.2652, ctc_loss=0.1704, cr_loss=0.4071, attn_decoder_loss=0.2666, over 5812093.64 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 8.0 2024-09-17 07:54:03,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=178340.0, ans=0.125 2024-09-17 07:54:07,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=178340.0, ans=0.2 2024-09-17 07:54:10,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=178340.0, ans=0.0 2024-09-17 07:54:14,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.50 vs. 
limit=15.0 2024-09-17 07:54:31,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=178420.0, ans=0.0 2024-09-17 07:54:38,635 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.25 vs. limit=12.0 2024-09-17 07:54:45,600 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.81 vs. limit=22.5 2024-09-17 07:54:48,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=178460.0, ans=0.125 2024-09-17 07:54:57,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=178460.0, ans=0.125 2024-09-17 07:54:58,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=178460.0, ans=0.0 2024-09-17 07:55:04,011 INFO [train.py:1198] (1/2) Epoch 10, batch 3900, loss[loss=0.2697, ctc_loss=0.171, cr_loss=0.4147, attn_decoder_loss=0.2715, over 29634.00 frames. ], tot_loss[loss=0.2656, ctc_loss=0.1706, cr_loss=0.4077, attn_decoder_loss=0.2671, over 5816280.21 frames. ], batch size: 86, lr: 1.10e-02, grad_scale: 8.0 2024-09-17 07:55:07,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=178500.0, ans=0.125 2024-09-17 07:55:13,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=178500.0, ans=0.0 2024-09-17 07:55:17,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=178540.0, ans=0.125 2024-09-17 07:55:20,367 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.745e+01 9.621e+01 1.032e+02 1.104e+02 1.342e+02, threshold=2.064e+02, percent-clipped=0.0 2024-09-17 07:55:25,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=178540.0, ans=0.07 2024-09-17 07:55:28,451 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.21 vs. limit=15.0 2024-09-17 07:55:46,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=178580.0, ans=0.125 2024-09-17 07:56:18,294 INFO [train.py:1198] (1/2) Epoch 10, batch 3950, loss[loss=0.2841, ctc_loss=0.1821, cr_loss=0.4238, attn_decoder_loss=0.2861, over 29433.00 frames. ], tot_loss[loss=0.2653, ctc_loss=0.17, cr_loss=0.4067, attn_decoder_loss=0.2668, over 5836318.59 frames. 
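The optim.py:487 warnings summarize recent gradient norms as min/quartile/max statistics, a clipping threshold, and the share of recent batches that were clipped. In every such record above, the threshold is exactly Clipping_scale times the logged median (e.g. 2.0 * 1.032e+02 = 2.064e+02), so the sketch below uses that rule for the bookkeeping; the rest of ScaledAdam's clipping logic is not reproduced here:

    import torch

    def grad_norm_summary(grad_norms, clipping_scale=2.0):
        # grad_norms: gradient norms from the recent window of batches.
        norms = torch.tensor(grad_norms, dtype=torch.float32)
        quartiles = torch.quantile(
            norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * quartiles[2]  # scale * median, as logged
        percent_clipped = 100.0 * (norms > threshold).float().mean()
        return quartiles, threshold, percent_clipped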
], batch size: 97, lr: 1.10e-02, grad_scale: 8.0 2024-09-17 07:56:28,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=178700.0, ans=0.1 2024-09-17 07:56:37,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=178740.0, ans=0.125 2024-09-17 07:56:46,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=178780.0, ans=0.2 2024-09-17 07:56:48,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=178780.0, ans=0.125 2024-09-17 07:56:58,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=178780.0, ans=0.1 2024-09-17 07:57:19,247 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. limit=6.0 2024-09-17 07:57:33,103 INFO [train.py:1198] (1/2) Epoch 10, batch 4000, loss[loss=0.2462, ctc_loss=0.1464, cr_loss=0.3637, attn_decoder_loss=0.2492, over 29522.00 frames. ], tot_loss[loss=0.2652, ctc_loss=0.1701, cr_loss=0.4062, attn_decoder_loss=0.2668, over 5813538.08 frames. ], batch size: 74, lr: 1.10e-02, grad_scale: 16.0 2024-09-17 07:57:34,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=178900.0, ans=0.0 2024-09-17 07:57:39,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=178900.0, ans=0.0 2024-09-17 07:57:50,420 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.391e+01 9.319e+01 1.030e+02 1.152e+02 2.635e+02, threshold=2.059e+02, percent-clipped=1.0 2024-09-17 07:57:55,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=178940.0, ans=0.5 2024-09-17 07:58:31,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.13 vs. limit=10.0 2024-09-17 07:58:32,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=179060.0, ans=0.125 2024-09-17 07:58:47,046 INFO [train.py:1198] (1/2) Epoch 10, batch 4050, loss[loss=0.3008, ctc_loss=0.2307, cr_loss=0.4624, attn_decoder_loss=0.2983, over 19903.00 frames. ], tot_loss[loss=0.2649, ctc_loss=0.1703, cr_loss=0.4058, attn_decoder_loss=0.2664, over 5796218.60 frames. ], batch size: 209, lr: 1.10e-02, grad_scale: 8.0 2024-09-17 07:58:47,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=179100.0, ans=0.125 2024-09-17 07:59:02,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=179140.0, ans=0.125 2024-09-17 07:59:28,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=179180.0, ans=0.0 2024-09-17 08:00:01,775 INFO [train.py:1198] (1/2) Epoch 10, batch 4100, loss[loss=0.288, ctc_loss=0.1859, cr_loss=0.4276, attn_decoder_loss=0.2898, over 29537.00 frames. 
], tot_loss[loss=0.2651, ctc_loss=0.1706, cr_loss=0.4063, attn_decoder_loss=0.2665, over 5791019.28 frames. ], batch size: 90, lr: 1.10e-02, grad_scale: 8.0 2024-09-17 08:00:20,761 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.192e+01 9.136e+01 9.895e+01 1.094e+02 2.839e+02, threshold=1.979e+02, percent-clipped=1.0 2024-09-17 08:00:25,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=179340.0, ans=0.1 2024-09-17 08:00:31,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=179380.0, ans=0.125 2024-09-17 08:00:33,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=179380.0, ans=0.0 2024-09-17 08:01:15,651 INFO [train.py:1198] (1/2) Epoch 10, batch 4150, loss[loss=0.2608, ctc_loss=0.173, cr_loss=0.4203, attn_decoder_loss=0.2612, over 29502.00 frames. ], tot_loss[loss=0.265, ctc_loss=0.1706, cr_loss=0.4072, attn_decoder_loss=0.2664, over 5796588.61 frames. ], batch size: 77, lr: 1.10e-02, grad_scale: 8.0 2024-09-17 08:01:21,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=179500.0, ans=0.07 2024-09-17 08:01:31,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=179540.0, ans=0.125 2024-09-17 08:01:33,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=179540.0, ans=0.125 2024-09-17 08:01:42,107 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 08:01:47,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=179580.0, ans=0.125 2024-09-17 08:02:05,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=179620.0, ans=0.1 2024-09-17 08:02:30,746 INFO [train.py:1198] (1/2) Epoch 10, batch 4200, loss[loss=0.2852, ctc_loss=0.1895, cr_loss=0.4308, attn_decoder_loss=0.2862, over 29501.00 frames. ], tot_loss[loss=0.2653, ctc_loss=0.1712, cr_loss=0.4078, attn_decoder_loss=0.2667, over 5798667.63 frames. ], batch size: 90, lr: 1.10e-02, grad_scale: 8.0 2024-09-17 08:02:31,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=179700.0, ans=0.0 2024-09-17 08:02:34,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=179700.0, ans=0.125 2024-09-17 08:02:48,206 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.27 vs. 
limit=22.5 2024-09-17 08:02:50,146 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.369e+01 9.480e+01 1.011e+02 1.105e+02 3.367e+02, threshold=2.021e+02, percent-clipped=4.0 2024-09-17 08:02:54,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=179740.0, ans=0.05 2024-09-17 08:03:35,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=179860.0, ans=0.1 2024-09-17 08:03:37,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=179860.0, ans=0.0 2024-09-17 08:03:44,285 INFO [train.py:1198] (1/2) Epoch 10, batch 4250, loss[loss=0.2449, ctc_loss=0.1505, cr_loss=0.3709, attn_decoder_loss=0.2471, over 29499.00 frames. ], tot_loss[loss=0.2653, ctc_loss=0.1707, cr_loss=0.4073, attn_decoder_loss=0.2668, over 5804903.83 frames. ], batch size: 74, lr: 1.10e-02, grad_scale: 8.0 2024-09-17 08:03:53,205 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2024-09-17 08:04:14,243 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.54 vs. limit=22.5 2024-09-17 08:04:34,610 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 08:04:34,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=180020.0, ans=0.0 2024-09-17 08:04:34,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=180020.0, ans=0.0 2024-09-17 08:04:36,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=180020.0, ans=0.025 2024-09-17 08:04:39,655 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.65 vs. limit=15.0 2024-09-17 08:04:43,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=180060.0, ans=0.125 2024-09-17 08:04:43,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=180060.0, ans=0.0 2024-09-17 08:04:57,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=180060.0, ans=0.125 2024-09-17 08:05:00,431 INFO [train.py:1198] (1/2) Epoch 10, batch 4300, loss[loss=0.2835, ctc_loss=0.201, cr_loss=0.4693, attn_decoder_loss=0.2822, over 29520.00 frames. ], tot_loss[loss=0.2654, ctc_loss=0.171, cr_loss=0.4075, attn_decoder_loss=0.2669, over 5795377.84 frames. 
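The scaling.py:1024 Whitening lines compare a per-module whiteness metric against a limit (10.0, 15.0, 22.5 above) and are emitted when the metric approaches or exceeds it. One proxy with the right behavior, equal to 1.0 for perfectly white features and growing as variance concentrates in a few directions, is the ratio mean(eigenvalue^2) / mean(eigenvalue)^2 of the channel covariance; the sketch below is an assumed reconstruction, not necessarily icefall's exact formula:

    import torch

    def whitening_metric(x, num_groups=1):
        """Whiteness proxy: mean(eig^2) / mean(eig)^2 of the channel covariance.

        Assumed reconstruction of the metric in the Whitening log lines;
        icefall's exact implementation may differ in details.
        """
        # x: (num_frames, num_channels); channels split into num_groups groups
        n, c = x.shape
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = torch.matmul(x.transpose(1, 2), x) / n  # (groups, c/g, c/g)
        mean_eig = torch.diagonal(cov, dim1=1, dim2=2).mean(dim=1)
        mean_eig_sq = (cov * cov).sum(dim=(1, 2)) / (c // num_groups)
        return (mean_eig_sq / mean_eig.clamp(min=1e-20) ** 2).mean()

    x = torch.randn(1000, 512)         # near-white input
    print(float(whitening_metric(x)))  # ~1.0, well under limits like 15.0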
], batch size: 87, lr: 1.10e-02, grad_scale: 8.0 2024-09-17 08:05:19,787 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.912e+01 9.707e+01 1.038e+02 1.136e+02 2.980e+02, threshold=2.076e+02, percent-clipped=1.0 2024-09-17 08:05:20,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=180140.0, ans=0.1 2024-09-17 08:05:34,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=180180.0, ans=0.1 2024-09-17 08:06:14,234 INFO [train.py:1198] (1/2) Epoch 10, batch 4350, loss[loss=0.2905, ctc_loss=0.1923, cr_loss=0.4586, attn_decoder_loss=0.2913, over 29495.00 frames. ], tot_loss[loss=0.269, ctc_loss=0.174, cr_loss=0.4131, attn_decoder_loss=0.2704, over 5797375.18 frames. ], batch size: 97, lr: 1.10e-02, grad_scale: 8.0 2024-09-17 08:06:17,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=180300.0, ans=0.125 2024-09-17 08:06:18,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=180300.0, ans=0.0 2024-09-17 08:07:04,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=180420.0, ans=0.0 2024-09-17 08:07:12,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.18 vs. limit=15.0 2024-09-17 08:07:19,909 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.72 vs. limit=15.0 2024-09-17 08:07:25,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=180460.0, ans=0.0 2024-09-17 08:07:27,839 INFO [train.py:1198] (1/2) Epoch 10, batch 4400, loss[loss=0.2856, ctc_loss=0.1895, cr_loss=0.4459, attn_decoder_loss=0.2864, over 27368.00 frames. ], tot_loss[loss=0.2713, ctc_loss=0.1757, cr_loss=0.4154, attn_decoder_loss=0.2726, over 5766930.51 frames. ], batch size: 124, lr: 1.10e-02, grad_scale: 16.0 2024-09-17 08:07:33,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=180500.0, ans=10.0 2024-09-17 08:07:47,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=180540.0, ans=0.1 2024-09-17 08:07:48,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.64 vs. limit=15.0 2024-09-17 08:07:48,753 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.847e+01 9.762e+01 1.026e+02 1.096e+02 2.982e+02, threshold=2.053e+02, percent-clipped=1.0 2024-09-17 08:07:59,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer_ff3.min_abs, batch_count=180580.0, ans=0.2 2024-09-17 08:08:08,458 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.78 vs. limit=22.5 2024-09-17 08:08:13,020 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.44 vs. 
limit=22.5 2024-09-17 08:08:20,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=180620.0, ans=0.2 2024-09-17 08:08:23,710 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.01 vs. limit=15.0 2024-09-17 08:08:37,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=180660.0, ans=0.125 2024-09-17 08:08:41,979 INFO [train.py:1198] (1/2) Epoch 10, batch 4450, loss[loss=0.3014, ctc_loss=0.2331, cr_loss=0.4711, attn_decoder_loss=0.2985, over 20832.00 frames. ], tot_loss[loss=0.2746, ctc_loss=0.1812, cr_loss=0.4194, attn_decoder_loss=0.2756, over 5573756.38 frames. ], batch size: 212, lr: 1.10e-02, grad_scale: 4.0 2024-09-17 08:08:49,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=180700.0, ans=0.0 2024-09-17 08:08:52,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=180700.0, ans=0.0 2024-09-17 08:08:59,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=180740.0, ans=0.0 2024-09-17 08:09:05,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=180740.0, ans=0.0 2024-09-17 08:09:10,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=180740.0, ans=0.0 2024-09-17 08:09:16,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=180780.0, ans=0.0 2024-09-17 08:09:52,003 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.56 vs. limit=15.0 2024-09-17 08:09:58,569 INFO [train.py:1198] (1/2) Epoch 10, batch 4500, loss[loss=0.3014, ctc_loss=0.2292, cr_loss=0.4513, attn_decoder_loss=0.2994, over 20157.00 frames. ], tot_loss[loss=0.2785, ctc_loss=0.1886, cr_loss=0.4221, attn_decoder_loss=0.2791, over 5233978.80 frames. ], batch size: 209, lr: 1.10e-02, grad_scale: 8.0 2024-09-17 08:10:06,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=180900.0, ans=0.125 2024-09-17 08:10:21,314 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.766e+01 1.077e+02 1.142e+02 1.231e+02 1.732e+02, threshold=2.283e+02, percent-clipped=0.0 2024-09-17 08:10:27,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=180980.0, ans=0.125 2024-09-17 08:11:32,458 INFO [train.py:1198] (1/2) Epoch 11, batch 0, loss[loss=0.2518, ctc_loss=0.1452, cr_loss=0.3697, attn_decoder_loss=0.2554, over 29602.00 frames. ], tot_loss[loss=0.2518, ctc_loss=0.1452, cr_loss=0.3697, attn_decoder_loss=0.2554, over 29602.00 frames. 
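At the epoch 10/11 boundary just below, train.py recomputes the validation loss, and zipformer.py:1858 dumps the entropy of one attention module's weights as a health check, one value per head. A sketch of that diagnostic; the (heads, queries, keys) layout and the averaging over queries are assumptions about how the hook aggregates:

    import torch

    def attn_weights_entropy(attn_weights, eps=1e-20):
        """Per-head entropy of attention distributions (illustrative diagnostic).

        attn_weights: (num_heads, num_queries, num_keys), rows summing to 1.
        """
        p = attn_weights.clamp(min=eps)
        ent = -(p * p.log()).sum(dim=-1)  # entropy per (head, query)
        return ent.mean(dim=-1)           # average over queries -> (num_heads,)

    w = torch.softmax(torch.randn(4, 10, 50), dim=-1)
    print(attn_weights_entropy(w))  # bounded by log(50) ~ 3.9; random logits land
                                    # a bit below, much like the 3.5-3.8 values
                                    # logged for the real heads below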
], batch size: 73, lr: 1.05e-02, grad_scale: 16.0 2024-09-17 08:11:32,458 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 08:11:47,881 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.8253, 3.5256, 3.8017, 3.8032], device='cuda:1') 2024-09-17 08:11:50,860 INFO [train.py:1230] (1/2) Epoch 11, validation: loss=0.2172, ctc_loss=0.0495, cr_loss=4.7e-15, attn_decoder_loss=0.2358, over 944034.00 frames. 2024-09-17 08:11:50,861 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-17 08:12:28,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=181080.0, ans=0.125 2024-09-17 08:12:47,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=181120.0, ans=0.2 2024-09-17 08:12:47,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=181120.0, ans=0.2 2024-09-17 08:12:52,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=181120.0, ans=0.125 2024-09-17 08:13:10,335 INFO [train.py:1198] (1/2) Epoch 11, batch 50, loss[loss=0.243, ctc_loss=0.1542, cr_loss=0.3818, attn_decoder_loss=0.2444, over 29436.00 frames. ], tot_loss[loss=0.269, ctc_loss=0.1744, cr_loss=0.4118, attn_decoder_loss=0.2704, over 1268509.70 frames. ], batch size: 70, lr: 1.05e-02, grad_scale: 8.0 2024-09-17 08:13:34,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=181240.0, ans=0.125 2024-09-17 08:14:03,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=181320.0, ans=0.2 2024-09-17 08:14:13,810 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.260e+01 9.758e+01 1.123e+02 1.302e+02 1.602e+03, threshold=2.247e+02, percent-clipped=5.0 2024-09-17 08:14:19,200 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.53 vs. limit=15.0 2024-09-17 08:14:25,851 INFO [train.py:1198] (1/2) Epoch 11, batch 100, loss[loss=0.2565, ctc_loss=0.1687, cr_loss=0.4076, attn_decoder_loss=0.2571, over 29519.00 frames. ], tot_loss[loss=0.2699, ctc_loss=0.1743, cr_loss=0.4128, attn_decoder_loss=0.2714, over 2252184.27 frames. ], batch size: 76, lr: 1.04e-02, grad_scale: 8.0 2024-09-17 08:14:45,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=181440.0, ans=0.125 2024-09-17 08:14:49,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=181440.0, ans=0.0 2024-09-17 08:14:53,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=181440.0, ans=0.125 2024-09-17 08:14:56,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=181480.0, ans=0.025 2024-09-17 08:15:04,169 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.38 vs. 
2024-09-17 08:15:06,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=181480.0, ans=0.125
2024-09-17 08:15:23,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.69 vs. limit=15.0
2024-09-17 08:15:41,048 INFO [train.py:1198] (1/2) Epoch 11, batch 150, loss[loss=0.2363, ctc_loss=0.1475, cr_loss=0.3825, attn_decoder_loss=0.2377, over 29450.00 frames. ], tot_loss[loss=0.267, ctc_loss=0.1715, cr_loss=0.4092, attn_decoder_loss=0.2685, over 3047034.20 frames. ], batch size: 70, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:16:03,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=181640.0, ans=0.125
2024-09-17 08:16:04,089 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.32 vs. limit=22.5
2024-09-17 08:16:22,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=181680.0, ans=0.025
2024-09-17 08:16:27,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=181680.0, ans=0.125
2024-09-17 08:16:30,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=181720.0, ans=0.1
2024-09-17 08:16:34,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=181720.0, ans=0.125
2024-09-17 08:16:49,224 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.120e+01 9.120e+01 9.727e+01 1.024e+02 1.360e+02, threshold=1.945e+02, percent-clipped=0.0
2024-09-17 08:16:52,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=181760.0, ans=0.0
2024-09-17 08:16:52,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=181760.0, ans=0.125
2024-09-17 08:17:01,150 INFO [train.py:1198] (1/2) Epoch 11, batch 200, loss[loss=0.2926, ctc_loss=0.1994, cr_loss=0.4648, attn_decoder_loss=0.2926, over 27139.00 frames. ], tot_loss[loss=0.2658, ctc_loss=0.1708, cr_loss=0.4089, attn_decoder_loss=0.2673, over 3657926.46 frames. ], batch size: 124, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:17:14,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=181840.0, ans=0.1
2024-09-17 08:17:27,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.67 vs. limit=15.0
2024-09-17 08:17:57,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=181920.0, ans=0.0
2024-09-17 08:18:04,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=181960.0, ans=0.125
2024-09-17 08:18:05,148 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.48 vs. limit=15.0
2024-09-17 08:18:10,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=181960.0, ans=0.0
2024-09-17 08:18:14,376 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.73 vs. limit=15.0
2024-09-17 08:18:16,502 INFO [train.py:1198] (1/2) Epoch 11, batch 250, loss[loss=0.2837, ctc_loss=0.1916, cr_loss=0.4475, attn_decoder_loss=0.2839, over 29184.00 frames. ], tot_loss[loss=0.2651, ctc_loss=0.1697, cr_loss=0.4071, attn_decoder_loss=0.2667, over 4140286.32 frames. ], batch size: 100, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:18:21,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=182000.0, ans=0.2
2024-09-17 08:18:27,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=182000.0, ans=0.125
2024-09-17 08:18:36,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=182040.0, ans=0.1
2024-09-17 08:18:41,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=182040.0, ans=0.0
2024-09-17 08:18:44,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=182040.0, ans=0.0
2024-09-17 08:18:50,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=182080.0, ans=0.125
2024-09-17 08:19:11,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=182120.0, ans=0.125
2024-09-17 08:19:20,162 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.745e+01 9.171e+01 1.004e+02 1.090e+02 1.755e+02, threshold=2.009e+02, percent-clipped=0.0
2024-09-17 08:19:25,641 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.49 vs. limit=12.0
2024-09-17 08:19:32,194 INFO [train.py:1198] (1/2) Epoch 11, batch 300, loss[loss=0.2833, ctc_loss=0.1806, cr_loss=0.4213, attn_decoder_loss=0.2854, over 29544.00 frames. ], tot_loss[loss=0.2644, ctc_loss=0.1687, cr_loss=0.4059, attn_decoder_loss=0.266, over 4509720.39 frames. ], batch size: 92, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:19:45,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=182200.0, ans=0.2
2024-09-17 08:20:32,195 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=22.5
2024-09-17 08:20:38,289 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=22.5
2024-09-17 08:20:52,576 INFO [train.py:1198] (1/2) Epoch 11, batch 350, loss[loss=0.2409, ctc_loss=0.1426, cr_loss=0.3626, attn_decoder_loss=0.2437, over 29348.00 frames. ], tot_loss[loss=0.265, ctc_loss=0.1693, cr_loss=0.4069, attn_decoder_loss=0.2666, over 4795498.04 frames. ], batch size: 71, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:21:03,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=182400.0, ans=0.0
2024-09-17 08:21:08,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=182440.0, ans=0.125
2024-09-17 08:21:12,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=182440.0, ans=0.125
2024-09-17 08:21:20,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=182440.0, ans=0.0
2024-09-17 08:21:21,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=182480.0, ans=0.125
2024-09-17 08:21:29,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=182480.0, ans=0.5
2024-09-17 08:21:38,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=182520.0, ans=0.125
2024-09-17 08:21:41,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=182520.0, ans=0.125
2024-09-17 08:21:55,726 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.030e+01 9.345e+01 9.958e+01 1.088e+02 1.726e+02, threshold=1.992e+02, percent-clipped=0.0
2024-09-17 08:21:57,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=182560.0, ans=0.2
2024-09-17 08:21:58,014 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.10 vs. limit=12.0
2024-09-17 08:22:07,838 INFO [train.py:1198] (1/2) Epoch 11, batch 400, loss[loss=0.2744, ctc_loss=0.1739, cr_loss=0.4184, attn_decoder_loss=0.2762, over 29706.00 frames. ], tot_loss[loss=0.2646, ctc_loss=0.1689, cr_loss=0.4063, attn_decoder_loss=0.2662, over 5026337.10 frames. ], batch size: 82, lr: 1.04e-02, grad_scale: 16.0
2024-09-17 08:22:14,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=182600.0, ans=0.025
2024-09-17 08:22:27,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=182640.0, ans=0.0
2024-09-17 08:22:45,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=182680.0, ans=0.125
2024-09-17 08:23:03,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=182720.0, ans=0.025
2024-09-17 08:23:05,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=182720.0, ans=0.1
2024-09-17 08:23:21,077 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.18 vs. limit=15.0
2024-09-17 08:23:23,311 INFO [train.py:1198] (1/2) Epoch 11, batch 450, loss[loss=0.277, ctc_loss=0.1679, cr_loss=0.4015, attn_decoder_loss=0.2801, over 29687.00 frames. ], tot_loss[loss=0.2647, ctc_loss=0.1691, cr_loss=0.4067, attn_decoder_loss=0.2663, over 5187334.12 frames. ], batch size: 83, lr: 1.04e-02, grad_scale: 8.0
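The WARNING lines from optim.py:487 summarize the recent distribution of gradient norms (the five numbers are the min/25%/50%/75%/max quartiles), the clipping threshold derived from them, and the percentage of recently clipped batches. A minimal sketch of this style of statistics-driven clipping, assuming a simple clipping_scale-times-median rule over a fixed history window; ScaledAdam's actual bookkeeping in optim.py differs:

import torch

def clip_by_running_quartiles(params, norm_history, clipping_scale=2.0):
    # Hedged sketch, not ScaledAdam's logic: threshold = scale x running median.
    grads = [p.grad for p in params if p.grad is not None]
    total_norm = torch.norm(torch.stack([g.norm() for g in grads]))
    norm_history.append(float(total_norm))
    hist = torch.tensor(norm_history[-128:])  # assumed window size
    quartiles = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * float(quartiles[2])
    clipped = float(total_norm) > threshold
    if clipped:
        for g in grads:  # rescale gradients down to the threshold
            g.mul_(threshold / (float(total_norm) + 1.0e-20))
    return quartiles, threshold, clipped

With quartiles around 1e+02 and clipping_scale=2.0, a threshold near 2e+02 and percent-clipped close to zero, as logged above, is the expected steady state.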
2024-09-17 08:23:28,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=182800.0, ans=0.125
2024-09-17 08:23:40,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.04 vs. limit=10.0
2024-09-17 08:24:00,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=182880.0, ans=0.0
2024-09-17 08:24:06,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=182880.0, ans=0.1
2024-09-17 08:24:11,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=182920.0, ans=0.125
2024-09-17 08:24:27,919 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.19 vs. limit=10.0
2024-09-17 08:24:29,221 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.30 vs. limit=12.0
2024-09-17 08:24:32,891 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 9.150e+01 9.879e+01 1.056e+02 3.994e+02, threshold=1.976e+02, percent-clipped=1.0
2024-09-17 08:24:36,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=182960.0, ans=0.1
2024-09-17 08:24:43,366 INFO [train.py:1198] (1/2) Epoch 11, batch 500, loss[loss=0.282, ctc_loss=0.1858, cr_loss=0.4298, attn_decoder_loss=0.2832, over 29488.00 frames. ], tot_loss[loss=0.2636, ctc_loss=0.1679, cr_loss=0.4054, attn_decoder_loss=0.2652, over 5330994.26 frames. ], batch size: 94, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:25:01,300 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.06 vs. limit=6.0
2024-09-17 08:25:24,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=183080.0, ans=0.125
2024-09-17 08:25:43,559 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.58 vs. limit=15.0
2024-09-17 08:25:57,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=183200.0, ans=0.05
2024-09-17 08:25:59,363 INFO [train.py:1198] (1/2) Epoch 11, batch 550, loss[loss=0.2772, ctc_loss=0.1832, cr_loss=0.449, attn_decoder_loss=0.2777, over 28810.00 frames. ], tot_loss[loss=0.2637, ctc_loss=0.1683, cr_loss=0.4054, attn_decoder_loss=0.2653, over 5423991.37 frames. ], batch size: 104, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:26:02,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=183200.0, ans=0.2
2024-09-17 08:26:22,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=183240.0, ans=0.0
2024-09-17 08:26:34,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=183280.0, ans=0.2
2024-09-17 08:26:39,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=183280.0, ans=0.025
2024-09-17 08:26:43,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=183320.0, ans=0.125
2024-09-17 08:26:48,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=183320.0, ans=0.125
2024-09-17 08:27:04,997 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 9.204e+01 9.712e+01 1.043e+02 1.936e+02, threshold=1.942e+02, percent-clipped=0.0
2024-09-17 08:27:14,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=183400.0, ans=0.125
2024-09-17 08:27:15,708 INFO [train.py:1198] (1/2) Epoch 11, batch 600, loss[loss=0.2828, ctc_loss=0.1801, cr_loss=0.4211, attn_decoder_loss=0.2848, over 29303.00 frames. ], tot_loss[loss=0.2639, ctc_loss=0.1681, cr_loss=0.4053, attn_decoder_loss=0.2655, over 5511059.45 frames. ], batch size: 100, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:27:16,675 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0
2024-09-17 08:27:17,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=183400.0, ans=0.125
2024-09-17 08:27:31,850 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 08:27:32,489 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.89 vs. limit=15.0
2024-09-17 08:27:52,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=183480.0, ans=0.125
2024-09-17 08:27:55,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=183480.0, ans=0.2
2024-09-17 08:28:02,416 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.45 vs. limit=22.5
2024-09-17 08:28:21,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=183560.0, ans=0.2
2024-09-17 08:28:35,709 INFO [train.py:1198] (1/2) Epoch 11, batch 650, loss[loss=0.2645, ctc_loss=0.1683, cr_loss=0.3979, attn_decoder_loss=0.2663, over 29779.00 frames. ], tot_loss[loss=0.2635, ctc_loss=0.1674, cr_loss=0.4045, attn_decoder_loss=0.2652, over 5587763.67 frames. ], batch size: 81, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:28:58,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=183640.0, ans=0.0
2024-09-17 08:29:15,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=183680.0, ans=0.0
2024-09-17 08:29:23,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=183720.0, ans=0.125
2024-09-17 08:29:32,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=183720.0, ans=0.125
2024-09-17 08:29:36,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.max_positive, batch_count=183760.0, ans=0.95
2024-09-17 08:29:42,306 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 9.127e+01 9.643e+01 1.047e+02 1.455e+02, threshold=1.929e+02, percent-clipped=0.0
2024-09-17 08:29:51,516 INFO [train.py:1198] (1/2) Epoch 11, batch 700, loss[loss=0.2428, ctc_loss=0.1393, cr_loss=0.3888, attn_decoder_loss=0.2456, over 29529.00 frames. ], tot_loss[loss=0.2641, ctc_loss=0.1678, cr_loss=0.4054, attn_decoder_loss=0.2658, over 5638983.08 frames. ], batch size: 76, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:29:51,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=183800.0, ans=0.2
2024-09-17 08:30:05,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=183840.0, ans=0.125
2024-09-17 08:30:08,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=183840.0, ans=0.125
2024-09-17 08:30:22,907 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.03 vs. limit=22.5
2024-09-17 08:30:45,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=183920.0, ans=0.2
2024-09-17 08:31:08,136 INFO [train.py:1198] (1/2) Epoch 11, batch 750, loss[loss=0.2605, ctc_loss=0.1589, cr_loss=0.3909, attn_decoder_loss=0.2631, over 29726.00 frames. ], tot_loss[loss=0.2639, ctc_loss=0.168, cr_loss=0.4051, attn_decoder_loss=0.2656, over 5678097.60 frames. ], batch size: 82, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:31:13,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.27 vs. limit=22.5
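The ScheduledFloat lines each report the current value (ans) of a named hyperparameter as a function of batch_count: dropout rates, skip rates, balancer probabilities and bypass scales are annealed over training. A hedged sketch of such a schedule, assuming plain piecewise-linear interpolation between (batch_count, value) breakpoints; the actual class in scaling.py has more features (defaults, random perturbation, etc.):

class ScheduledFloatSketch:
    # Hedged reimplementation sketch of a batch-count-keyed schedule.
    def __init__(self, *points):
        # points: (batch_count, value) pairs, e.g. (0.0, 0.3), (20000.0, 0.1)
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)  # linear interpolation

sched = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))  # assumed breakpoints
print(sched.value(183640.0))  # -> 0.1, flat past the last breakpoint

By batch_count 181k-189k, as in this section, the schedules have long since reached their final values, which is why each name logs a constant ans (0.0, 0.1, 0.125, 0.2, ...).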
2024-09-17 08:31:15,903 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-17 08:31:18,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=184000.0, ans=0.125
2024-09-17 08:31:18,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=184000.0, ans=0.09899494936611666
2024-09-17 08:31:20,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=184000.0, ans=0.0
2024-09-17 08:31:24,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=184040.0, ans=0.2
2024-09-17 08:31:50,326 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.81 vs. limit=22.5
2024-09-17 08:31:55,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=184120.0, ans=0.0
2024-09-17 08:31:56,381 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.58 vs. limit=12.0
2024-09-17 08:32:16,691 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.265e+01 9.471e+01 1.047e+02 1.151e+02 2.834e+02, threshold=2.094e+02, percent-clipped=4.0
2024-09-17 08:32:28,008 INFO [train.py:1198] (1/2) Epoch 11, batch 800, loss[loss=0.2284, ctc_loss=0.1386, cr_loss=0.368, attn_decoder_loss=0.2302, over 29609.00 frames. ], tot_loss[loss=0.2637, ctc_loss=0.168, cr_loss=0.4049, attn_decoder_loss=0.2653, over 5707625.00 frames. ], batch size: 73, lr: 1.04e-02, grad_scale: 16.0
2024-09-17 08:32:40,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=184200.0, ans=0.0
2024-09-17 08:32:43,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=184240.0, ans=0.07
2024-09-17 08:32:46,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=184240.0, ans=0.125
2024-09-17 08:33:09,354 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.86 vs. limit=22.5
2024-09-17 08:33:22,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=184320.0, ans=0.025
2024-09-17 08:33:23,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=184320.0, ans=0.0
2024-09-17 08:33:38,149 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.32 vs. limit=15.0
2024-09-17 08:33:39,336 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.09 vs. limit=15.0
2024-09-17 08:33:42,951 INFO [train.py:1198] (1/2) Epoch 11, batch 850, loss[loss=0.2721, ctc_loss=0.1698, cr_loss=0.3964, attn_decoder_loss=0.2747, over 29695.00 frames. ], tot_loss[loss=0.2632, ctc_loss=0.1676, cr_loss=0.4046, attn_decoder_loss=0.2649, over 5736129.36 frames. ], batch size: 89, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:34:01,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=184440.0, ans=0.125
2024-09-17 08:34:13,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=184480.0, ans=0.125
2024-09-17 08:34:40,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=184520.0, ans=0.125
2024-09-17 08:34:45,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=184560.0, ans=0.125
2024-09-17 08:34:45,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=184560.0, ans=0.1
2024-09-17 08:34:50,947 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.310e+01 9.399e+01 9.999e+01 1.067e+02 1.963e+02, threshold=2.000e+02, percent-clipped=0.0
2024-09-17 08:34:58,534 INFO [train.py:1198] (1/2) Epoch 11, batch 900, loss[loss=0.2379, ctc_loss=0.1428, cr_loss=0.3513, attn_decoder_loss=0.2407, over 29598.00 frames. ], tot_loss[loss=0.2637, ctc_loss=0.1678, cr_loss=0.4048, attn_decoder_loss=0.2653, over 5740857.17 frames. ], batch size: 73, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:35:02,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=184600.0, ans=0.1
2024-09-17 08:35:07,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=184600.0, ans=0.125
2024-09-17 08:35:12,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=184640.0, ans=0.1
2024-09-17 08:35:52,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=184720.0, ans=0.1
2024-09-17 08:35:52,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=184720.0, ans=0.125
2024-09-17 08:35:55,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=184720.0, ans=0.1
2024-09-17 08:35:58,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=184720.0, ans=0.2
2024-09-17 08:36:04,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=184760.0, ans=0.125
2024-09-17 08:36:16,597 INFO [train.py:1198] (1/2) Epoch 11, batch 950, loss[loss=0.2477, ctc_loss=0.155, cr_loss=0.3791, attn_decoder_loss=0.2496, over 29524.00 frames. ], tot_loss[loss=0.2645, ctc_loss=0.1687, cr_loss=0.4063, attn_decoder_loss=0.2661, over 5741322.70 frames. ], batch size: 74, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:37:05,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=184920.0, ans=0.025
2024-09-17 08:37:18,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=184960.0, ans=0.0
2024-09-17 08:37:26,876 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.438e+01 9.626e+01 1.055e+02 1.179e+02 5.157e+02, threshold=2.111e+02, percent-clipped=4.0
2024-09-17 08:37:34,429 INFO [train.py:1198] (1/2) Epoch 11, batch 1000, loss[loss=0.2573, ctc_loss=0.1687, cr_loss=0.4266, attn_decoder_loss=0.2576, over 29507.00 frames. ], tot_loss[loss=0.2651, ctc_loss=0.1694, cr_loss=0.4073, attn_decoder_loss=0.2667, over 5734439.55 frames. ], batch size: 77, lr: 1.03e-02, grad_scale: 8.0
2024-09-17 08:37:34,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=185000.0, ans=0.2
2024-09-17 08:38:03,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=185080.0, ans=0.125
2024-09-17 08:38:05,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=185080.0, ans=0.125
2024-09-17 08:38:06,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=185080.0, ans=0.025
2024-09-17 08:38:12,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=185080.0, ans=0.1
2024-09-17 08:38:18,899 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 08:38:45,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=185160.0, ans=0.2
2024-09-17 08:38:50,011 INFO [train.py:1198] (1/2) Epoch 11, batch 1050, loss[loss=0.276, ctc_loss=0.1705, cr_loss=0.421, attn_decoder_loss=0.2784, over 29682.00 frames. ], tot_loss[loss=0.2644, ctc_loss=0.1689, cr_loss=0.4067, attn_decoder_loss=0.266, over 5744093.36 frames. ], batch size: 85, lr: 1.03e-02, grad_scale: 4.0
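The Whitening lines compare a per-module metric against a limit: the whitening hook in scaling.py nudges feature covariances toward "white" (identity-like up to scale) statistics, and only intervenes once metric exceeds limit. As a hedged proxy (plausibly close to, but not guaranteed to be, the exact scaling.py:1024 formula), the metric can be read as how unevenly variance is spread across eigen-directions of the channel covariance:

import torch

def whitening_metric_sketch(x: torch.Tensor, num_groups: int = 1) -> float:
    # Proxy metric: mean squared eigenvalue of the channel covariance divided
    # by the squared mean eigenvalue. 1.0 for perfectly white features,
    # larger when a few directions dominate. An assumption, not the exact
    # scaling.py formula.
    n, c = x.reshape(-1, x.shape[-1]).shape
    feats = x.reshape(n, num_groups, c // num_groups)
    metrics = []
    for g in range(num_groups):
        f = feats[:, g, :]
        f = f - f.mean(dim=0, keepdim=True)
        cov = (f.T @ f) / n
        eigs = torch.linalg.eigvalsh(cov)
        metrics.append(float(eigs.pow(2).mean() / eigs.mean().pow(2)))
    return max(metrics)

x = torch.randn(1000, 512)         # near-white random activations
print(whitening_metric_sketch(x))  # ~1-2, well under a limit like 15.0

On this reading, values such as metric=21.81 vs. limit=22.5 above mean the module is close to the point where the whitening penalty would start acting.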
2024-09-17 08:39:27,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=185280.0, ans=0.125
2024-09-17 08:39:40,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=185320.0, ans=0.2
2024-09-17 08:39:46,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=185320.0, ans=0.1
2024-09-17 08:39:58,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=185360.0, ans=0.2
2024-09-17 08:39:58,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=185360.0, ans=0.1
2024-09-17 08:40:01,447 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 9.188e+01 9.776e+01 1.031e+02 2.876e+02, threshold=1.955e+02, percent-clipped=0.0
2024-09-17 08:40:06,444 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-17 08:40:07,590 INFO [train.py:1198] (1/2) Epoch 11, batch 1100, loss[loss=0.2427, ctc_loss=0.1538, cr_loss=0.3891, attn_decoder_loss=0.2439, over 29453.00 frames. ], tot_loss[loss=0.2636, ctc_loss=0.1681, cr_loss=0.4056, attn_decoder_loss=0.2652, over 5756130.37 frames. ], batch size: 78, lr: 1.03e-02, grad_scale: 8.0
2024-09-17 08:40:09,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=185400.0, ans=0.125
2024-09-17 08:40:14,728 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.98 vs. limit=15.0
2024-09-17 08:40:26,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=185440.0, ans=0.1
2024-09-17 08:40:28,881 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=15.0
2024-09-17 08:40:34,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=185440.0, ans=0.125
2024-09-17 08:40:38,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=185480.0, ans=0.0
2024-09-17 08:40:40,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=185480.0, ans=0.2
2024-09-17 08:40:46,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.29 vs. limit=15.0
2024-09-17 08:41:09,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=185560.0, ans=0.0
2024-09-17 08:41:25,822 INFO [train.py:1198] (1/2) Epoch 11, batch 1150, loss[loss=0.253, ctc_loss=0.1599, cr_loss=0.3815, attn_decoder_loss=0.2549, over 29420.00 frames. ], tot_loss[loss=0.2637, ctc_loss=0.1682, cr_loss=0.4053, attn_decoder_loss=0.2653, over 5753344.86 frames. ], batch size: 78, lr: 1.03e-02, grad_scale: 4.0
2024-09-17 08:42:00,773 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.14 vs. limit=10.0
2024-09-17 08:42:12,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=185720.0, ans=0.1
2024-09-17 08:42:21,806 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.25 vs. limit=12.0
2024-09-17 08:42:30,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=185760.0, ans=0.0
2024-09-17 08:42:37,685 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.117e+01 9.543e+01 1.008e+02 1.096e+02 1.940e+02, threshold=2.016e+02, percent-clipped=1.0
2024-09-17 08:42:40,316 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=22.5
2024-09-17 08:42:42,178 INFO [train.py:1198] (1/2) Epoch 11, batch 1200, loss[loss=0.2759, ctc_loss=0.1699, cr_loss=0.4325, attn_decoder_loss=0.278, over 29688.00 frames. ], tot_loss[loss=0.2646, ctc_loss=0.1688, cr_loss=0.406, attn_decoder_loss=0.2662, over 5745455.05 frames. ], batch size: 85, lr: 1.03e-02, grad_scale: 8.0
2024-09-17 08:42:44,712 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.74 vs. limit=6.0
2024-09-17 08:43:09,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=185840.0, ans=0.0
2024-09-17 08:43:18,987 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.17 vs. limit=15.0
2024-09-17 08:43:21,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=185880.0, ans=0.125
2024-09-17 08:43:22,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=185880.0, ans=0.125
2024-09-17 08:44:00,293 INFO [train.py:1198] (1/2) Epoch 11, batch 1250, loss[loss=0.2915, ctc_loss=0.1988, cr_loss=0.4599, attn_decoder_loss=0.2916, over 29512.00 frames. ], tot_loss[loss=0.265, ctc_loss=0.169, cr_loss=0.4071, attn_decoder_loss=0.2667, over 5774264.30 frames. ], batch size: 92, lr: 1.03e-02, grad_scale: 8.0
2024-09-17 08:44:03,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=186000.0, ans=0.125
2024-09-17 08:44:18,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=186040.0, ans=0.2
2024-09-17 08:45:04,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=186160.0, ans=0.0
2024-09-17 08:45:06,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=186160.0, ans=0.125
2024-09-17 08:45:10,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=186160.0, ans=0.07
2024-09-17 08:45:13,658 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.030e+01 9.338e+01 9.972e+01 1.044e+02 2.073e+02, threshold=1.994e+02, percent-clipped=1.0
2024-09-17 08:45:18,173 INFO [train.py:1198] (1/2) Epoch 11, batch 1300, loss[loss=0.2853, ctc_loss=0.1926, cr_loss=0.4464, attn_decoder_loss=0.2857, over 28272.00 frames. ], tot_loss[loss=0.2641, ctc_loss=0.1683, cr_loss=0.4056, attn_decoder_loss=0.2657, over 5778267.68 frames. ], batch size: 111, lr: 1.03e-02, grad_scale: 8.0
2024-09-17 08:45:18,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=186200.0, ans=0.0
2024-09-17 08:45:20,608 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.61 vs. limit=15.0
2024-09-17 08:45:21,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=186200.0, ans=0.125
2024-09-17 08:45:22,228 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.58 vs. limit=22.5
2024-09-17 08:45:40,081 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.92 vs. limit=15.0
2024-09-17 08:45:56,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=186280.0, ans=0.125
2024-09-17 08:46:20,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=186360.0, ans=0.025
2024-09-17 08:46:21,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=186360.0, ans=0.025
2024-09-17 08:46:22,565 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.72 vs. limit=15.0
2024-09-17 08:46:23,686 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.12 vs. limit=15.0
2024-09-17 08:46:34,058 INFO [train.py:1198] (1/2) Epoch 11, batch 1350, loss[loss=0.2689, ctc_loss=0.1756, cr_loss=0.4326, attn_decoder_loss=0.2696, over 29780.00 frames. ], tot_loss[loss=0.264, ctc_loss=0.1681, cr_loss=0.4061, attn_decoder_loss=0.2657, over 5793990.71 frames. ], batch size: 81, lr: 1.03e-02, grad_scale: 8.0
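Each train.py:1198 record pairs a per-batch loss[...] with a running tot_loss[...] averaged over all frames seen so far, and loss is a weighted combination of ctc_loss, cr_loss and attn_decoder_loss. A sketch of that bookkeeping; the weights below (0.1 / 0.9 / 0.02) are assumptions taken from the training configuration rather than from these lines, though they are consistent with the logged numbers, e.g. for batch 1300 above: 0.1*0.1926 + 0.9*0.2857 + 0.02*0.4464 ≈ 0.2853.

from dataclasses import dataclass, field

@dataclass
class LossTracker:
    # Sketch of the frame-weighted running average behind tot_loss[...].
    sums: dict = field(default_factory=dict)
    frames: float = 0.0

    def update(self, batch_losses: dict, num_frames: float) -> dict:
        self.frames += num_frames
        for k, v in batch_losses.items():
            self.sums[k] = self.sums.get(k, 0.0) + v * num_frames
        return {k: s / self.frames for k, s in self.sums.items()}

def combine(ctc, cr, attn, ctc_scale=0.1, attn_scale=0.9, cr_scale=0.02):
    # Assumed scales for illustration; see the hedge in the text above.
    return ctc_scale * ctc + attn_scale * attn + cr_scale * cr

tracker = LossTracker()
batch = {"ctc_loss": 0.1926, "cr_loss": 0.4464, "attn_decoder_loss": 0.2857}
batch["loss"] = combine(batch["ctc_loss"], batch["cr_loss"], batch["attn_decoder_loss"])
print(tracker.update(batch, num_frames=28272.0))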
2024-09-17 08:46:50,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=186440.0, ans=0.1
2024-09-17 08:46:55,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=186440.0, ans=0.125
2024-09-17 08:47:06,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=186480.0, ans=0.125
2024-09-17 08:47:09,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=186480.0, ans=0.125
2024-09-17 08:47:30,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=186520.0, ans=0.0
2024-09-17 08:47:30,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=186520.0, ans=0.2
2024-09-17 08:47:32,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=186520.0, ans=10.0
2024-09-17 08:47:46,681 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.867e+01 9.722e+01 1.047e+02 1.106e+02 2.453e+02, threshold=2.093e+02, percent-clipped=1.0
2024-09-17 08:47:51,379 INFO [train.py:1198] (1/2) Epoch 11, batch 1400, loss[loss=0.2278, ctc_loss=0.1386, cr_loss=0.3691, attn_decoder_loss=0.2295, over 29586.00 frames. ], tot_loss[loss=0.2641, ctc_loss=0.1683, cr_loss=0.4067, attn_decoder_loss=0.2657, over 5805801.75 frames. ], batch size: 69, lr: 1.03e-02, grad_scale: 8.0
2024-09-17 08:48:12,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=186640.0, ans=0.1
2024-09-17 08:48:17,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=186640.0, ans=0.125
2024-09-17 08:48:24,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=186680.0, ans=0.09899494936611666
2024-09-17 08:48:28,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=186680.0, ans=0.09899494936611666
2024-09-17 08:48:30,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=186680.0, ans=0.0
2024-09-17 08:48:30,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=186680.0, ans=0.125
2024-09-17 08:48:43,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=186720.0, ans=0.0
2024-09-17 08:48:46,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=186720.0, ans=0.125
2024-09-17 08:48:51,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=186720.0, ans=0.0
2024-09-17 08:48:54,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=186760.0, ans=0.125
2024-09-17 08:48:55,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=186760.0, ans=0.125
2024-09-17 08:48:57,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=186760.0, ans=0.1
2024-09-17 08:48:57,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=186760.0, ans=22.5
2024-09-17 08:49:09,111 INFO [train.py:1198] (1/2) Epoch 11, batch 1450, loss[loss=0.2882, ctc_loss=0.1885, cr_loss=0.431, attn_decoder_loss=0.2897, over 29401.00 frames. ], tot_loss[loss=0.2641, ctc_loss=0.168, cr_loss=0.406, attn_decoder_loss=0.2658, over 5802898.27 frames. ], batch size: 94, lr: 1.03e-02, grad_scale: 8.0
2024-09-17 08:49:54,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=186920.0, ans=0.0
2024-09-17 08:50:00,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=186920.0, ans=0.2
2024-09-17 08:50:18,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=186960.0, ans=0.125
2024-09-17 08:50:21,154 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.530e+01 9.456e+01 1.033e+02 1.133e+02 2.904e+02, threshold=2.066e+02, percent-clipped=2.0
2024-09-17 08:50:24,332 INFO [train.py:1198] (1/2) Epoch 11, batch 1500, loss[loss=0.2727, ctc_loss=0.1624, cr_loss=0.4107, attn_decoder_loss=0.2759, over 29623.00 frames. ], tot_loss[loss=0.2645, ctc_loss=0.1682, cr_loss=0.4067, attn_decoder_loss=0.2662, over 5803430.61 frames. ], batch size: 86, lr: 1.03e-02, grad_scale: 8.0
2024-09-17 08:50:24,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=187000.0, ans=0.0
2024-09-17 08:51:37,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=187160.0, ans=0.1
2024-09-17 08:51:43,025 INFO [train.py:1198] (1/2) Epoch 11, batch 1550, loss[loss=0.2855, ctc_loss=0.182, cr_loss=0.4345, attn_decoder_loss=0.2873, over 29511.00 frames. ], tot_loss[loss=0.2647, ctc_loss=0.1687, cr_loss=0.4067, attn_decoder_loss=0.2664, over 5778779.43 frames. ], batch size: 90, lr: 1.03e-02, grad_scale: 8.0
2024-09-17 08:51:43,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=187200.0, ans=22.5
2024-09-17 08:52:10,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=187240.0, ans=0.025
2024-09-17 08:52:40,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=187320.0, ans=0.125
2024-09-17 08:52:43,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=187320.0, ans=0.125
2024-09-17 08:52:52,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=187360.0, ans=0.125
2024-09-17 08:52:56,880 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.25 vs. limit=12.0
2024-09-17 08:52:57,788 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.021e+01 9.413e+01 1.008e+02 1.137e+02 5.479e+02, threshold=2.016e+02, percent-clipped=2.0
2024-09-17 08:53:00,774 INFO [train.py:1198] (1/2) Epoch 11, batch 1600, loss[loss=0.2695, ctc_loss=0.1608, cr_loss=0.3825, attn_decoder_loss=0.2731, over 29670.00 frames. ], tot_loss[loss=0.2645, ctc_loss=0.1684, cr_loss=0.4062, attn_decoder_loss=0.2662, over 5761721.74 frames. ], batch size: 85, lr: 1.03e-02, grad_scale: 16.0
2024-09-17 08:53:05,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=187400.0, ans=0.0
2024-09-17 08:53:19,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=187440.0, ans=0.1
2024-09-17 08:53:23,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=187440.0, ans=0.125
2024-09-17 08:53:28,901 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.71 vs. limit=15.0
2024-09-17 08:53:33,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=187480.0, ans=0.125
2024-09-17 08:53:40,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=187480.0, ans=0.1
2024-09-17 08:53:43,882 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.90 vs. limit=15.0
2024-09-17 08:53:50,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=187520.0, ans=0.0
2024-09-17 08:53:53,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=187520.0, ans=0.2
2024-09-17 08:53:57,326 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.49 vs. limit=15.0
2024-09-17 08:54:16,158 INFO [train.py:1198] (1/2) Epoch 11, batch 1650, loss[loss=0.2823, ctc_loss=0.1864, cr_loss=0.4475, attn_decoder_loss=0.283, over 29695.00 frames. ], tot_loss[loss=0.2642, ctc_loss=0.1681, cr_loss=0.4055, attn_decoder_loss=0.2659, over 5757628.40 frames. ], batch size: 89, lr: 1.03e-02, grad_scale: 8.0
2024-09-17 08:54:21,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=187600.0, ans=0.125
2024-09-17 08:55:13,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=187720.0, ans=0.0
2024-09-17 08:55:18,474 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.86 vs. limit=10.0
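grad_scale in the records above moves among 4.0, 8.0 and 16.0. That is standard fp16 dynamic loss scaling: the scale is halved when gradients overflow and grown again after a run of stable steps. A sketch using PyTorch's stock GradScaler; the init_scale and the loss function here are placeholders for illustration, not values taken from this run:

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,      # assumed starting point for illustration
    growth_factor=2.0,    # double after growth_interval clean steps
    backoff_factor=0.5,   # halve on inf/nan gradients
    growth_interval=2000,
)

def train_step(model, optimizer, loss_fn, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales grads, skips step on overflow
    scaler.update()                 # adjusts the scale -> the logged grad_scale
    return loss.detach()

The occasional drop from 16.0 back to 4.0 between batches above is consistent with a couple of overflow-and-backoff events in quick succession.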
2024-09-17 08:55:32,180 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.528e+01 9.133e+01 1.002e+02 1.058e+02 1.581e+02, threshold=2.003e+02, percent-clipped=0.0
2024-09-17 08:55:32,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=187800.0, ans=0.125
2024-09-17 08:55:33,699 INFO [train.py:1198] (1/2) Epoch 11, batch 1700, loss[loss=0.2356, ctc_loss=0.1452, cr_loss=0.3682, attn_decoder_loss=0.2374, over 29557.00 frames. ], tot_loss[loss=0.2639, ctc_loss=0.1676, cr_loss=0.4051, attn_decoder_loss=0.2656, over 5778690.06 frames. ], batch size: 69, lr: 1.03e-02, grad_scale: 8.0
2024-09-17 08:56:04,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=187880.0, ans=0.125
2024-09-17 08:56:17,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.96 vs. limit=15.0
2024-09-17 08:56:32,602 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.41 vs. limit=10.0
2024-09-17 08:56:47,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=187960.0, ans=0.125
2024-09-17 08:56:51,928 INFO [train.py:1198] (1/2) Epoch 11, batch 1750, loss[loss=0.2277, ctc_loss=0.1382, cr_loss=0.3388, attn_decoder_loss=0.2301, over 29300.00 frames. ], tot_loss[loss=0.2635, ctc_loss=0.1671, cr_loss=0.405, attn_decoder_loss=0.2652, over 5784870.29 frames. ], batch size: 67, lr: 1.03e-02, grad_scale: 8.0
2024-09-17 08:57:05,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=188040.0, ans=0.1
2024-09-17 08:57:22,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=188080.0, ans=0.07
2024-09-17 08:57:42,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=188120.0, ans=0.0
2024-09-17 08:57:45,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=188120.0, ans=0.125
2024-09-17 08:57:48,749 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.85 vs. limit=6.0
2024-09-17 08:58:04,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=188160.0, ans=0.0
2024-09-17 08:58:05,829 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.041e+01 9.226e+01 9.775e+01 1.040e+02 1.595e+02, threshold=1.955e+02, percent-clipped=0.0
2024-09-17 08:58:07,339 INFO [train.py:1198] (1/2) Epoch 11, batch 1800, loss[loss=0.2705, ctc_loss=0.1787, cr_loss=0.4382, attn_decoder_loss=0.271, over 29701.00 frames. ], tot_loss[loss=0.2634, ctc_loss=0.1671, cr_loss=0.4047, attn_decoder_loss=0.2651, over 5788729.85 frames. ], batch size: 83, lr: 1.03e-02, grad_scale: 8.0
2024-09-17 08:58:31,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=188240.0, ans=0.05
2024-09-17 08:59:03,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=188320.0, ans=0.125
2024-09-17 08:59:24,938 INFO [train.py:1198] (1/2) Epoch 11, batch 1850, loss[loss=0.2703, ctc_loss=0.1766, cr_loss=0.4035, attn_decoder_loss=0.2717, over 29634.00 frames. ], tot_loss[loss=0.2631, ctc_loss=0.1669, cr_loss=0.4049, attn_decoder_loss=0.2648, over 5794942.55 frames. ], batch size: 86, lr: 1.03e-02, grad_scale: 4.0
2024-09-17 08:59:37,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=188400.0, ans=0.125
2024-09-17 08:59:41,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=188440.0, ans=0.1
2024-09-17 08:59:44,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=188440.0, ans=0.025
2024-09-17 08:59:46,437 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 09:00:02,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=188480.0, ans=10.0
2024-09-17 09:00:13,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=188520.0, ans=0.0
2024-09-17 09:00:13,947 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-17 09:00:39,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=188560.0, ans=0.0
2024-09-17 09:00:42,134 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.959e+01 9.168e+01 9.699e+01 1.053e+02 1.276e+02, threshold=1.940e+02, percent-clipped=0.0
2024-09-17 09:00:42,160 INFO [train.py:1198] (1/2) Epoch 11, batch 1900, loss[loss=0.2776, ctc_loss=0.1752, cr_loss=0.4355, attn_decoder_loss=0.2794, over 29707.00 frames. ], tot_loss[loss=0.264, ctc_loss=0.1677, cr_loss=0.4058, attn_decoder_loss=0.2657, over 5802827.99 frames. ], batch size: 89, lr: 1.03e-02, grad_scale: 8.0
2024-09-17 09:00:48,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=188600.0, ans=0.1
2024-09-17 09:00:48,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=188600.0, ans=0.2
2024-09-17 09:00:58,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=188640.0, ans=0.0
2024-09-17 09:01:01,069 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.98 vs. limit=15.0
2024-09-17 09:01:21,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=188680.0, ans=0.05
2024-09-17 09:01:36,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=188720.0, ans=0.0
2024-09-17 09:01:38,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=188720.0, ans=0.125
2024-09-17 09:01:54,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=188760.0, ans=0.2
2024-09-17 09:01:57,954 INFO [train.py:1198] (1/2) Epoch 11, batch 1950, loss[loss=0.2602, ctc_loss=0.1594, cr_loss=0.4041, attn_decoder_loss=0.2624, over 29448.00 frames. ], tot_loss[loss=0.2652, ctc_loss=0.1686, cr_loss=0.4074, attn_decoder_loss=0.2669, over 5818130.21 frames. ], batch size: 78, lr: 1.02e-02, grad_scale: 4.0
2024-09-17 09:02:31,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=188880.0, ans=0.1
2024-09-17 09:02:49,234 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.80 vs. limit=15.0
2024-09-17 09:02:52,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=188920.0, ans=0.0
2024-09-17 09:03:03,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=188960.0, ans=0.0
2024-09-17 09:03:03,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=188960.0, ans=0.0
2024-09-17 09:03:15,374 INFO [train.py:1198] (1/2) Epoch 11, batch 2000, loss[loss=0.2397, ctc_loss=0.1495, cr_loss=0.3848, attn_decoder_loss=0.2411, over 29360.00 frames. ], tot_loss[loss=0.2658, ctc_loss=0.1694, cr_loss=0.4086, attn_decoder_loss=0.2674, over 5796056.47 frames. ], batch size: 67, lr: 1.02e-02, grad_scale: 8.0
2024-09-17 09:03:16,926 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.350e+01 9.444e+01 9.987e+01 1.091e+02 4.605e+02, threshold=1.997e+02, percent-clipped=2.0
2024-09-17 09:03:35,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=189040.0, ans=0.07
2024-09-17 09:03:41,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=189040.0, ans=0.025
2024-09-17 09:04:07,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=189120.0, ans=0.125
2024-09-17 09:04:12,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.28 vs. limit=10.0
2024-09-17 09:04:33,453 INFO [train.py:1198] (1/2) Epoch 11, batch 2050, loss[loss=0.2467, ctc_loss=0.1603, cr_loss=0.3989, attn_decoder_loss=0.2474, over 29441.00 frames. ], tot_loss[loss=0.2647, ctc_loss=0.1688, cr_loss=0.4074, attn_decoder_loss=0.2663, over 5788653.97 frames. ], batch size: 70, lr: 1.02e-02, grad_scale: 8.0
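The scaling.py:1120 WithLoss lines attach an auxiliary loss to a module's attention weights and report its sum; loss-sum=0.000e+00 throughout this section simply means the auxiliary term is currently inactive or zero. A speculative sketch of the general mechanism, identity in the forward pass with an extra gradient term in the backward pass; the real class and its penalty differ in details:

import torch

class WithAuxLossSketch(torch.autograd.Function):
    # Hedged sketch of a WithLoss-style hook, not the scaling.py class.
    @staticmethod
    def forward(ctx, x: torch.Tensor, scale: float):
        ctx.save_for_backward(x)
        ctx.scale = scale
        # quantity that would be reported as loss-sum (here: mean squared
        # activation; the actual penalty in scaling.py is different)
        ctx.loss_sum = float(x.detach().pow(2).mean())
        return x

    @staticmethod
    def backward(ctx, grad_out: torch.Tensor):
        (x,) = ctx.saved_tensors
        # gradient of scale * mean(x^2), added on top of the incoming grad
        aux_grad = ctx.scale * 2.0 * x / x.numel()
        return grad_out + aux_grad, None

x = torch.randn(4, 8, requires_grad=True)
y = WithAuxLossSketch.apply(x, 0.01)  # forward is the identity
y.sum().backward()                    # backward carries the extra term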
], batch size: 70, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:04:36,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=189200.0, ans=0.1 2024-09-17 09:04:42,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=189200.0, ans=0.0 2024-09-17 09:04:43,119 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.69 vs. limit=12.0 2024-09-17 09:04:55,206 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.53 vs. limit=12.0 2024-09-17 09:05:17,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=189320.0, ans=0.125 2024-09-17 09:05:29,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=189320.0, ans=0.0 2024-09-17 09:05:49,129 INFO [train.py:1198] (1/2) Epoch 11, batch 2100, loss[loss=0.2593, ctc_loss=0.163, cr_loss=0.3946, attn_decoder_loss=0.2612, over 29771.00 frames. ], tot_loss[loss=0.2641, ctc_loss=0.168, cr_loss=0.4062, attn_decoder_loss=0.2657, over 5800730.76 frames. ], batch size: 81, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:05:50,607 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.724e+01 8.970e+01 9.676e+01 1.062e+02 4.848e+02, threshold=1.935e+02, percent-clipped=1.0 2024-09-17 09:06:19,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.min_positive, batch_count=189480.0, ans=0.025 2024-09-17 09:06:23,661 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.32 vs. limit=22.5 2024-09-17 09:06:36,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=189520.0, ans=0.125 2024-09-17 09:06:51,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=189560.0, ans=0.125 2024-09-17 09:06:54,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=189560.0, ans=0.125 2024-09-17 09:07:03,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=189560.0, ans=0.125 2024-09-17 09:07:03,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=189560.0, ans=0.0 2024-09-17 09:07:06,746 INFO [train.py:1198] (1/2) Epoch 11, batch 2150, loss[loss=0.2559, ctc_loss=0.1615, cr_loss=0.4065, attn_decoder_loss=0.2574, over 29436.00 frames. ], tot_loss[loss=0.263, ctc_loss=0.1666, cr_loss=0.4044, attn_decoder_loss=0.2648, over 5815307.44 frames. 
], batch size: 78, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:07:36,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=189680.0, ans=0.1 2024-09-17 09:07:48,280 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:07:49,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=189680.0, ans=0.125 2024-09-17 09:07:54,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=189720.0, ans=0.09899494936611666 2024-09-17 09:08:00,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=189720.0, ans=0.125 2024-09-17 09:08:01,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=189720.0, ans=0.125 2024-09-17 09:08:08,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=189760.0, ans=0.025 2024-09-17 09:08:15,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=189760.0, ans=0.0 2024-09-17 09:08:24,832 INFO [train.py:1198] (1/2) Epoch 11, batch 2200, loss[loss=0.2744, ctc_loss=0.1771, cr_loss=0.4234, attn_decoder_loss=0.2759, over 29614.00 frames. ], tot_loss[loss=0.2629, ctc_loss=0.1663, cr_loss=0.404, attn_decoder_loss=0.2647, over 5812071.58 frames. ], batch size: 86, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:08:26,326 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.251e+01 9.350e+01 9.957e+01 1.083e+02 2.059e+02, threshold=1.991e+02, percent-clipped=1.0 2024-09-17 09:08:27,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=189800.0, ans=0.015 2024-09-17 09:09:13,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=189920.0, ans=0.125 2024-09-17 09:09:22,808 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:09:27,776 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.46 vs. limit=15.0 2024-09-17 09:09:40,359 INFO [train.py:1198] (1/2) Epoch 11, batch 2250, loss[loss=0.2667, ctc_loss=0.164, cr_loss=0.4052, attn_decoder_loss=0.2691, over 29724.00 frames. ], tot_loss[loss=0.2628, ctc_loss=0.1661, cr_loss=0.4037, attn_decoder_loss=0.2646, over 5811500.67 frames. 
], batch size: 82, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:09:46,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=190000.0, ans=0.125 2024-09-17 09:09:52,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=190000.0, ans=0.125 2024-09-17 09:10:11,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=190080.0, ans=0.025 2024-09-17 09:10:23,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=190080.0, ans=0.0 2024-09-17 09:10:38,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=190120.0, ans=0.09899494936611666 2024-09-17 09:10:46,851 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.53 vs. limit=15.0 2024-09-17 09:10:47,495 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:10:51,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=190160.0, ans=0.125 2024-09-17 09:10:51,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=190160.0, ans=0.0 2024-09-17 09:10:55,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=190160.0, ans=0.0 2024-09-17 09:10:57,790 INFO [train.py:1198] (1/2) Epoch 11, batch 2300, loss[loss=0.2343, ctc_loss=0.1395, cr_loss=0.343, attn_decoder_loss=0.2372, over 29303.00 frames. ], tot_loss[loss=0.2618, ctc_loss=0.1653, cr_loss=0.4025, attn_decoder_loss=0.2635, over 5800076.63 frames. ], batch size: 71, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:10:59,281 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.762e+01 9.241e+01 9.973e+01 1.088e+02 2.493e+02, threshold=1.995e+02, percent-clipped=2.0 2024-09-17 09:11:17,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=190240.0, ans=0.125 2024-09-17 09:11:23,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=190240.0, ans=0.95 2024-09-17 09:11:30,477 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.18 vs. 
limit=15.0 2024-09-17 09:11:32,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=190280.0, ans=0.125 2024-09-17 09:11:45,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=190320.0, ans=0.1 2024-09-17 09:11:48,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=190320.0, ans=10.0 2024-09-17 09:11:55,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=190320.0, ans=0.2 2024-09-17 09:12:15,944 INFO [train.py:1198] (1/2) Epoch 11, batch 2350, loss[loss=0.2707, ctc_loss=0.1742, cr_loss=0.4289, attn_decoder_loss=0.2718, over 29677.00 frames. ], tot_loss[loss=0.2621, ctc_loss=0.1655, cr_loss=0.4033, attn_decoder_loss=0.2639, over 5805388.29 frames. ], batch size: 83, lr: 1.02e-02, grad_scale: 4.0 2024-09-17 09:12:17,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=190400.0, ans=0.015 2024-09-17 09:12:25,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=190400.0, ans=0.1 2024-09-17 09:12:32,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=190440.0, ans=0.1 2024-09-17 09:12:43,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=190440.0, ans=0.04949747468305833 2024-09-17 09:13:03,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=190520.0, ans=0.125 2024-09-17 09:13:31,664 INFO [train.py:1198] (1/2) Epoch 11, batch 2400, loss[loss=0.2569, ctc_loss=0.1599, cr_loss=0.3903, attn_decoder_loss=0.259, over 29544.00 frames. ], tot_loss[loss=0.2627, ctc_loss=0.166, cr_loss=0.4042, attn_decoder_loss=0.2644, over 5808397.85 frames. ], batch size: 76, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:13:34,606 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.758e+01 9.144e+01 9.902e+01 1.071e+02 1.818e+02, threshold=1.980e+02, percent-clipped=0.0 2024-09-17 09:13:42,580 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:13:43,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=190600.0, ans=0.1 2024-09-17 09:14:13,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=190680.0, ans=0.025 2024-09-17 09:14:27,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=190720.0, ans=0.125 2024-09-17 09:14:50,025 INFO [train.py:1198] (1/2) Epoch 11, batch 2450, loss[loss=0.2715, ctc_loss=0.17, cr_loss=0.4251, attn_decoder_loss=0.2733, over 29713.00 frames. ], tot_loss[loss=0.264, ctc_loss=0.1673, cr_loss=0.4057, attn_decoder_loss=0.2657, over 5785307.67 frames. 
], batch size: 82, lr: 1.02e-02, grad_scale: 4.0 2024-09-17 09:14:56,329 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:14:59,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=190800.0, ans=0.07 2024-09-17 09:15:17,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=190840.0, ans=0.125 2024-09-17 09:15:36,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=190920.0, ans=0.0 2024-09-17 09:15:38,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=190920.0, ans=0.025 2024-09-17 09:15:41,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=190920.0, ans=0.125 2024-09-17 09:15:50,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=190960.0, ans=0.0 2024-09-17 09:15:50,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=190960.0, ans=0.2 2024-09-17 09:16:03,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.34 vs. limit=10.0 2024-09-17 09:16:07,526 INFO [train.py:1198] (1/2) Epoch 11, batch 2500, loss[loss=0.2843, ctc_loss=0.1936, cr_loss=0.4227, attn_decoder_loss=0.285, over 29611.00 frames. ], tot_loss[loss=0.264, ctc_loss=0.1675, cr_loss=0.4058, attn_decoder_loss=0.2657, over 5795457.66 frames. ], batch size: 86, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:16:10,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=191000.0, ans=0.0 2024-09-17 09:16:12,136 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.899e+01 9.413e+01 9.956e+01 1.120e+02 1.816e+02, threshold=1.991e+02, percent-clipped=0.0 2024-09-17 09:16:12,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=191000.0, ans=0.125 2024-09-17 09:17:15,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=8.87 vs. limit=15.0 2024-09-17 09:17:23,740 INFO [train.py:1198] (1/2) Epoch 11, batch 2550, loss[loss=0.2371, ctc_loss=0.149, cr_loss=0.3869, attn_decoder_loss=0.2383, over 29361.00 frames. ], tot_loss[loss=0.2639, ctc_loss=0.167, cr_loss=0.4051, attn_decoder_loss=0.2656, over 5799101.05 frames. 
], batch size: 67, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:17:28,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=191200.0, ans=0.0 2024-09-17 09:17:36,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=191200.0, ans=0.125 2024-09-17 09:18:00,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=191280.0, ans=0.125 2024-09-17 09:18:02,565 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:18:12,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=191320.0, ans=0.0 2024-09-17 09:18:17,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=191320.0, ans=0.0 2024-09-17 09:18:37,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=191360.0, ans=0.125 2024-09-17 09:18:41,956 INFO [train.py:1198] (1/2) Epoch 11, batch 2600, loss[loss=0.2589, ctc_loss=0.1629, cr_loss=0.4015, attn_decoder_loss=0.2606, over 29452.00 frames. ], tot_loss[loss=0.2641, ctc_loss=0.1669, cr_loss=0.4046, attn_decoder_loss=0.2659, over 5795392.93 frames. ], batch size: 78, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:18:46,519 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.056e+01 9.455e+01 1.019e+02 1.112e+02 3.211e+02, threshold=2.037e+02, percent-clipped=2.0 2024-09-17 09:19:07,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=191440.0, ans=0.1 2024-09-17 09:19:12,647 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=5.00 vs. limit=12.0 2024-09-17 09:19:14,118 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.11 vs. limit=15.0 2024-09-17 09:19:15,611 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.66 vs. limit=15.0 2024-09-17 09:19:33,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=191520.0, ans=0.025 2024-09-17 09:19:37,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=191520.0, ans=0.025 2024-09-17 09:19:45,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=191560.0, ans=0.125 2024-09-17 09:19:49,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=191560.0, ans=0.0 2024-09-17 09:19:59,103 INFO [train.py:1198] (1/2) Epoch 11, batch 2650, loss[loss=0.2956, ctc_loss=0.2026, cr_loss=0.4549, attn_decoder_loss=0.2959, over 29274.00 frames. ], tot_loss[loss=0.2642, ctc_loss=0.167, cr_loss=0.4048, attn_decoder_loss=0.266, over 5802051.26 frames. 
], batch size: 100, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:20:07,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=191600.0, ans=0.0 2024-09-17 09:20:53,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=191720.0, ans=0.1 2024-09-17 09:21:01,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=191760.0, ans=0.0 2024-09-17 09:21:14,609 INFO [train.py:1198] (1/2) Epoch 11, batch 2700, loss[loss=0.271, ctc_loss=0.1738, cr_loss=0.421, attn_decoder_loss=0.2725, over 29549.00 frames. ], tot_loss[loss=0.2644, ctc_loss=0.1673, cr_loss=0.4055, attn_decoder_loss=0.2662, over 5797549.66 frames. ], batch size: 87, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:21:20,544 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 9.206e+01 9.835e+01 1.075e+02 2.605e+02, threshold=1.967e+02, percent-clipped=2.0 2024-09-17 09:21:25,388 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:21:25,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=191800.0, ans=0.0 2024-09-17 09:21:40,318 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.92 vs. limit=22.5 2024-09-17 09:21:40,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=191840.0, ans=0.025 2024-09-17 09:21:53,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=191880.0, ans=0.125 2024-09-17 09:21:58,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.87 vs. limit=15.0 2024-09-17 09:22:39,949 INFO [train.py:1198] (1/2) Epoch 11, batch 2750, loss[loss=0.2526, ctc_loss=0.1623, cr_loss=0.3879, attn_decoder_loss=0.254, over 29522.00 frames. ], tot_loss[loss=0.2631, ctc_loss=0.1663, cr_loss=0.403, attn_decoder_loss=0.2649, over 5794782.04 frames. ], batch size: 75, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:22:40,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=192000.0, ans=0.0 2024-09-17 09:22:56,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=192040.0, ans=0.035 2024-09-17 09:23:13,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=192080.0, ans=0.125 2024-09-17 09:23:45,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=192160.0, ans=0.2 2024-09-17 09:23:55,129 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:23:57,818 INFO [train.py:1198] (1/2) Epoch 11, batch 2800, loss[loss=0.2999, ctc_loss=0.2264, cr_loss=0.4107, attn_decoder_loss=0.299, over 20613.00 frames. ], tot_loss[loss=0.263, ctc_loss=0.1664, cr_loss=0.4027, attn_decoder_loss=0.2648, over 5775519.88 frames. 
], batch size: 211, lr: 1.02e-02, grad_scale: 16.0 2024-09-17 09:24:00,115 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.96 vs. limit=15.0 2024-09-17 09:24:05,063 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.696e+01 8.863e+01 9.648e+01 1.109e+02 4.510e+02, threshold=1.930e+02, percent-clipped=4.0 2024-09-17 09:24:10,042 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:24:11,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=192240.0, ans=0.0 2024-09-17 09:24:22,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.58 vs. limit=10.0 2024-09-17 09:24:27,564 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=5.01 vs. limit=5.0 2024-09-17 09:24:31,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=192280.0, ans=10.0 2024-09-17 09:24:32,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=192280.0, ans=0.2 2024-09-17 09:24:52,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=192320.0, ans=0.0 2024-09-17 09:24:57,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.07 vs. limit=15.0 2024-09-17 09:25:13,027 INFO [train.py:1198] (1/2) Epoch 11, batch 2850, loss[loss=0.2543, ctc_loss=0.1584, cr_loss=0.3918, attn_decoder_loss=0.2562, over 29518.00 frames. ], tot_loss[loss=0.2636, ctc_loss=0.167, cr_loss=0.4034, attn_decoder_loss=0.2654, over 5761729.59 frames. ], batch size: 77, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:25:30,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=192440.0, ans=0.125 2024-09-17 09:25:31,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=192440.0, ans=0.025 2024-09-17 09:25:34,223 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.48 vs. limit=6.0 2024-09-17 09:25:46,029 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.83 vs. limit=22.5 2024-09-17 09:26:00,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=192520.0, ans=0.2 2024-09-17 09:26:08,995 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.10 vs. 
limit=15.0 2024-09-17 09:26:12,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=192520.0, ans=0.0 2024-09-17 09:26:28,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=192560.0, ans=0.125 2024-09-17 09:26:29,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=192600.0, ans=0.1 2024-09-17 09:26:30,890 INFO [train.py:1198] (1/2) Epoch 11, batch 2900, loss[loss=0.2607, ctc_loss=0.1684, cr_loss=0.4199, attn_decoder_loss=0.2616, over 29431.00 frames. ], tot_loss[loss=0.2648, ctc_loss=0.1679, cr_loss=0.4063, attn_decoder_loss=0.2665, over 5787030.66 frames. ], batch size: 79, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:26:37,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=192600.0, ans=0.125 2024-09-17 09:26:38,269 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.930e+01 9.694e+01 1.018e+02 1.122e+02 2.522e+02, threshold=2.035e+02, percent-clipped=2.0 2024-09-17 09:27:01,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=192680.0, ans=0.125 2024-09-17 09:27:13,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=192680.0, ans=0.125 2024-09-17 09:27:15,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=192720.0, ans=0.025 2024-09-17 09:27:25,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=192720.0, ans=0.04949747468305833 2024-09-17 09:27:49,119 INFO [train.py:1198] (1/2) Epoch 11, batch 2950, loss[loss=0.2532, ctc_loss=0.1569, cr_loss=0.3938, attn_decoder_loss=0.2551, over 29504.00 frames. ], tot_loss[loss=0.2631, ctc_loss=0.1664, cr_loss=0.4036, attn_decoder_loss=0.2648, over 5780919.04 frames. ], batch size: 75, lr: 1.01e-02, grad_scale: 4.0 2024-09-17 09:28:03,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=192840.0, ans=0.0 2024-09-17 09:28:21,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=192880.0, ans=0.125 2024-09-17 09:28:41,418 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.81 vs. limit=15.0 2024-09-17 09:28:44,637 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.98 vs. limit=10.0 2024-09-17 09:29:04,911 INFO [train.py:1198] (1/2) Epoch 11, batch 3000, loss[loss=0.2661, ctc_loss=0.1634, cr_loss=0.408, attn_decoder_loss=0.2685, over 29736.00 frames. ], tot_loss[loss=0.2632, ctc_loss=0.1664, cr_loss=0.4039, attn_decoder_loss=0.2649, over 5781505.62 frames. ], batch size: 81, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:29:04,911 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 09:29:24,075 INFO [train.py:1230] (1/2) Epoch 11, validation: loss=0.2124, ctc_loss=0.04636, cr_loss=4.851e-15, attn_decoder_loss=0.2308, over 944034.00 frames. 
2024-09-17 09:29:24,076 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-17 09:29:33,324 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.860e+01 9.274e+01 9.995e+01 1.117e+02 3.922e+02, threshold=1.999e+02, percent-clipped=3.0 2024-09-17 09:30:23,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=193160.0, ans=0.125 2024-09-17 09:30:39,849 INFO [train.py:1198] (1/2) Epoch 11, batch 3050, loss[loss=0.2531, ctc_loss=0.1546, cr_loss=0.3968, attn_decoder_loss=0.2552, over 29537.00 frames. ], tot_loss[loss=0.2642, ctc_loss=0.1673, cr_loss=0.4054, attn_decoder_loss=0.2659, over 5774551.51 frames. ], batch size: 76, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:30:46,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=193200.0, ans=0.125 2024-09-17 09:30:49,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=193200.0, ans=0.125 2024-09-17 09:30:55,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=193240.0, ans=0.2 2024-09-17 09:31:02,124 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-09-17 09:31:25,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=193320.0, ans=0.125 2024-09-17 09:31:57,183 INFO [train.py:1198] (1/2) Epoch 11, batch 3100, loss[loss=0.2773, ctc_loss=0.1789, cr_loss=0.4188, attn_decoder_loss=0.2789, over 29276.00 frames. ], tot_loss[loss=0.2635, ctc_loss=0.1667, cr_loss=0.4038, attn_decoder_loss=0.2653, over 5774195.00 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:32:01,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=193400.0, ans=0.2 2024-09-17 09:32:02,706 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.60 vs. limit=15.0 2024-09-17 09:32:07,731 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.928e+01 9.888e+01 1.137e+02 1.275e+02 2.184e+02, threshold=2.273e+02, percent-clipped=1.0 2024-09-17 09:32:57,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.54 vs. limit=15.0 2024-09-17 09:32:57,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=193560.0, ans=0.2 2024-09-17 09:33:15,293 INFO [train.py:1198] (1/2) Epoch 11, batch 3150, loss[loss=0.2693, ctc_loss=0.1677, cr_loss=0.4129, attn_decoder_loss=0.2714, over 28909.00 frames. ], tot_loss[loss=0.2639, ctc_loss=0.1673, cr_loss=0.4053, attn_decoder_loss=0.2656, over 5781193.28 frames. 
], batch size: 104, lr: 1.01e-02, grad_scale: 4.0 2024-09-17 09:33:17,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=193600.0, ans=0.1 2024-09-17 09:33:18,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=193600.0, ans=0.125 2024-09-17 09:33:24,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=193600.0, ans=0.0 2024-09-17 09:33:31,707 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.35 vs. limit=10.0 2024-09-17 09:33:34,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=193640.0, ans=0.0 2024-09-17 09:33:59,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=193720.0, ans=0.125 2024-09-17 09:34:00,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=193720.0, ans=0.125 2024-09-17 09:34:06,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=193720.0, ans=0.0 2024-09-17 09:34:06,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=193720.0, ans=0.125 2024-09-17 09:34:30,663 INFO [train.py:1198] (1/2) Epoch 11, batch 3200, loss[loss=0.2565, ctc_loss=0.1538, cr_loss=0.3972, attn_decoder_loss=0.2591, over 29412.00 frames. ], tot_loss[loss=0.2632, ctc_loss=0.1665, cr_loss=0.4048, attn_decoder_loss=0.265, over 5791758.89 frames. ], batch size: 79, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:34:30,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=193800.0, ans=0.125 2024-09-17 09:34:35,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=193800.0, ans=0.125 2024-09-17 09:34:42,603 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.896e+01 9.244e+01 9.694e+01 1.030e+02 2.478e+02, threshold=1.939e+02, percent-clipped=1.0 2024-09-17 09:34:44,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=193840.0, ans=0.025 2024-09-17 09:34:47,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=193840.0, ans=0.125 2024-09-17 09:34:56,373 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.46 vs. limit=15.0 2024-09-17 09:35:00,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=193880.0, ans=0.125 2024-09-17 09:35:06,727 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.44 vs. 
limit=15.0 2024-09-17 09:35:20,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=193920.0, ans=0.035 2024-09-17 09:35:21,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=193920.0, ans=0.1 2024-09-17 09:35:33,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=193960.0, ans=0.0 2024-09-17 09:35:36,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=193960.0, ans=0.5 2024-09-17 09:35:49,417 INFO [train.py:1198] (1/2) Epoch 11, batch 3250, loss[loss=0.2715, ctc_loss=0.1746, cr_loss=0.4169, attn_decoder_loss=0.273, over 29702.00 frames. ], tot_loss[loss=0.2637, ctc_loss=0.1667, cr_loss=0.4052, attn_decoder_loss=0.2654, over 5798178.31 frames. ], batch size: 84, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:35:57,962 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.39 vs. limit=6.0 2024-09-17 09:35:58,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=194000.0, ans=0.0 2024-09-17 09:36:07,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.39 vs. limit=15.0 2024-09-17 09:36:10,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=194040.0, ans=0.0 2024-09-17 09:36:42,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=194120.0, ans=0.125 2024-09-17 09:36:45,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=194120.0, ans=0.1 2024-09-17 09:37:07,223 INFO [train.py:1198] (1/2) Epoch 11, batch 3300, loss[loss=0.2915, ctc_loss=0.1973, cr_loss=0.4409, attn_decoder_loss=0.2922, over 28460.00 frames. ], tot_loss[loss=0.263, ctc_loss=0.1664, cr_loss=0.4044, attn_decoder_loss=0.2647, over 5796386.01 frames. 
], batch size: 112, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:37:19,488 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.100e+01 9.489e+01 1.035e+02 1.154e+02 2.549e+02, threshold=2.070e+02, percent-clipped=1.0 2024-09-17 09:37:28,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=194240.0, ans=0.025 2024-09-17 09:37:33,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=194240.0, ans=0.1 2024-09-17 09:37:36,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=194280.0, ans=0.125 2024-09-17 09:37:40,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=194280.0, ans=0.125 2024-09-17 09:37:46,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=194280.0, ans=0.0 2024-09-17 09:37:52,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=194320.0, ans=0.0 2024-09-17 09:38:09,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=194360.0, ans=0.125 2024-09-17 09:38:21,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=194400.0, ans=0.125 2024-09-17 09:38:22,983 INFO [train.py:1198] (1/2) Epoch 11, batch 3350, loss[loss=0.2799, ctc_loss=0.1824, cr_loss=0.4062, attn_decoder_loss=0.2817, over 28828.00 frames. ], tot_loss[loss=0.2638, ctc_loss=0.1673, cr_loss=0.4051, attn_decoder_loss=0.2655, over 5773236.96 frames. ], batch size: 104, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:39:32,309 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2024-09-17 09:39:40,730 INFO [train.py:1198] (1/2) Epoch 11, batch 3400, loss[loss=0.2318, ctc_loss=0.1336, cr_loss=0.3408, attn_decoder_loss=0.2351, over 29337.00 frames. ], tot_loss[loss=0.2635, ctc_loss=0.1669, cr_loss=0.404, attn_decoder_loss=0.2653, over 5766562.58 frames. ], batch size: 67, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:39:45,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=194600.0, ans=0.2 2024-09-17 09:39:50,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=194600.0, ans=0.125 2024-09-17 09:39:52,370 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.86 vs. 
limit=15.0 2024-09-17 09:39:52,871 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.042e+01 9.232e+01 1.004e+02 1.095e+02 3.484e+02, threshold=2.008e+02, percent-clipped=1.0 2024-09-17 09:40:15,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=194680.0, ans=0.0 2024-09-17 09:40:17,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=194680.0, ans=0.2 2024-09-17 09:40:20,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=194680.0, ans=0.125 2024-09-17 09:40:57,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=194800.0, ans=0.0 2024-09-17 09:40:58,435 INFO [train.py:1198] (1/2) Epoch 11, batch 3450, loss[loss=0.2673, ctc_loss=0.1642, cr_loss=0.3864, attn_decoder_loss=0.2702, over 28179.00 frames. ], tot_loss[loss=0.2636, ctc_loss=0.1665, cr_loss=0.4038, attn_decoder_loss=0.2654, over 5775810.27 frames. ], batch size: 111, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:40:58,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=194800.0, ans=0.0 2024-09-17 09:41:20,431 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=15.0 2024-09-17 09:41:42,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=194920.0, ans=0.125 2024-09-17 09:41:53,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=194920.0, ans=0.2 2024-09-17 09:41:57,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=194960.0, ans=0.025 2024-09-17 09:41:58,241 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.54 vs. limit=10.0 2024-09-17 09:42:08,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=194960.0, ans=0.0 2024-09-17 09:42:13,895 INFO [train.py:1198] (1/2) Epoch 11, batch 3500, loss[loss=0.2362, ctc_loss=0.1449, cr_loss=0.3744, attn_decoder_loss=0.238, over 29313.00 frames. ], tot_loss[loss=0.2629, ctc_loss=0.1662, cr_loss=0.4035, attn_decoder_loss=0.2647, over 5778897.29 frames. 
], batch size: 71, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:42:21,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=195000.0, ans=0.015 2024-09-17 09:42:22,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=195000.0, ans=0.0 2024-09-17 09:42:26,192 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.959e+01 9.210e+01 9.867e+01 1.123e+02 1.745e+02, threshold=1.973e+02, percent-clipped=0.0 2024-09-17 09:42:38,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=195040.0, ans=0.1 2024-09-17 09:42:41,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=195040.0, ans=0.0 2024-09-17 09:42:47,901 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.12 vs. limit=15.0 2024-09-17 09:43:23,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=195160.0, ans=0.125 2024-09-17 09:43:29,167 INFO [train.py:1198] (1/2) Epoch 11, batch 3550, loss[loss=0.2673, ctc_loss=0.1635, cr_loss=0.4022, attn_decoder_loss=0.2699, over 29717.00 frames. ], tot_loss[loss=0.263, ctc_loss=0.1663, cr_loss=0.4037, attn_decoder_loss=0.2648, over 5784456.20 frames. ], batch size: 89, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:43:41,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=195200.0, ans=0.1 2024-09-17 09:43:52,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=195240.0, ans=0.035 2024-09-17 09:44:04,922 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.90 vs. limit=15.0 2024-09-17 09:44:08,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=195280.0, ans=0.1 2024-09-17 09:44:08,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=195280.0, ans=0.125 2024-09-17 09:44:12,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=195280.0, ans=0.0 2024-09-17 09:44:14,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=195320.0, ans=0.0 2024-09-17 09:44:35,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=195360.0, ans=0.2 2024-09-17 09:44:45,312 INFO [train.py:1198] (1/2) Epoch 11, batch 3600, loss[loss=0.2529, ctc_loss=0.159, cr_loss=0.3865, attn_decoder_loss=0.2547, over 29493.00 frames. ], tot_loss[loss=0.2629, ctc_loss=0.1659, cr_loss=0.4031, attn_decoder_loss=0.2647, over 5793146.43 frames. 
], batch size: 77, lr: 1.01e-02, grad_scale: 16.0 2024-09-17 09:44:58,574 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.269e+01 9.046e+01 9.949e+01 1.066e+02 3.484e+02, threshold=1.990e+02, percent-clipped=1.0 2024-09-17 09:45:09,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=195440.0, ans=0.125 2024-09-17 09:45:19,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.14 vs. limit=10.0 2024-09-17 09:45:34,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=195520.0, ans=0.04949747468305833 2024-09-17 09:45:40,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=195520.0, ans=0.125 2024-09-17 09:45:51,540 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=15.0 2024-09-17 09:45:58,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=195560.0, ans=0.125 2024-09-17 09:46:01,229 INFO [train.py:1198] (1/2) Epoch 11, batch 3650, loss[loss=0.2708, ctc_loss=0.1738, cr_loss=0.4263, attn_decoder_loss=0.2721, over 29492.00 frames. ], tot_loss[loss=0.2626, ctc_loss=0.1658, cr_loss=0.4034, attn_decoder_loss=0.2644, over 5795747.85 frames. ], batch size: 90, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:46:08,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=195600.0, ans=0.2 2024-09-17 09:46:13,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=195600.0, ans=0.125 2024-09-17 09:46:17,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=195640.0, ans=0.1 2024-09-17 09:46:22,263 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:46:25,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=195640.0, ans=0.125 2024-09-17 09:46:28,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=195640.0, ans=0.125 2024-09-17 09:46:32,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=195680.0, ans=0.125 2024-09-17 09:46:34,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=195680.0, ans=0.2 2024-09-17 09:46:35,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=195680.0, ans=0.0 2024-09-17 09:46:52,754 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.21 vs. limit=15.0 2024-09-17 09:46:54,271 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.21 vs. 
limit=15.0 2024-09-17 09:46:57,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=195720.0, ans=0.125 2024-09-17 09:47:05,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=195760.0, ans=0.025 2024-09-17 09:47:14,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=195800.0, ans=0.025 2024-09-17 09:47:15,837 INFO [train.py:1198] (1/2) Epoch 11, batch 3700, loss[loss=0.27, ctc_loss=0.1652, cr_loss=0.3919, attn_decoder_loss=0.273, over 29700.00 frames. ], tot_loss[loss=0.2631, ctc_loss=0.1661, cr_loss=0.4039, attn_decoder_loss=0.2649, over 5806568.30 frames. ], batch size: 84, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:47:16,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=195800.0, ans=0.125 2024-09-17 09:47:29,200 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.150e+01 9.197e+01 9.899e+01 1.076e+02 2.230e+02, threshold=1.980e+02, percent-clipped=1.0 2024-09-17 09:47:39,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=195840.0, ans=0.125 2024-09-17 09:47:47,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=195880.0, ans=0.0 2024-09-17 09:47:55,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=195880.0, ans=0.125 2024-09-17 09:48:30,220 INFO [train.py:1198] (1/2) Epoch 11, batch 3750, loss[loss=0.2281, ctc_loss=0.1386, cr_loss=0.348, attn_decoder_loss=0.2303, over 29379.00 frames. ], tot_loss[loss=0.2627, ctc_loss=0.1656, cr_loss=0.4033, attn_decoder_loss=0.2645, over 5810637.82 frames. ], batch size: 67, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:49:01,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=196080.0, ans=0.125 2024-09-17 09:49:10,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=196080.0, ans=0.025 2024-09-17 09:49:22,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=196120.0, ans=0.1 2024-09-17 09:49:23,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=196120.0, ans=0.1 2024-09-17 09:49:34,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=196160.0, ans=0.0 2024-09-17 09:49:38,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=196160.0, ans=0.125 2024-09-17 09:49:44,097 INFO [train.py:1198] (1/2) Epoch 11, batch 3800, loss[loss=0.265, ctc_loss=0.1616, cr_loss=0.4087, attn_decoder_loss=0.2674, over 29643.00 frames. ], tot_loss[loss=0.2624, ctc_loss=0.1653, cr_loss=0.4032, attn_decoder_loss=0.2642, over 5800265.13 frames. 
], batch size: 86, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:49:57,599 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.230e+01 9.992e+01 1.078e+02 1.190e+02 1.793e+02, threshold=2.156e+02, percent-clipped=0.0 2024-09-17 09:49:57,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=196240.0, ans=0.2 2024-09-17 09:50:03,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=196240.0, ans=0.025 2024-09-17 09:50:11,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=196240.0, ans=0.07 2024-09-17 09:50:33,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=196320.0, ans=0.125 2024-09-17 09:50:47,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=196360.0, ans=0.125 2024-09-17 09:50:48,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=196360.0, ans=0.1 2024-09-17 09:51:00,331 INFO [train.py:1198] (1/2) Epoch 11, batch 3850, loss[loss=0.2869, ctc_loss=0.1862, cr_loss=0.4145, attn_decoder_loss=0.2889, over 29270.00 frames. ], tot_loss[loss=0.2624, ctc_loss=0.1653, cr_loss=0.4037, attn_decoder_loss=0.2642, over 5814533.93 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:51:29,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=196480.0, ans=15.0 2024-09-17 09:51:34,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=196480.0, ans=0.0 2024-09-17 09:51:45,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=196520.0, ans=0.04949747468305833 2024-09-17 09:51:46,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=196520.0, ans=0.125 2024-09-17 09:51:54,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=196520.0, ans=0.125 2024-09-17 09:52:04,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=196560.0, ans=0.2 2024-09-17 09:52:05,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=196560.0, ans=0.1 2024-09-17 09:52:07,868 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.00 vs. limit=15.0 2024-09-17 09:52:16,111 INFO [train.py:1198] (1/2) Epoch 11, batch 3900, loss[loss=0.2814, ctc_loss=0.1734, cr_loss=0.3995, attn_decoder_loss=0.2846, over 29619.00 frames. ], tot_loss[loss=0.2627, ctc_loss=0.1655, cr_loss=0.4045, attn_decoder_loss=0.2645, over 5818547.26 frames. 
], batch size: 86, lr: 1.00e-02, grad_scale: 8.0 2024-09-17 09:52:23,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=196600.0, ans=0.0 2024-09-17 09:52:25,763 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.63 vs. limit=22.5 2024-09-17 09:52:29,225 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.740e+01 9.236e+01 9.717e+01 1.078e+02 1.405e+02, threshold=1.943e+02, percent-clipped=0.0 2024-09-17 09:52:44,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=196680.0, ans=0.0 2024-09-17 09:52:57,682 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:52:59,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=196720.0, ans=0.025 2024-09-17 09:53:13,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=196760.0, ans=0.125 2024-09-17 09:53:30,384 INFO [train.py:1198] (1/2) Epoch 11, batch 3950, loss[loss=0.2886, ctc_loss=0.1844, cr_loss=0.4488, attn_decoder_loss=0.2902, over 29453.00 frames. ], tot_loss[loss=0.2625, ctc_loss=0.1648, cr_loss=0.4032, attn_decoder_loss=0.2644, over 5837318.54 frames. ], batch size: 97, lr: 1.00e-02, grad_scale: 4.0 2024-09-17 09:53:48,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=196840.0, ans=0.125 2024-09-17 09:54:01,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=196880.0, ans=0.2 2024-09-17 09:54:19,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=196920.0, ans=0.125 2024-09-17 09:54:21,299 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.01 vs. limit=15.0 2024-09-17 09:54:41,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=196960.0, ans=0.1 2024-09-17 09:54:43,889 INFO [train.py:1198] (1/2) Epoch 11, batch 4000, loss[loss=0.2556, ctc_loss=0.1634, cr_loss=0.4015, attn_decoder_loss=0.2569, over 29479.00 frames. ], tot_loss[loss=0.2628, ctc_loss=0.1656, cr_loss=0.4041, attn_decoder_loss=0.2646, over 5815391.16 frames. ], batch size: 74, lr: 1.00e-02, grad_scale: 8.0 2024-09-17 09:54:47,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.83 vs. limit=15.0 2024-09-17 09:54:58,574 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.223e+01 9.139e+01 9.851e+01 1.070e+02 1.973e+02, threshold=1.970e+02, percent-clipped=1.0 2024-09-17 09:55:04,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=197040.0, ans=0.025 2024-09-17 09:55:06,786 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.40 vs. 
limit=15.0 2024-09-17 09:55:09,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=197040.0, ans=0.2 2024-09-17 09:55:37,532 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:55:44,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=197160.0, ans=0.2 2024-09-17 09:55:52,627 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.10 vs. limit=15.0 2024-09-17 09:55:59,405 INFO [train.py:1198] (1/2) Epoch 11, batch 4050, loss[loss=0.2876, ctc_loss=0.2122, cr_loss=0.4092, attn_decoder_loss=0.2869, over 20358.00 frames. ], tot_loss[loss=0.2626, ctc_loss=0.1655, cr_loss=0.4036, attn_decoder_loss=0.2644, over 5799281.86 frames. ], batch size: 210, lr: 1.00e-02, grad_scale: 8.0 2024-09-17 09:56:02,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=197200.0, ans=0.025 2024-09-17 09:56:09,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=197200.0, ans=0.125 2024-09-17 09:56:09,899 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:56:09,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=197200.0, ans=0.125 2024-09-17 09:56:50,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=197320.0, ans=0.0 2024-09-17 09:56:53,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=197320.0, ans=0.125 2024-09-17 09:56:55,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=197320.0, ans=0.2 2024-09-17 09:56:55,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.46 vs. limit=15.0 2024-09-17 09:57:02,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=197360.0, ans=0.0 2024-09-17 09:57:08,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=197360.0, ans=0.125 2024-09-17 09:57:13,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=197400.0, ans=10.0 2024-09-17 09:57:14,014 INFO [train.py:1198] (1/2) Epoch 11, batch 4100, loss[loss=0.2864, ctc_loss=0.1881, cr_loss=0.4368, attn_decoder_loss=0.2876, over 29489.00 frames. ], tot_loss[loss=0.2628, ctc_loss=0.1657, cr_loss=0.4044, attn_decoder_loss=0.2646, over 5794609.70 frames. ], batch size: 90, lr: 1.00e-02, grad_scale: 8.0 2024-09-17 09:57:30,193 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.589e+01 9.399e+01 1.013e+02 1.118e+02 3.429e+02, threshold=2.026e+02, percent-clipped=2.0 2024-09-17 09:58:28,185 INFO [train.py:1198] (1/2) Epoch 11, batch 4150, loss[loss=0.2555, ctc_loss=0.1623, cr_loss=0.3954, attn_decoder_loss=0.2571, over 29494.00 frames. 
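In the optim.py WARNING lines, the five numbers after "grad-norm quartiles" are the min/25%/median/75%/max of recently observed gradient norms, and the reported threshold is consistently Clipping_scale times the median (2.0 × 1.078e+02 = 2.156e+02 and 2.0 × 9.851e+01 ≈ 1.970e+02 in the warnings above). A hedged sketch of that bookkeeping; the function name and buffer handling are illustrative:

```python
import torch

def clipping_stats(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    # recent_norms: float32 tensor of recently observed gradient norms.
    q = torch.quantile(recent_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # Clipping_scale * median, as logged
    percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
    return q, threshold, percent_clipped
```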
], tot_loss[loss=0.2627, ctc_loss=0.166, cr_loss=0.4049, attn_decoder_loss=0.2645, over 5799893.98 frames. ], batch size: 77, lr: 1.00e-02, grad_scale: 8.0 2024-09-17 09:58:34,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=197600.0, ans=0.1 2024-09-17 09:58:55,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=197640.0, ans=0.125 2024-09-17 09:59:03,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=197680.0, ans=0.125 2024-09-17 09:59:30,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=197760.0, ans=0.0 2024-09-17 09:59:33,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=197760.0, ans=0.0 2024-09-17 09:59:37,689 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.15 vs. limit=22.5 2024-09-17 09:59:42,926 INFO [train.py:1198] (1/2) Epoch 11, batch 4200, loss[loss=0.2758, ctc_loss=0.1824, cr_loss=0.4404, attn_decoder_loss=0.2764, over 29477.00 frames. ], tot_loss[loss=0.2627, ctc_loss=0.1657, cr_loss=0.4044, attn_decoder_loss=0.2645, over 5802005.13 frames. ], batch size: 90, lr: 1.00e-02, grad_scale: 8.0 2024-09-17 09:59:45,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=197800.0, ans=0.0 2024-09-17 09:59:59,217 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.965e+01 8.983e+01 9.678e+01 1.042e+02 2.526e+02, threshold=1.936e+02, percent-clipped=1.0 2024-09-17 10:00:08,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=197840.0, ans=0.125 2024-09-17 10:00:40,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=197920.0, ans=0.125 2024-09-17 10:00:56,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=198000.0, ans=0.125 2024-09-17 10:00:57,513 INFO [train.py:1198] (1/2) Epoch 11, batch 4250, loss[loss=0.2413, ctc_loss=0.1455, cr_loss=0.3937, attn_decoder_loss=0.2431, over 29504.00 frames. ], tot_loss[loss=0.2629, ctc_loss=0.1655, cr_loss=0.4043, attn_decoder_loss=0.2648, over 5808187.19 frames. ], batch size: 74, lr: 1.00e-02, grad_scale: 4.0 2024-09-17 10:01:28,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=198080.0, ans=0.125 2024-09-17 10:01:31,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=198080.0, ans=0.0 2024-09-17 10:01:44,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=198120.0, ans=0.125 2024-09-17 10:01:50,141 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.76 vs. limit=22.5 2024-09-17 10:02:11,157 INFO [train.py:1198] (1/2) Epoch 11, batch 4300, loss[loss=0.2702, ctc_loss=0.1686, cr_loss=0.4211, attn_decoder_loss=0.2722, over 29527.00 frames. 
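The loss[...] and tot_loss[...] entries break the total into its CTC, consistency-regularization (CR) and attention-decoder components. The totals are consistent with a weighted sum using this run's loss scales of 0.1 (CTC), 0.9 (attention decoder) and 0.02 (CR): for the tot_loss just above, 0.1·0.166 + 0.9·0.2645 + 0.02·0.4049 ≈ 0.2627. A sketch of that combination (the helper name is illustrative):

```python
def combined_loss(ctc_loss: float, attn_decoder_loss: float, cr_loss: float,
                  ctc_scale: float = 0.1, aed_scale: float = 0.9,
                  cr_scale: float = 0.02) -> float:
    # Reproduces the logged totals, e.g.
    # 0.1 * 0.166 + 0.9 * 0.2645 + 0.02 * 0.4049 ~= 0.2627
    return ctc_scale * ctc_loss + aed_scale * attn_decoder_loss + cr_scale * cr_loss
```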
], tot_loss[loss=0.2627, ctc_loss=0.1652, cr_loss=0.403, attn_decoder_loss=0.2646, over 5797640.47 frames. ], batch size: 87, lr: 1.00e-02, grad_scale: 8.0 2024-09-17 10:02:11,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=198200.0, ans=0.125 2024-09-17 10:02:17,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=198200.0, ans=0.0 2024-09-17 10:02:26,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=198240.0, ans=0.125 2024-09-17 10:02:29,145 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.197e+01 9.375e+01 1.006e+02 1.083e+02 2.279e+02, threshold=2.011e+02, percent-clipped=1.0 2024-09-17 10:02:40,434 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.95 vs. limit=15.0 2024-09-17 10:03:04,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=198320.0, ans=0.07 2024-09-17 10:03:17,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=198360.0, ans=0.5 2024-09-17 10:03:27,242 INFO [train.py:1198] (1/2) Epoch 11, batch 4350, loss[loss=0.2869, ctc_loss=0.1832, cr_loss=0.4409, attn_decoder_loss=0.2887, over 29449.00 frames. ], tot_loss[loss=0.2663, ctc_loss=0.1686, cr_loss=0.4082, attn_decoder_loss=0.2681, over 5799604.64 frames. ], batch size: 97, lr: 1.00e-02, grad_scale: 8.0 2024-09-17 10:03:41,291 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.99 vs. limit=15.0 2024-09-17 10:03:41,467 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.90 vs. limit=12.0 2024-09-17 10:03:42,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=198440.0, ans=0.125 2024-09-17 10:04:40,214 INFO [train.py:1198] (1/2) Epoch 11, batch 4400, loss[loss=0.2822, ctc_loss=0.1905, cr_loss=0.4274, attn_decoder_loss=0.2829, over 27258.00 frames. ], tot_loss[loss=0.2685, ctc_loss=0.1707, cr_loss=0.4111, attn_decoder_loss=0.2703, over 5766483.59 frames. ], batch size: 125, lr: 1.00e-02, grad_scale: 16.0 2024-09-17 10:04:59,280 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.516e+01 9.761e+01 1.030e+02 1.162e+02 9.107e+02, threshold=2.060e+02, percent-clipped=3.0 2024-09-17 10:05:01,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=198640.0, ans=0.125 2024-09-17 10:05:03,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=198640.0, ans=0.1 2024-09-17 10:05:18,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=198680.0, ans=0.0 2024-09-17 10:05:21,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=198680.0, ans=0.0 2024-09-17 10:05:22,059 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.76 vs. 
limit=15.0 2024-09-17 10:05:24,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=198720.0, ans=0.125 2024-09-17 10:05:25,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=198720.0, ans=0.2 2024-09-17 10:05:55,214 INFO [train.py:1198] (1/2) Epoch 11, batch 4450, loss[loss=0.2988, ctc_loss=0.2244, cr_loss=0.4524, attn_decoder_loss=0.297, over 20058.00 frames. ], tot_loss[loss=0.2721, ctc_loss=0.1764, cr_loss=0.4154, attn_decoder_loss=0.2735, over 5575744.19 frames. ], batch size: 209, lr: 9.99e-03, grad_scale: 8.0 2024-09-17 10:06:04,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=198800.0, ans=0.125 2024-09-17 10:06:09,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=198840.0, ans=0.0 2024-09-17 10:06:19,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=198840.0, ans=0.05 2024-09-17 10:06:19,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=198840.0, ans=0.125 2024-09-17 10:06:40,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=198920.0, ans=0.0 2024-09-17 10:06:43,235 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.31 vs. limit=15.0 2024-09-17 10:06:49,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=198920.0, ans=0.125 2024-09-17 10:06:51,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=198920.0, ans=0.125 2024-09-17 10:07:11,092 INFO [train.py:1198] (1/2) Epoch 11, batch 4500, loss[loss=0.2841, ctc_loss=0.2002, cr_loss=0.4225, attn_decoder_loss=0.284, over 21358.00 frames. ], tot_loss[loss=0.2755, ctc_loss=0.1826, cr_loss=0.4175, attn_decoder_loss=0.2766, over 5238002.99 frames. ], batch size: 209, lr: 9.99e-03, grad_scale: 8.0 2024-09-17 10:07:14,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=199000.0, ans=0.0 2024-09-17 10:07:17,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=199000.0, ans=0.125 2024-09-17 10:07:23,454 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.42 vs. limit=22.5 2024-09-17 10:07:26,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=199040.0, ans=0.0 2024-09-17 10:07:31,558 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.514e+01 1.082e+02 1.141e+02 1.239e+02 5.446e+02, threshold=2.282e+02, percent-clipped=2.0 2024-09-17 10:08:38,953 INFO [train.py:1198] (1/2) Epoch 12, batch 0, loss[loss=0.2548, ctc_loss=0.161, cr_loss=0.3915, attn_decoder_loss=0.2565, over 29591.00 frames. ], tot_loss[loss=0.2548, ctc_loss=0.161, cr_loss=0.3915, attn_decoder_loss=0.2565, over 29591.00 frames. 
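The grad_scale field swings between 4.0, 8.0 and 16.0 across batches, which is the behaviour of PyTorch's AMP loss scaler: the scale is halved when a step overflows in float16 and doubled again after a stretch of stable steps. A sketch of one such training step, assuming a standard torch.cuda.amp.GradScaler; compute_loss stands in for this run's actual loss computation:

```python
import torch

# Illustrative scaler; init_scale/growth_interval values are assumptions.
scaler = torch.cuda.amp.GradScaler(init_scale=8.0, growth_interval=2000)

def amp_train_step(model, optimizer, scaler, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)     # silently skipped if the fp16 grads overflowed
    scaler.update()            # halves the scale on overflow, grows it periodically
    return scaler.get_scale()  # the value the log reports as grad_scale
```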
], batch size: 73, lr: 9.56e-03, grad_scale: 16.0 2024-09-17 10:08:38,954 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 10:08:46,525 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.0861, 3.9318, 3.6337, 3.4470], device='cuda:1') 2024-09-17 10:08:57,356 INFO [train.py:1230] (1/2) Epoch 12, validation: loss=0.2149, ctc_loss=0.04611, cr_loss=4.481e-15, attn_decoder_loss=0.2337, over 944034.00 frames. 2024-09-17 10:08:57,357 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-17 10:09:18,259 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.57 vs. limit=22.5 2024-09-17 10:09:18,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=199140.0, ans=0.1 2024-09-17 10:09:20,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=199140.0, ans=0.125 2024-09-17 10:09:31,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=199180.0, ans=0.125 2024-09-17 10:09:51,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=199220.0, ans=0.125 2024-09-17 10:09:53,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=199220.0, ans=0.95 2024-09-17 10:09:57,875 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=12.0 2024-09-17 10:09:58,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=199260.0, ans=0.125 2024-09-17 10:10:13,549 INFO [train.py:1198] (1/2) Epoch 12, batch 50, loss[loss=0.2305, ctc_loss=0.1379, cr_loss=0.3819, attn_decoder_loss=0.2323, over 29432.00 frames. ], tot_loss[loss=0.2645, ctc_loss=0.1681, cr_loss=0.4092, attn_decoder_loss=0.2661, over 1268096.16 frames. ], batch size: 70, lr: 9.56e-03, grad_scale: 8.0 2024-09-17 10:10:15,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=199300.0, ans=0.0 2024-09-17 10:10:35,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=199340.0, ans=0.125 2024-09-17 10:10:44,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=199380.0, ans=0.125 2024-09-17 10:11:03,288 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.75 vs. 
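The validation block above also prints a per-layer attn_weights_entropy tensor and the allocator high-water mark. The entropy diagnostic plausibly reports the average entropy (in nats) of each attention head's distribution, and the memory line matches PyTorch's max_memory_allocated counter; both are sketched below, with the entropy tensor layout an assumption:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, query_len, key_len), each row summing to 1.
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (num_heads, query_len)
    return ent.mean(dim=-1)  # one average entropy per head, in nats

def max_memory_mb(device: torch.device) -> int:
    # High-water mark of CUDA memory ever allocated on this device,
    # the figure reported as "Maximum memory allocated so far is ...MB".
    return torch.cuda.max_memory_allocated(device) // (1024 * 1024)
```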
limit=15.0 2024-09-17 10:11:04,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=199420.0, ans=0.025 2024-09-17 10:11:16,126 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.442e+01 9.517e+01 1.012e+02 1.140e+02 5.609e+02, threshold=2.023e+02, percent-clipped=2.0 2024-09-17 10:11:16,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=199460.0, ans=0.0 2024-09-17 10:11:33,302 INFO [train.py:1198] (1/2) Epoch 12, batch 100, loss[loss=0.2601, ctc_loss=0.1561, cr_loss=0.39, attn_decoder_loss=0.263, over 29551.00 frames. ], tot_loss[loss=0.2665, ctc_loss=0.1692, cr_loss=0.4113, attn_decoder_loss=0.2682, over 2253680.23 frames. ], batch size: 76, lr: 9.56e-03, grad_scale: 8.0 2024-09-17 10:11:57,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=199540.0, ans=0.0 2024-09-17 10:12:47,661 INFO [train.py:1198] (1/2) Epoch 12, batch 150, loss[loss=0.236, ctc_loss=0.1335, cr_loss=0.3657, attn_decoder_loss=0.2393, over 29424.00 frames. ], tot_loss[loss=0.2636, ctc_loss=0.166, cr_loss=0.4052, attn_decoder_loss=0.2654, over 3049262.22 frames. ], batch size: 70, lr: 9.55e-03, grad_scale: 8.0 2024-09-17 10:12:47,964 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 10:13:06,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=199740.0, ans=0.125 2024-09-17 10:13:33,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=199820.0, ans=0.2 2024-09-17 10:13:39,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=199820.0, ans=0.125 2024-09-17 10:13:47,578 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.984e+01 9.054e+01 9.523e+01 1.007e+02 1.391e+02, threshold=1.905e+02, percent-clipped=0.0 2024-09-17 10:13:52,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=199860.0, ans=0.09899494936611666 2024-09-17 10:13:55,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=199860.0, ans=0.1 2024-09-17 10:14:02,684 INFO [train.py:1198] (1/2) Epoch 12, batch 200, loss[loss=0.2702, ctc_loss=0.1738, cr_loss=0.4121, attn_decoder_loss=0.2718, over 27144.00 frames. ], tot_loss[loss=0.2617, ctc_loss=0.1642, cr_loss=0.4031, attn_decoder_loss=0.2635, over 3660327.54 frames. ], batch size: 124, lr: 9.55e-03, grad_scale: 8.0 2024-09-17 10:14:30,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=199940.0, ans=0.1 2024-09-17 10:14:38,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=199980.0, ans=0.0 2024-09-17 10:14:43,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=199980.0, ans=0.0 2024-09-17 10:15:08,252 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.28 vs. 
limit=15.0 2024-09-17 10:15:20,794 INFO [train.py:1198] (1/2) Epoch 12, batch 250, loss[loss=0.2671, ctc_loss=0.1625, cr_loss=0.378, attn_decoder_loss=0.2703, over 29289.00 frames. ], tot_loss[loss=0.2614, ctc_loss=0.1636, cr_loss=0.4022, attn_decoder_loss=0.2633, over 4142597.14 frames. ], batch size: 100, lr: 9.54e-03, grad_scale: 8.0 2024-09-17 10:15:25,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=200100.0, ans=0.125 2024-09-17 10:15:35,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=200100.0, ans=0.125 2024-09-17 10:15:48,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=200140.0, ans=0.125 2024-09-17 10:16:23,313 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.843e+01 9.074e+01 9.707e+01 1.061e+02 3.060e+02, threshold=1.941e+02, percent-clipped=1.0 2024-09-17 10:16:26,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=200260.0, ans=0.025 2024-09-17 10:16:38,682 INFO [train.py:1198] (1/2) Epoch 12, batch 300, loss[loss=0.2878, ctc_loss=0.1908, cr_loss=0.4295, attn_decoder_loss=0.2891, over 29572.00 frames. ], tot_loss[loss=0.2612, ctc_loss=0.1633, cr_loss=0.402, attn_decoder_loss=0.2632, over 4512163.52 frames. ], batch size: 92, lr: 9.54e-03, grad_scale: 8.0 2024-09-17 10:16:49,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=200300.0, ans=0.0 2024-09-17 10:17:18,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=200380.0, ans=0.125 2024-09-17 10:17:22,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=200420.0, ans=0.1 2024-09-17 10:17:28,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=200420.0, ans=0.125 2024-09-17 10:17:32,689 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.00 vs. limit=15.0 2024-09-17 10:17:34,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=200420.0, ans=0.1 2024-09-17 10:17:54,235 INFO [train.py:1198] (1/2) Epoch 12, batch 350, loss[loss=0.2389, ctc_loss=0.1444, cr_loss=0.3762, attn_decoder_loss=0.241, over 29333.00 frames. ], tot_loss[loss=0.2615, ctc_loss=0.1634, cr_loss=0.402, attn_decoder_loss=0.2635, over 4795911.89 frames. ], batch size: 71, lr: 9.53e-03, grad_scale: 8.0 2024-09-17 10:17:54,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=200500.0, ans=0.125 2024-09-17 10:18:05,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=200500.0, ans=0.125 2024-09-17 10:18:45,631 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.45 vs. 
limit=12.0 2024-09-17 10:18:56,609 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.651e+01 9.289e+01 1.000e+02 1.114e+02 4.401e+02, threshold=2.000e+02, percent-clipped=4.0 2024-09-17 10:19:04,005 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.32 vs. limit=15.0 2024-09-17 10:19:11,721 INFO [train.py:1198] (1/2) Epoch 12, batch 400, loss[loss=0.2842, ctc_loss=0.1833, cr_loss=0.4478, attn_decoder_loss=0.2854, over 29703.00 frames. ], tot_loss[loss=0.2608, ctc_loss=0.1625, cr_loss=0.4015, attn_decoder_loss=0.2628, over 5025191.70 frames. ], batch size: 82, lr: 9.53e-03, grad_scale: 16.0 2024-09-17 10:19:12,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=200700.0, ans=0.125 2024-09-17 10:19:15,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=200700.0, ans=0.2 2024-09-17 10:19:30,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=200740.0, ans=0.125 2024-09-17 10:19:32,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=200740.0, ans=0.1 2024-09-17 10:19:34,705 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.51 vs. limit=6.0 2024-09-17 10:19:56,085 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2024-09-17 10:20:07,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=200820.0, ans=0.125 2024-09-17 10:20:13,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=200860.0, ans=0.0 2024-09-17 10:20:21,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=200860.0, ans=0.125 2024-09-17 10:20:30,344 INFO [train.py:1198] (1/2) Epoch 12, batch 450, loss[loss=0.2766, ctc_loss=0.1787, cr_loss=0.4215, attn_decoder_loss=0.2781, over 29671.00 frames. ], tot_loss[loss=0.2609, ctc_loss=0.1626, cr_loss=0.4017, attn_decoder_loss=0.2629, over 5186767.80 frames. ], batch size: 83, lr: 9.52e-03, grad_scale: 8.0 2024-09-17 10:20:44,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=200940.0, ans=0.125 2024-09-17 10:20:55,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=200940.0, ans=0.125 2024-09-17 10:21:16,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=201020.0, ans=0.0 2024-09-17 10:21:30,719 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.25 vs. 
limit=22.5 2024-09-17 10:21:32,801 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.842e+01 9.090e+01 9.686e+01 1.023e+02 4.799e+02, threshold=1.937e+02, percent-clipped=1.0 2024-09-17 10:21:39,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=201060.0, ans=0.0 2024-09-17 10:21:46,300 INFO [train.py:1198] (1/2) Epoch 12, batch 500, loss[loss=0.281, ctc_loss=0.1823, cr_loss=0.4641, attn_decoder_loss=0.2817, over 29404.00 frames. ], tot_loss[loss=0.2606, ctc_loss=0.1623, cr_loss=0.4016, attn_decoder_loss=0.2626, over 5329603.95 frames. ], batch size: 94, lr: 9.52e-03, grad_scale: 8.0 2024-09-17 10:21:46,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=201100.0, ans=0.125 2024-09-17 10:21:55,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=201100.0, ans=0.125 2024-09-17 10:21:59,471 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.28 vs. limit=15.0 2024-09-17 10:22:08,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=201140.0, ans=0.2 2024-09-17 10:22:11,979 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.43 vs. limit=22.5 2024-09-17 10:22:15,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=201140.0, ans=0.125 2024-09-17 10:22:20,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=201180.0, ans=0.025 2024-09-17 10:22:21,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=201180.0, ans=0.05 2024-09-17 10:22:37,878 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.79 vs. limit=12.0 2024-09-17 10:22:46,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=201220.0, ans=10.0 2024-09-17 10:22:47,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=201260.0, ans=0.1 2024-09-17 10:22:47,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=201260.0, ans=0.0 2024-09-17 10:22:50,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=201260.0, ans=0.2 2024-09-17 10:22:59,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=201260.0, ans=0.0 2024-09-17 10:23:01,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=201260.0, ans=0.125 2024-09-17 10:23:03,906 INFO [train.py:1198] (1/2) Epoch 12, batch 550, loss[loss=0.278, ctc_loss=0.1677, cr_loss=0.414, attn_decoder_loss=0.2811, over 28896.00 frames. ], tot_loss[loss=0.2606, ctc_loss=0.1624, cr_loss=0.4014, attn_decoder_loss=0.2626, over 5420818.32 frames. 
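The Whitening lines compare a per-module metric against a (scheduled) limit; the "metric=20.25 vs. limit=22.5" entry above is within its limit, so no penalty is triggered. One plausible form of such a metric, assuming it measures how far the feature covariance is from isotropic (exactly 1.0 for perfectly whitened features, larger as the spectrum becomes less flat):

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels), assumed zero-mean features.
    n, d = x.shape
    cov = x.t() @ x / n
    # d * sum(eig^2) / (sum(eig))^2: equals 1.0 iff all eigenvalues of the
    # covariance are equal, i.e. the features are fully whitened.
    return (cov @ cov).trace() * d / cov.trace() ** 2
```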
], batch size: 104, lr: 9.51e-03, grad_scale: 4.0 2024-09-17 10:23:06,326 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=23.16 vs. limit=15.0 2024-09-17 10:23:39,075 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.23 vs. limit=6.0 2024-09-17 10:23:54,550 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.65 vs. limit=12.0 2024-09-17 10:24:01,988 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.06 vs. limit=15.0 2024-09-17 10:24:04,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=201420.0, ans=0.0 2024-09-17 10:24:06,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=201460.0, ans=0.0 2024-09-17 10:24:10,194 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.800e+01 9.023e+01 9.660e+01 1.069e+02 2.891e+02, threshold=1.932e+02, percent-clipped=2.0 2024-09-17 10:24:11,264 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.43 vs. limit=15.0 2024-09-17 10:24:13,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=201460.0, ans=0.0 2024-09-17 10:24:22,370 INFO [train.py:1198] (1/2) Epoch 12, batch 600, loss[loss=0.2753, ctc_loss=0.1718, cr_loss=0.4037, attn_decoder_loss=0.2779, over 29216.00 frames. ], tot_loss[loss=0.2611, ctc_loss=0.1629, cr_loss=0.4017, attn_decoder_loss=0.2631, over 5507473.02 frames. ], batch size: 100, lr: 9.51e-03, grad_scale: 8.0 2024-09-17 10:24:27,610 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0 2024-09-17 10:24:35,122 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.09 vs. limit=15.0 2024-09-17 10:24:42,632 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.57 vs. limit=15.0 2024-09-17 10:24:58,979 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.27 vs. limit=22.5 2024-09-17 10:25:08,696 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.27 vs. limit=22.5 2024-09-17 10:25:20,908 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.39 vs. 
limit=22.5 2024-09-17 10:25:21,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=201660.0, ans=0.0 2024-09-17 10:25:23,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=201660.0, ans=0.125 2024-09-17 10:25:38,287 INFO [train.py:1198] (1/2) Epoch 12, batch 650, loss[loss=0.2595, ctc_loss=0.1617, cr_loss=0.4141, attn_decoder_loss=0.2612, over 29760.00 frames. ], tot_loss[loss=0.2605, ctc_loss=0.1621, cr_loss=0.4008, attn_decoder_loss=0.2625, over 5585513.00 frames. ], batch size: 81, lr: 9.50e-03, grad_scale: 8.0 2024-09-17 10:25:40,738 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.16 vs. limit=15.0 2024-09-17 10:25:52,942 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.82 vs. limit=22.5 2024-09-17 10:25:53,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=201740.0, ans=0.0 2024-09-17 10:26:11,674 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.57 vs. limit=15.0 2024-09-17 10:26:20,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=201780.0, ans=0.0 2024-09-17 10:26:29,306 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 10:26:30,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=201820.0, ans=0.125 2024-09-17 10:26:43,925 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.946e+01 9.154e+01 9.702e+01 1.037e+02 1.837e+02, threshold=1.940e+02, percent-clipped=0.0 2024-09-17 10:26:56,071 INFO [train.py:1198] (1/2) Epoch 12, batch 700, loss[loss=0.2566, ctc_loss=0.1557, cr_loss=0.3864, attn_decoder_loss=0.2592, over 29550.00 frames. ], tot_loss[loss=0.2609, ctc_loss=0.1624, cr_loss=0.4011, attn_decoder_loss=0.2629, over 5636445.29 frames. ], batch size: 76, lr: 9.50e-03, grad_scale: 8.0 2024-09-17 10:27:28,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=201980.0, ans=0.125 2024-09-17 10:27:41,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=201980.0, ans=0.025 2024-09-17 10:27:54,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=202020.0, ans=0.125 2024-09-17 10:28:00,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=202060.0, ans=0.0 2024-09-17 10:28:11,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=202060.0, ans=0.125 2024-09-17 10:28:14,109 INFO [train.py:1198] (1/2) Epoch 12, batch 750, loss[loss=0.2693, ctc_loss=0.1631, cr_loss=0.4174, attn_decoder_loss=0.2718, over 29721.00 frames. ], tot_loss[loss=0.2609, ctc_loss=0.1629, cr_loss=0.4012, attn_decoder_loss=0.2628, over 5675727.33 frames. 
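Many of the scheduled values in these records belong to bypass modules (bypass.scale_min, bypass.skip_rate, bypass_mid.scale_min). A hedged sketch of what such a module does, assuming a learned per-channel scale, clamped from below by the scheduled scale_min, that interpolates between a layer's input and its output:

```python
import torch

class Bypass(torch.nn.Module):
    def __init__(self, num_channels: int, scale_min: float = 0.2):
        super().__init__()
        # Learned per-channel interpolation weight; scale_min is the
        # scheduled floor seen in the "bypass.scale_min" log lines.
        self.scale = torch.nn.Parameter(torch.full((num_channels,), 0.5))
        self.scale_min = scale_min

    def forward(self, x_in: torch.Tensor, x_out: torch.Tensor) -> torch.Tensor:
        s = self.scale.clamp(min=self.scale_min, max=1.0)
        return x_in + s * (x_out - x_in)  # s=0: skip the layer; s=1: keep it
```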
], batch size: 82, lr: 9.49e-03, grad_scale: 8.0 2024-09-17 10:28:15,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.97 vs. limit=15.0 2024-09-17 10:28:21,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202100.0, ans=0.1 2024-09-17 10:28:26,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=202100.0, ans=0.125 2024-09-17 10:28:32,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=202140.0, ans=0.2 2024-09-17 10:28:32,804 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.76 vs. limit=12.0 2024-09-17 10:28:33,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=202140.0, ans=0.1 2024-09-17 10:28:38,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=202140.0, ans=0.125 2024-09-17 10:28:39,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=202140.0, ans=0.0 2024-09-17 10:28:40,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.72 vs. limit=15.0 2024-09-17 10:28:51,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=202180.0, ans=0.0 2024-09-17 10:28:53,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=202180.0, ans=0.2 2024-09-17 10:29:09,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=202220.0, ans=22.5 2024-09-17 10:29:17,538 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.082e+01 9.622e+01 1.037e+02 1.120e+02 2.104e+02, threshold=2.074e+02, percent-clipped=1.0 2024-09-17 10:29:29,714 INFO [train.py:1198] (1/2) Epoch 12, batch 800, loss[loss=0.2444, ctc_loss=0.1457, cr_loss=0.3907, attn_decoder_loss=0.2467, over 29621.00 frames. ], tot_loss[loss=0.2608, ctc_loss=0.1627, cr_loss=0.4009, attn_decoder_loss=0.2627, over 5707061.16 frames. ], batch size: 73, lr: 9.49e-03, grad_scale: 16.0 2024-09-17 10:29:35,286 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.36 vs. limit=22.5 2024-09-17 10:29:39,068 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 10:29:45,935 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.71 vs. 
limit=15.0 2024-09-17 10:29:56,525 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 10:30:07,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=202380.0, ans=0.125 2024-09-17 10:30:13,543 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 10:30:39,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=202460.0, ans=0.125 2024-09-17 10:30:47,501 INFO [train.py:1198] (1/2) Epoch 12, batch 850, loss[loss=0.2738, ctc_loss=0.1626, cr_loss=0.4058, attn_decoder_loss=0.2771, over 29699.00 frames. ], tot_loss[loss=0.2604, ctc_loss=0.1624, cr_loss=0.4005, attn_decoder_loss=0.2624, over 5736412.38 frames. ], batch size: 89, lr: 9.49e-03, grad_scale: 4.0 2024-09-17 10:31:08,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=202540.0, ans=0.1 2024-09-17 10:31:36,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=202620.0, ans=0.025 2024-09-17 10:31:41,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202620.0, ans=0.1 2024-09-17 10:31:55,955 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.132e+01 9.370e+01 1.044e+02 1.217e+02 3.517e+02, threshold=2.088e+02, percent-clipped=3.0 2024-09-17 10:32:02,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=202660.0, ans=0.1 2024-09-17 10:32:04,983 INFO [train.py:1198] (1/2) Epoch 12, batch 900, loss[loss=0.2409, ctc_loss=0.1492, cr_loss=0.3588, attn_decoder_loss=0.2431, over 29602.00 frames. ], tot_loss[loss=0.2607, ctc_loss=0.1627, cr_loss=0.4014, attn_decoder_loss=0.2627, over 5740922.98 frames. ], batch size: 73, lr: 9.48e-03, grad_scale: 8.0 2024-09-17 10:32:56,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=202820.0, ans=0.125 2024-09-17 10:33:07,536 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.62 vs. limit=15.0 2024-09-17 10:33:20,314 INFO [train.py:1198] (1/2) Epoch 12, batch 950, loss[loss=0.2502, ctc_loss=0.1515, cr_loss=0.3871, attn_decoder_loss=0.2525, over 29510.00 frames. ], tot_loss[loss=0.2609, ctc_loss=0.163, cr_loss=0.401, attn_decoder_loss=0.2629, over 5743601.38 frames. ], batch size: 74, lr: 9.48e-03, grad_scale: 8.0 2024-09-17 10:33:31,110 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 10:33:45,982 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=16.81 vs. 
limit=15.0 2024-09-17 10:34:28,767 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.408e+01 9.367e+01 1.035e+02 1.151e+02 3.076e+02, threshold=2.071e+02, percent-clipped=5.0 2024-09-17 10:34:29,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=203060.0, ans=0.125 2024-09-17 10:34:34,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=203060.0, ans=0.5 2024-09-17 10:34:37,627 INFO [train.py:1198] (1/2) Epoch 12, batch 1000, loss[loss=0.2586, ctc_loss=0.163, cr_loss=0.3967, attn_decoder_loss=0.2604, over 29511.00 frames. ], tot_loss[loss=0.2618, ctc_loss=0.1639, cr_loss=0.4022, attn_decoder_loss=0.2638, over 5737465.06 frames. ], batch size: 77, lr: 9.47e-03, grad_scale: 8.0 2024-09-17 10:35:03,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=203140.0, ans=0.0 2024-09-17 10:35:03,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=203140.0, ans=0.125 2024-09-17 10:35:03,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=203140.0, ans=0.125 2024-09-17 10:35:16,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=203180.0, ans=0.125 2024-09-17 10:35:22,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=203180.0, ans=0.0 2024-09-17 10:35:43,124 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.02 vs. limit=15.0 2024-09-17 10:35:55,926 INFO [train.py:1198] (1/2) Epoch 12, batch 1050, loss[loss=0.268, ctc_loss=0.1596, cr_loss=0.3935, attn_decoder_loss=0.2713, over 29678.00 frames. ], tot_loss[loss=0.2614, ctc_loss=0.1636, cr_loss=0.4023, attn_decoder_loss=0.2633, over 5746797.35 frames. ], batch size: 85, lr: 9.47e-03, grad_scale: 4.0 2024-09-17 10:36:02,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=203300.0, ans=0.1 2024-09-17 10:36:22,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=203340.0, ans=0.125 2024-09-17 10:36:41,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=203420.0, ans=0.125 2024-09-17 10:36:55,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=203460.0, ans=0.0 2024-09-17 10:37:04,274 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.982e+01 9.127e+01 9.739e+01 1.067e+02 1.550e+02, threshold=1.948e+02, percent-clipped=0.0 2024-09-17 10:37:04,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=203460.0, ans=0.0 2024-09-17 10:37:11,898 INFO [train.py:1198] (1/2) Epoch 12, batch 1100, loss[loss=0.2515, ctc_loss=0.1516, cr_loss=0.393, attn_decoder_loss=0.2538, over 29464.00 frames. ], tot_loss[loss=0.2611, ctc_loss=0.163, cr_loss=0.4019, attn_decoder_loss=0.263, over 5758190.90 frames. 
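The cr_loss component is the consistency-regularization term of CR-CTC: during training, two branches see differently time-masked copies of each utterance, and each branch's CTC posteriors are pulled toward the other branch's detached posteriors. This also explains the near-zero validation value (cr_loss=4.481e-15 earlier in this log): with masking disabled at validation, the two branches agree and the term vanishes. A sketch under those assumptions:

```python
import torch.nn.functional as F
from torch import Tensor

def cr_ctc_loss(log_probs_a: Tensor, log_probs_b: Tensor) -> Tensor:
    # log_probs_*: (T, vocab) log-softmax CTC outputs of the two branches.
    # Symmetric KL divergence with exchanged, detached targets.
    kl_ab = F.kl_div(log_probs_a, log_probs_b.detach(),
                     log_target=True, reduction="sum")
    kl_ba = F.kl_div(log_probs_b, log_probs_a.detach(),
                     log_target=True, reduction="sum")
    return 0.5 * (kl_ab + kl_ba)
```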
], batch size: 78, lr: 9.46e-03, grad_scale: 8.0 2024-09-17 10:38:02,164 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.56 vs. limit=22.5 2024-09-17 10:38:06,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=203620.0, ans=0.125 2024-09-17 10:38:30,364 INFO [train.py:1198] (1/2) Epoch 12, batch 1150, loss[loss=0.2494, ctc_loss=0.1543, cr_loss=0.3731, attn_decoder_loss=0.2516, over 29456.00 frames. ], tot_loss[loss=0.2612, ctc_loss=0.1632, cr_loss=0.4021, attn_decoder_loss=0.2631, over 5756046.90 frames. ], batch size: 78, lr: 9.46e-03, grad_scale: 8.0 2024-09-17 10:38:53,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=203740.0, ans=0.0 2024-09-17 10:39:07,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=203780.0, ans=0.2 2024-09-17 10:39:08,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=203780.0, ans=0.125 2024-09-17 10:39:13,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=203780.0, ans=10.0 2024-09-17 10:39:27,065 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. limit=6.0 2024-09-17 10:39:40,832 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.985e+01 9.529e+01 1.043e+02 1.150e+02 3.679e+02, threshold=2.085e+02, percent-clipped=2.0 2024-09-17 10:39:47,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=203900.0, ans=0.125 2024-09-17 10:39:48,421 INFO [train.py:1198] (1/2) Epoch 12, batch 1200, loss[loss=0.2786, ctc_loss=0.1673, cr_loss=0.4536, attn_decoder_loss=0.2809, over 29671.00 frames. ], tot_loss[loss=0.2621, ctc_loss=0.1641, cr_loss=0.4034, attn_decoder_loss=0.2641, over 5748132.60 frames. ], batch size: 85, lr: 9.45e-03, grad_scale: 16.0 2024-09-17 10:40:10,733 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.43 vs. limit=15.0 2024-09-17 10:40:11,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=203940.0, ans=0.2 2024-09-17 10:40:30,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=203980.0, ans=0.125 2024-09-17 10:40:42,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=204020.0, ans=0.125 2024-09-17 10:41:04,803 INFO [train.py:1198] (1/2) Epoch 12, batch 1250, loss[loss=0.2804, ctc_loss=0.1817, cr_loss=0.4238, attn_decoder_loss=0.282, over 29515.00 frames. ], tot_loss[loss=0.2625, ctc_loss=0.1643, cr_loss=0.4044, attn_decoder_loss=0.2644, over 5775295.28 frames. 
], batch size: 92, lr: 9.45e-03, grad_scale: 8.0 2024-09-17 10:41:06,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=204100.0, ans=0.125 2024-09-17 10:41:24,444 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.38 vs. limit=6.0 2024-09-17 10:41:37,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=204180.0, ans=0.125 2024-09-17 10:42:04,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=204220.0, ans=0.125 2024-09-17 10:42:08,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=204260.0, ans=0.07 2024-09-17 10:42:14,823 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0 2024-09-17 10:42:16,783 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.687e+01 9.051e+01 9.485e+01 1.007e+02 2.061e+02, threshold=1.897e+02, percent-clipped=0.0 2024-09-17 10:42:19,028 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.88 vs. limit=10.0 2024-09-17 10:42:22,706 INFO [train.py:1198] (1/2) Epoch 12, batch 1300, loss[loss=0.2789, ctc_loss=0.1765, cr_loss=0.4154, attn_decoder_loss=0.281, over 28480.00 frames. ], tot_loss[loss=0.2618, ctc_loss=0.1637, cr_loss=0.403, attn_decoder_loss=0.2638, over 5778725.34 frames. ], batch size: 112, lr: 9.44e-03, grad_scale: 8.0 2024-09-17 10:42:35,288 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 10:42:38,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=204340.0, ans=0.2 2024-09-17 10:42:39,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=204340.0, ans=0.125 2024-09-17 10:42:45,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=204340.0, ans=0.125 2024-09-17 10:43:04,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-09-17 10:43:35,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=204460.0, ans=0.125 2024-09-17 10:43:40,861 INFO [train.py:1198] (1/2) Epoch 12, batch 1350, loss[loss=0.2567, ctc_loss=0.1494, cr_loss=0.3641, attn_decoder_loss=0.2606, over 29755.00 frames. ], tot_loss[loss=0.2612, ctc_loss=0.1627, cr_loss=0.4016, attn_decoder_loss=0.2632, over 5795926.83 frames. ], batch size: 81, lr: 9.44e-03, grad_scale: 8.0 2024-09-17 10:43:50,539 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.25 vs. 
limit=15.0 2024-09-17 10:44:25,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=204620.0, ans=0.2 2024-09-17 10:44:49,480 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.879e+01 9.112e+01 9.581e+01 1.019e+02 1.292e+02, threshold=1.916e+02, percent-clipped=0.0 2024-09-17 10:44:51,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=204660.0, ans=0.025 2024-09-17 10:44:53,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.15 vs. limit=15.0 2024-09-17 10:44:55,538 INFO [train.py:1198] (1/2) Epoch 12, batch 1400, loss[loss=0.2213, ctc_loss=0.1216, cr_loss=0.3218, attn_decoder_loss=0.2253, over 29556.00 frames. ], tot_loss[loss=0.2608, ctc_loss=0.1622, cr_loss=0.4007, attn_decoder_loss=0.2628, over 5806753.09 frames. ], batch size: 69, lr: 9.44e-03, grad_scale: 8.0 2024-09-17 10:45:10,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=204740.0, ans=0.2 2024-09-17 10:45:18,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=204740.0, ans=0.125 2024-09-17 10:46:03,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=204860.0, ans=0.125 2024-09-17 10:46:13,810 INFO [train.py:1198] (1/2) Epoch 12, batch 1450, loss[loss=0.2714, ctc_loss=0.1679, cr_loss=0.3995, attn_decoder_loss=0.274, over 29451.00 frames. ], tot_loss[loss=0.2616, ctc_loss=0.1629, cr_loss=0.402, attn_decoder_loss=0.2636, over 5804789.23 frames. ], batch size: 94, lr: 9.43e-03, grad_scale: 4.0 2024-09-17 10:46:16,427 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.02 vs. limit=22.5 2024-09-17 10:46:44,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=204980.0, ans=0.0 2024-09-17 10:46:53,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=204980.0, ans=0.07 2024-09-17 10:47:04,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=205020.0, ans=0.125 2024-09-17 10:47:05,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=205020.0, ans=0.125 2024-09-17 10:47:26,848 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.748e+01 9.404e+01 1.003e+02 1.073e+02 8.206e+02, threshold=2.005e+02, percent-clipped=2.0 2024-09-17 10:47:28,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=205060.0, ans=0.125 2024-09-17 10:47:31,572 INFO [train.py:1198] (1/2) Epoch 12, batch 1500, loss[loss=0.2753, ctc_loss=0.1726, cr_loss=0.4141, attn_decoder_loss=0.2775, over 29626.00 frames. ], tot_loss[loss=0.2617, ctc_loss=0.1629, cr_loss=0.402, attn_decoder_loss=0.2638, over 5805563.93 frames. 
], batch size: 86, lr: 9.43e-03, grad_scale: 8.0 2024-09-17 10:47:38,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=205100.0, ans=0.0 2024-09-17 10:48:06,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=205180.0, ans=0.1 2024-09-17 10:48:41,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=205260.0, ans=0.125 2024-09-17 10:48:42,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=205260.0, ans=0.125 2024-09-17 10:48:47,594 INFO [train.py:1198] (1/2) Epoch 12, batch 1550, loss[loss=0.2731, ctc_loss=0.1691, cr_loss=0.4089, attn_decoder_loss=0.2755, over 29523.00 frames. ], tot_loss[loss=0.2618, ctc_loss=0.1632, cr_loss=0.4017, attn_decoder_loss=0.2638, over 5782584.88 frames. ], batch size: 90, lr: 9.42e-03, grad_scale: 8.0 2024-09-17 10:49:02,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=205340.0, ans=15.0 2024-09-17 10:49:03,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=205340.0, ans=0.0 2024-09-17 10:49:35,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=205420.0, ans=0.2 2024-09-17 10:49:51,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=205460.0, ans=0.125 2024-09-17 10:49:54,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=205460.0, ans=0.125 2024-09-17 10:50:01,908 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.192e+01 9.435e+01 1.085e+02 1.264e+02 1.596e+02, threshold=2.170e+02, percent-clipped=0.0 2024-09-17 10:50:04,919 INFO [train.py:1198] (1/2) Epoch 12, batch 1600, loss[loss=0.2612, ctc_loss=0.1583, cr_loss=0.3726, attn_decoder_loss=0.2644, over 29677.00 frames. ], tot_loss[loss=0.2617, ctc_loss=0.1632, cr_loss=0.401, attn_decoder_loss=0.2637, over 5766074.05 frames. ], batch size: 85, lr: 9.42e-03, grad_scale: 8.0 2024-09-17 10:50:12,819 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 10:50:44,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=205580.0, ans=0.025 2024-09-17 10:51:03,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=205620.0, ans=0.1 2024-09-17 10:51:08,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=205660.0, ans=0.0 2024-09-17 10:51:23,146 INFO [train.py:1198] (1/2) Epoch 12, batch 1650, loss[loss=0.2697, ctc_loss=0.1629, cr_loss=0.4229, attn_decoder_loss=0.2722, over 29707.00 frames. ], tot_loss[loss=0.2616, ctc_loss=0.1631, cr_loss=0.4009, attn_decoder_loss=0.2636, over 5760189.51 frames. 
], batch size: 89, lr: 9.41e-03, grad_scale: 4.0 2024-09-17 10:51:25,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=205700.0, ans=0.125 2024-09-17 10:51:29,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=205700.0, ans=0.0 2024-09-17 10:51:34,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=205700.0, ans=0.0 2024-09-17 10:51:39,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=205740.0, ans=0.0 2024-09-17 10:51:44,714 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.50 vs. limit=15.0 2024-09-17 10:51:48,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=205740.0, ans=0.025 2024-09-17 10:52:01,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=205780.0, ans=10.0 2024-09-17 10:52:07,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=205820.0, ans=0.1 2024-09-17 10:52:14,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=205820.0, ans=0.0 2024-09-17 10:52:17,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=205820.0, ans=0.125 2024-09-17 10:52:20,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer_ff3.min_abs, batch_count=205820.0, ans=0.2 2024-09-17 10:52:36,818 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.563e+01 9.373e+01 9.988e+01 1.108e+02 2.072e+02, threshold=1.998e+02, percent-clipped=0.0 2024-09-17 10:52:38,331 INFO [train.py:1198] (1/2) Epoch 12, batch 1700, loss[loss=0.2239, ctc_loss=0.127, cr_loss=0.3608, attn_decoder_loss=0.2266, over 29559.00 frames. ], tot_loss[loss=0.2612, ctc_loss=0.1626, cr_loss=0.401, attn_decoder_loss=0.2633, over 5781300.34 frames. ], batch size: 69, lr: 9.41e-03, grad_scale: 8.0 2024-09-17 10:52:40,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=205900.0, ans=0.125 2024-09-17 10:53:02,670 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.68 vs. 
limit=15.0 2024-09-17 10:53:38,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=206020.0, ans=0.125 2024-09-17 10:53:48,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=206060.0, ans=0.0 2024-09-17 10:53:50,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=206060.0, ans=0.0 2024-09-17 10:53:51,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=206060.0, ans=0.07 2024-09-17 10:53:55,971 INFO [train.py:1198] (1/2) Epoch 12, batch 1750, loss[loss=0.2297, ctc_loss=0.1351, cr_loss=0.356, attn_decoder_loss=0.2323, over 29366.00 frames. ], tot_loss[loss=0.2607, ctc_loss=0.1621, cr_loss=0.4004, attn_decoder_loss=0.2628, over 5789508.89 frames. ], batch size: 67, lr: 9.40e-03, grad_scale: 8.0 2024-09-17 10:53:56,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=206100.0, ans=10.0 2024-09-17 10:54:15,000 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.96 vs. limit=15.0 2024-09-17 10:54:15,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=206140.0, ans=0.125 2024-09-17 10:54:15,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=206140.0, ans=0.125 2024-09-17 10:54:20,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=206140.0, ans=0.1 2024-09-17 10:55:03,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=206260.0, ans=0.2 2024-09-17 10:55:11,662 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 8.922e+01 9.583e+01 1.012e+02 1.403e+02, threshold=1.917e+02, percent-clipped=0.0 2024-09-17 10:55:12,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=206300.0, ans=0.125 2024-09-17 10:55:13,151 INFO [train.py:1198] (1/2) Epoch 12, batch 1800, loss[loss=0.2718, ctc_loss=0.1594, cr_loss=0.3955, attn_decoder_loss=0.2754, over 29686.00 frames. ], tot_loss[loss=0.2611, ctc_loss=0.1624, cr_loss=0.4006, attn_decoder_loss=0.2631, over 5791697.17 frames. ], batch size: 83, lr: 9.40e-03, grad_scale: 8.0 2024-09-17 10:55:15,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=206300.0, ans=0.2 2024-09-17 10:55:19,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=206300.0, ans=0.5 2024-09-17 10:55:22,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=206300.0, ans=0.2 2024-09-17 10:55:51,931 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.25 vs. 
limit=15.0 2024-09-17 10:55:55,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=206380.0, ans=0.1 2024-09-17 10:55:58,020 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.66 vs. limit=6.0 2024-09-17 10:56:03,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=206420.0, ans=0.0 2024-09-17 10:56:20,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=206460.0, ans=0.07 2024-09-17 10:56:29,245 INFO [train.py:1198] (1/2) Epoch 12, batch 1850, loss[loss=0.2758, ctc_loss=0.1767, cr_loss=0.4083, attn_decoder_loss=0.2777, over 29656.00 frames. ], tot_loss[loss=0.2608, ctc_loss=0.1622, cr_loss=0.4006, attn_decoder_loss=0.2629, over 5797619.89 frames. ], batch size: 86, lr: 9.40e-03, grad_scale: 8.0 2024-09-17 10:57:44,673 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 9.093e+01 9.711e+01 1.054e+02 1.569e+02, threshold=1.942e+02, percent-clipped=0.0 2024-09-17 10:57:46,171 INFO [train.py:1198] (1/2) Epoch 12, batch 1900, loss[loss=0.2731, ctc_loss=0.173, cr_loss=0.4119, attn_decoder_loss=0.275, over 29713.00 frames. ], tot_loss[loss=0.2611, ctc_loss=0.1623, cr_loss=0.401, attn_decoder_loss=0.2632, over 5805322.39 frames. ], batch size: 89, lr: 9.39e-03, grad_scale: 8.0 2024-09-17 10:58:24,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=206780.0, ans=0.1 2024-09-17 10:58:49,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=206860.0, ans=0.125 2024-09-17 10:58:53,393 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.75 vs. limit=15.0 2024-09-17 10:59:01,599 INFO [train.py:1198] (1/2) Epoch 12, batch 1950, loss[loss=0.2496, ctc_loss=0.1524, cr_loss=0.3763, attn_decoder_loss=0.2521, over 29432.00 frames. ], tot_loss[loss=0.2623, ctc_loss=0.163, cr_loss=0.4031, attn_decoder_loss=0.2643, over 5819751.50 frames. ], batch size: 78, lr: 9.39e-03, grad_scale: 8.0 2024-09-17 10:59:38,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=206980.0, ans=0.95 2024-09-17 10:59:40,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=206980.0, ans=0.125 2024-09-17 10:59:50,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=207020.0, ans=0.2 2024-09-17 11:00:17,494 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.829e+01 9.308e+01 9.738e+01 1.045e+02 2.594e+02, threshold=1.948e+02, percent-clipped=1.0 2024-09-17 11:00:19,010 INFO [train.py:1198] (1/2) Epoch 12, batch 2000, loss[loss=0.2311, ctc_loss=0.1437, cr_loss=0.3762, attn_decoder_loss=0.2325, over 29375.00 frames. ], tot_loss[loss=0.2632, ctc_loss=0.1643, cr_loss=0.4041, attn_decoder_loss=0.2652, over 5796871.35 frames. 
], batch size: 67, lr: 9.38e-03, grad_scale: 16.0 2024-09-17 11:00:54,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=207180.0, ans=0.1 2024-09-17 11:01:08,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=207220.0, ans=0.0 2024-09-17 11:01:37,092 INFO [train.py:1198] (1/2) Epoch 12, batch 2050, loss[loss=0.2272, ctc_loss=0.1385, cr_loss=0.3601, attn_decoder_loss=0.2291, over 29458.00 frames. ], tot_loss[loss=0.262, ctc_loss=0.1631, cr_loss=0.4019, attn_decoder_loss=0.2641, over 5789017.47 frames. ], batch size: 70, lr: 9.38e-03, grad_scale: 4.0 2024-09-17 11:01:40,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=207300.0, ans=0.125 2024-09-17 11:01:42,627 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.28 vs. limit=15.0 2024-09-17 11:02:00,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=207340.0, ans=0.0 2024-09-17 11:02:18,815 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.87 vs. limit=15.0 2024-09-17 11:02:19,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=207380.0, ans=0.07 2024-09-17 11:02:32,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=207420.0, ans=0.125 2024-09-17 11:02:48,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=207460.0, ans=0.125 2024-09-17 11:02:52,880 INFO [train.py:1198] (1/2) Epoch 12, batch 2100, loss[loss=0.2552, ctc_loss=0.1545, cr_loss=0.3953, attn_decoder_loss=0.2576, over 29752.00 frames. ], tot_loss[loss=0.261, ctc_loss=0.1624, cr_loss=0.4009, attn_decoder_loss=0.2631, over 5799629.48 frames. ], batch size: 81, lr: 9.37e-03, grad_scale: 8.0 2024-09-17 11:02:54,455 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 9.176e+01 9.560e+01 1.030e+02 1.406e+02, threshold=1.912e+02, percent-clipped=0.0 2024-09-17 11:03:08,451 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.59 vs. 
limit=15.0 2024-09-17 11:03:30,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=207580.0, ans=0.025 2024-09-17 11:03:34,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=207580.0, ans=0.0 2024-09-17 11:03:39,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=207620.0, ans=0.125 2024-09-17 11:03:46,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=207620.0, ans=0.0 2024-09-17 11:03:49,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=207620.0, ans=0.2 2024-09-17 11:03:51,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=207620.0, ans=0.0 2024-09-17 11:04:00,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=207660.0, ans=0.125 2024-09-17 11:04:01,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=207660.0, ans=0.125 2024-09-17 11:04:09,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=207700.0, ans=0.0 2024-09-17 11:04:10,888 INFO [train.py:1198] (1/2) Epoch 12, batch 2150, loss[loss=0.2611, ctc_loss=0.1673, cr_loss=0.4214, attn_decoder_loss=0.2622, over 29443.00 frames. ], tot_loss[loss=0.2602, ctc_loss=0.1617, cr_loss=0.4001, attn_decoder_loss=0.2623, over 5813742.10 frames. ], batch size: 78, lr: 9.37e-03, grad_scale: 4.0 2024-09-17 11:04:29,051 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.36 vs. limit=12.0 2024-09-17 11:04:37,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=207740.0, ans=0.0 2024-09-17 11:04:39,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=207740.0, ans=0.1 2024-09-17 11:04:54,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=207780.0, ans=0.125 2024-09-17 11:04:57,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=207820.0, ans=0.125 2024-09-17 11:05:00,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=207820.0, ans=0.04949747468305833 2024-09-17 11:05:08,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=207820.0, ans=0.2 2024-09-17 11:05:22,257 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.76 vs. limit=12.0 2024-09-17 11:05:28,908 INFO [train.py:1198] (1/2) Epoch 12, batch 2200, loss[loss=0.2733, ctc_loss=0.1721, cr_loss=0.3965, attn_decoder_loss=0.2757, over 29617.00 frames. ], tot_loss[loss=0.2603, ctc_loss=0.1618, cr_loss=0.4006, attn_decoder_loss=0.2623, over 5809589.40 frames. 
], batch size: 86, lr: 9.36e-03, grad_scale: 8.0 2024-09-17 11:05:31,915 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 9.159e+01 9.816e+01 1.050e+02 6.382e+02, threshold=1.963e+02, percent-clipped=1.0 2024-09-17 11:05:45,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=207940.0, ans=0.025 2024-09-17 11:05:49,291 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.72 vs. limit=15.0 2024-09-17 11:05:53,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=207940.0, ans=0.0 2024-09-17 11:06:20,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=208020.0, ans=10.0 2024-09-17 11:06:31,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=208020.0, ans=0.0 2024-09-17 11:06:33,313 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.18 vs. limit=22.5 2024-09-17 11:06:51,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=208100.0, ans=0.07 2024-09-17 11:06:52,580 INFO [train.py:1198] (1/2) Epoch 12, batch 2250, loss[loss=0.2645, ctc_loss=0.1614, cr_loss=0.4033, attn_decoder_loss=0.267, over 29716.00 frames. ], tot_loss[loss=0.2605, ctc_loss=0.162, cr_loss=0.4007, attn_decoder_loss=0.2626, over 5810044.85 frames. ], batch size: 82, lr: 9.36e-03, grad_scale: 8.0 2024-09-17 11:07:05,617 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.17 vs. limit=15.0 2024-09-17 11:07:13,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=208140.0, ans=0.0 2024-09-17 11:07:19,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=208140.0, ans=0.2 2024-09-17 11:07:34,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=208180.0, ans=0.1 2024-09-17 11:07:37,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=208180.0, ans=0.125 2024-09-17 11:08:00,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=208260.0, ans=0.0 2024-09-17 11:08:10,391 INFO [train.py:1198] (1/2) Epoch 12, batch 2300, loss[loss=0.2293, ctc_loss=0.1278, cr_loss=0.344, attn_decoder_loss=0.2329, over 29311.00 frames. ], tot_loss[loss=0.2593, ctc_loss=0.161, cr_loss=0.3988, attn_decoder_loss=0.2614, over 5799139.00 frames. 
], batch size: 71, lr: 9.36e-03, grad_scale: 8.0 2024-09-17 11:08:13,461 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.929e+01 9.033e+01 9.553e+01 1.076e+02 7.023e+02, threshold=1.911e+02, percent-clipped=3.0 2024-09-17 11:08:18,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=208300.0, ans=0.025 2024-09-17 11:08:23,435 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.76 vs. limit=22.5 2024-09-17 11:08:44,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=208380.0, ans=0.2 2024-09-17 11:08:58,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=208420.0, ans=0.125 2024-09-17 11:09:01,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=208420.0, ans=0.125 2024-09-17 11:09:09,355 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.25 vs. limit=15.0 2024-09-17 11:09:24,575 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.08 vs. limit=15.0 2024-09-17 11:09:28,437 INFO [train.py:1198] (1/2) Epoch 12, batch 2350, loss[loss=0.2691, ctc_loss=0.1688, cr_loss=0.428, attn_decoder_loss=0.2707, over 29699.00 frames. ], tot_loss[loss=0.2597, ctc_loss=0.1613, cr_loss=0.3996, attn_decoder_loss=0.2618, over 5804995.70 frames. ], batch size: 83, lr: 9.35e-03, grad_scale: 8.0 2024-09-17 11:09:40,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=208500.0, ans=0.0 2024-09-17 11:09:51,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=208540.0, ans=0.125 2024-09-17 11:10:38,519 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.90 vs. limit=15.0 2024-09-17 11:10:43,832 INFO [train.py:1198] (1/2) Epoch 12, batch 2400, loss[loss=0.2525, ctc_loss=0.1489, cr_loss=0.3794, attn_decoder_loss=0.2556, over 29537.00 frames. ], tot_loss[loss=0.2601, ctc_loss=0.1617, cr_loss=0.4003, attn_decoder_loss=0.2622, over 5809148.52 frames. ], batch size: 76, lr: 9.35e-03, grad_scale: 16.0 2024-09-17 11:10:49,769 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.640e+01 9.246e+01 9.641e+01 1.033e+02 3.378e+02, threshold=1.928e+02, percent-clipped=1.0 2024-09-17 11:11:11,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.07 vs. limit=15.0 2024-09-17 11:11:21,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=208780.0, ans=0.1 2024-09-17 11:11:28,707 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.59 vs. limit=10.0 2024-09-17 11:11:41,981 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. 
limit=6.0 2024-09-17 11:11:56,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=208860.0, ans=0.125 2024-09-17 11:12:02,489 INFO [train.py:1198] (1/2) Epoch 12, batch 2450, loss[loss=0.2594, ctc_loss=0.1653, cr_loss=0.4077, attn_decoder_loss=0.2608, over 29699.00 frames. ], tot_loss[loss=0.2611, ctc_loss=0.1629, cr_loss=0.4018, attn_decoder_loss=0.2631, over 5786143.13 frames. ], batch size: 82, lr: 9.34e-03, grad_scale: 4.0 2024-09-17 11:12:11,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=208900.0, ans=0.2 2024-09-17 11:12:23,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=208940.0, ans=0.125 2024-09-17 11:12:28,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=208940.0, ans=0.125 2024-09-17 11:12:39,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=208980.0, ans=0.125 2024-09-17 11:12:40,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=208980.0, ans=0.1 2024-09-17 11:13:14,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=209060.0, ans=0.125 2024-09-17 11:13:19,886 INFO [train.py:1198] (1/2) Epoch 12, batch 2500, loss[loss=0.271, ctc_loss=0.1696, cr_loss=0.4023, attn_decoder_loss=0.2734, over 29627.00 frames. ], tot_loss[loss=0.2609, ctc_loss=0.1624, cr_loss=0.4015, attn_decoder_loss=0.2629, over 5795654.03 frames. ], batch size: 86, lr: 9.34e-03, grad_scale: 8.0 2024-09-17 11:13:25,834 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.198e+01 9.207e+01 9.738e+01 1.065e+02 1.820e+02, threshold=1.948e+02, percent-clipped=0.0 2024-09-17 11:13:29,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=209100.0, ans=0.05 2024-09-17 11:13:37,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.01 vs. limit=15.0 2024-09-17 11:13:38,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=209140.0, ans=0.125 2024-09-17 11:13:42,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1.whitening_limit, batch_count=209140.0, ans=10.0 2024-09-17 11:14:13,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=209220.0, ans=0.2 2024-09-17 11:14:24,674 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.18 vs. limit=15.0 2024-09-17 11:14:36,042 INFO [train.py:1198] (1/2) Epoch 12, batch 2550, loss[loss=0.2285, ctc_loss=0.1363, cr_loss=0.3618, attn_decoder_loss=0.2307, over 29345.00 frames. ], tot_loss[loss=0.2608, ctc_loss=0.1621, cr_loss=0.4012, attn_decoder_loss=0.2628, over 5798836.80 frames. 
], batch size: 67, lr: 9.33e-03, grad_scale: 8.0 2024-09-17 11:14:39,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=209300.0, ans=0.125 2024-09-17 11:15:04,623 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.47 vs. limit=15.0 2024-09-17 11:15:26,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=209420.0, ans=0.1 2024-09-17 11:15:40,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=209460.0, ans=0.125 2024-09-17 11:15:53,681 INFO [train.py:1198] (1/2) Epoch 12, batch 2600, loss[loss=0.2562, ctc_loss=0.1565, cr_loss=0.4053, attn_decoder_loss=0.2583, over 29457.00 frames. ], tot_loss[loss=0.2612, ctc_loss=0.1624, cr_loss=0.402, attn_decoder_loss=0.2633, over 5795261.60 frames. ], batch size: 78, lr: 9.33e-03, grad_scale: 8.0 2024-09-17 11:16:00,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.42 vs. limit=6.0 2024-09-17 11:16:01,132 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.249e+01 9.008e+01 9.501e+01 1.038e+02 1.745e+02, threshold=1.900e+02, percent-clipped=0.0 2024-09-17 11:16:01,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=209500.0, ans=0.025 2024-09-17 11:16:02,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=209500.0, ans=0.0 2024-09-17 11:16:04,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=209500.0, ans=0.0 2024-09-17 11:16:07,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=209540.0, ans=0.125 2024-09-17 11:16:26,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=209580.0, ans=0.1 2024-09-17 11:16:56,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=209660.0, ans=0.125 2024-09-17 11:17:01,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=209660.0, ans=0.125 2024-09-17 11:17:04,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=209660.0, ans=0.0 2024-09-17 11:17:08,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=209660.0, ans=0.125 2024-09-17 11:17:11,201 INFO [train.py:1198] (1/2) Epoch 12, batch 2650, loss[loss=0.2726, ctc_loss=0.171, cr_loss=0.4089, attn_decoder_loss=0.2748, over 29166.00 frames. ], tot_loss[loss=0.2615, ctc_loss=0.1623, cr_loss=0.4024, attn_decoder_loss=0.2636, over 5802144.00 frames. 
], batch size: 100, lr: 9.32e-03, grad_scale: 8.0 2024-09-17 11:17:32,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=209740.0, ans=0.0 2024-09-17 11:17:34,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=209740.0, ans=0.125 2024-09-17 11:17:48,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=209780.0, ans=0.1 2024-09-17 11:18:19,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=209860.0, ans=0.09899494936611666 2024-09-17 11:18:26,278 INFO [train.py:1198] (1/2) Epoch 12, batch 2700, loss[loss=0.2753, ctc_loss=0.1754, cr_loss=0.4322, attn_decoder_loss=0.2768, over 29522.00 frames. ], tot_loss[loss=0.2617, ctc_loss=0.1624, cr_loss=0.402, attn_decoder_loss=0.2638, over 5797399.10 frames. ], batch size: 87, lr: 9.32e-03, grad_scale: 8.0 2024-09-17 11:18:35,137 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.890e+01 9.487e+01 1.014e+02 1.859e+02, threshold=1.897e+02, percent-clipped=0.0 2024-09-17 11:19:18,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.41 vs. limit=15.0 2024-09-17 11:19:18,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=210020.0, ans=0.125 2024-09-17 11:19:18,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=210020.0, ans=0.0 2024-09-17 11:19:41,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=210060.0, ans=0.04949747468305833 2024-09-17 11:19:44,379 INFO [train.py:1198] (1/2) Epoch 12, batch 2750, loss[loss=0.2427, ctc_loss=0.1403, cr_loss=0.363, attn_decoder_loss=0.246, over 29521.00 frames. ], tot_loss[loss=0.2605, ctc_loss=0.1614, cr_loss=0.3996, attn_decoder_loss=0.2627, over 5796666.11 frames. ], batch size: 75, lr: 9.32e-03, grad_scale: 8.0 2024-09-17 11:19:59,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=210140.0, ans=0.125 2024-09-17 11:19:59,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=210140.0, ans=0.2 2024-09-17 11:20:12,820 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.92 vs. limit=15.0 2024-09-17 11:20:16,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=210180.0, ans=0.0 2024-09-17 11:20:33,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=210220.0, ans=0.125 2024-09-17 11:20:42,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=210220.0, ans=0.1 2024-09-17 11:20:47,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=210260.0, ans=0.0 2024-09-17 11:20:47,595 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.04 vs. 
limit=15.0 2024-09-17 11:21:02,209 INFO [train.py:1198] (1/2) Epoch 12, batch 2800, loss[loss=0.2884, ctc_loss=0.21, cr_loss=0.4086, attn_decoder_loss=0.288, over 20433.00 frames. ], tot_loss[loss=0.2606, ctc_loss=0.1615, cr_loss=0.3997, attn_decoder_loss=0.2627, over 5777806.46 frames. ], batch size: 209, lr: 9.31e-03, grad_scale: 16.0 2024-09-17 11:21:12,652 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.172e+01 9.480e+01 1.026e+02 1.256e+02 4.560e+02, threshold=2.052e+02, percent-clipped=3.0 2024-09-17 11:21:48,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=210420.0, ans=0.2 2024-09-17 11:22:06,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=210460.0, ans=0.125 2024-09-17 11:22:09,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=210460.0, ans=0.125 2024-09-17 11:22:12,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=210460.0, ans=0.2 2024-09-17 11:22:18,254 INFO [train.py:1198] (1/2) Epoch 12, batch 2850, loss[loss=0.2612, ctc_loss=0.1648, cr_loss=0.3998, attn_decoder_loss=0.263, over 29503.00 frames. ], tot_loss[loss=0.2611, ctc_loss=0.1621, cr_loss=0.3997, attn_decoder_loss=0.2633, over 5763635.36 frames. ], batch size: 77, lr: 9.31e-03, grad_scale: 4.0 2024-09-17 11:22:30,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=210500.0, ans=0.0 2024-09-17 11:22:48,458 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.77 vs. limit=15.0 2024-09-17 11:23:01,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=210580.0, ans=0.025 2024-09-17 11:23:27,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=210660.0, ans=0.125 2024-09-17 11:23:27,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.04 vs. limit=15.0 2024-09-17 11:23:36,113 INFO [train.py:1198] (1/2) Epoch 12, batch 2900, loss[loss=0.2595, ctc_loss=0.1615, cr_loss=0.4059, attn_decoder_loss=0.2613, over 29428.00 frames. ], tot_loss[loss=0.2622, ctc_loss=0.1625, cr_loss=0.4019, attn_decoder_loss=0.2644, over 5788588.29 frames. ], batch size: 79, lr: 9.30e-03, grad_scale: 8.0 2024-09-17 11:23:48,032 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.952e+01 8.957e+01 9.627e+01 1.010e+02 3.114e+02, threshold=1.925e+02, percent-clipped=2.0 2024-09-17 11:23:55,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=210740.0, ans=0.125 2024-09-17 11:23:58,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=210740.0, ans=0.0 2024-09-17 11:24:02,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=210740.0, ans=0.0 2024-09-17 11:24:21,629 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.01 vs. 
limit=15.0 2024-09-17 11:24:36,524 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.15 vs. limit=22.5 2024-09-17 11:24:52,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=210900.0, ans=0.0 2024-09-17 11:24:53,861 INFO [train.py:1198] (1/2) Epoch 12, batch 2950, loss[loss=0.2489, ctc_loss=0.1566, cr_loss=0.3621, attn_decoder_loss=0.2511, over 29522.00 frames. ], tot_loss[loss=0.2609, ctc_loss=0.1613, cr_loss=0.3996, attn_decoder_loss=0.2631, over 5781644.93 frames. ], batch size: 75, lr: 9.30e-03, grad_scale: 8.0 2024-09-17 11:25:13,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=210940.0, ans=0.0 2024-09-17 11:25:19,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=210940.0, ans=0.125 2024-09-17 11:25:53,878 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.25 vs. limit=22.5 2024-09-17 11:26:09,715 INFO [train.py:1198] (1/2) Epoch 12, batch 3000, loss[loss=0.2572, ctc_loss=0.1569, cr_loss=0.3814, attn_decoder_loss=0.2598, over 29742.00 frames. ], tot_loss[loss=0.2607, ctc_loss=0.1613, cr_loss=0.3997, attn_decoder_loss=0.2629, over 5782120.12 frames. ], batch size: 81, lr: 9.29e-03, grad_scale: 8.0 2024-09-17 11:26:09,715 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 11:26:17,178 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5995, 4.2348, 4.0770, 3.9412], device='cuda:1') 2024-09-17 11:26:28,163 INFO [train.py:1230] (1/2) Epoch 12, validation: loss=0.2128, ctc_loss=0.04571, cr_loss=4.818e-15, attn_decoder_loss=0.2314, over 944034.00 frames. 2024-09-17 11:26:28,163 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-17 11:26:31,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=211100.0, ans=0.0 2024-09-17 11:26:42,643 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 9.212e+01 9.963e+01 1.087e+02 2.371e+02, threshold=1.993e+02, percent-clipped=1.0 2024-09-17 11:27:48,913 INFO [train.py:1198] (1/2) Epoch 12, batch 3050, loss[loss=0.2482, ctc_loss=0.1569, cr_loss=0.3884, attn_decoder_loss=0.2497, over 29541.00 frames. ], tot_loss[loss=0.2613, ctc_loss=0.1618, cr_loss=0.4005, attn_decoder_loss=0.2634, over 5776117.06 frames. ], batch size: 76, lr: 9.29e-03, grad_scale: 8.0 2024-09-17 11:27:51,675 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.58 vs. limit=15.0 2024-09-17 11:28:01,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=211300.0, ans=0.125 2024-09-17 11:28:20,269 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.11 vs. 
limit=12.0 2024-09-17 11:29:00,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=211460.0, ans=0.0 2024-09-17 11:29:02,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.54 vs. limit=10.0 2024-09-17 11:29:04,431 INFO [train.py:1198] (1/2) Epoch 12, batch 3100, loss[loss=0.2731, ctc_loss=0.17, cr_loss=0.3942, attn_decoder_loss=0.2758, over 29243.00 frames. ], tot_loss[loss=0.2607, ctc_loss=0.1614, cr_loss=0.3997, attn_decoder_loss=0.2629, over 5776144.76 frames. ], batch size: 100, lr: 9.29e-03, grad_scale: 8.0 2024-09-17 11:29:12,877 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.59 vs. limit=10.0 2024-09-17 11:29:16,538 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.116e+01 9.262e+01 9.866e+01 1.070e+02 1.746e+02, threshold=1.973e+02, percent-clipped=0.0 2024-09-17 11:29:19,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=211540.0, ans=0.0 2024-09-17 11:29:21,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=211540.0, ans=10.0 2024-09-17 11:29:44,196 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.62 vs. limit=12.0 2024-09-17 11:30:02,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=211620.0, ans=0.125 2024-09-17 11:30:11,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=211660.0, ans=0.2 2024-09-17 11:30:19,854 INFO [train.py:1198] (1/2) Epoch 12, batch 3150, loss[loss=0.2863, ctc_loss=0.1785, cr_loss=0.4389, attn_decoder_loss=0.2885, over 28856.00 frames. ], tot_loss[loss=0.2606, ctc_loss=0.1611, cr_loss=0.3996, attn_decoder_loss=0.2627, over 5782612.67 frames. ], batch size: 104, lr: 9.28e-03, grad_scale: 8.0 2024-09-17 11:30:54,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=211780.0, ans=0.0 2024-09-17 11:31:39,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=211900.0, ans=0.1 2024-09-17 11:31:40,397 INFO [train.py:1198] (1/2) Epoch 12, batch 3200, loss[loss=0.2481, ctc_loss=0.1451, cr_loss=0.3674, attn_decoder_loss=0.2514, over 29763.00 frames. ], tot_loss[loss=0.2601, ctc_loss=0.1608, cr_loss=0.3992, attn_decoder_loss=0.2623, over 5792330.03 frames. 
], batch size: 80, lr: 9.28e-03, grad_scale: 16.0 2024-09-17 11:31:46,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=211900.0, ans=0.1 2024-09-17 11:31:53,886 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.727e+01 9.023e+01 9.600e+01 1.061e+02 2.809e+02, threshold=1.920e+02, percent-clipped=1.0 2024-09-17 11:31:57,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=211940.0, ans=0.1 2024-09-17 11:32:03,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=211940.0, ans=0.125 2024-09-17 11:32:14,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=211980.0, ans=0.2 2024-09-17 11:32:35,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=212020.0, ans=0.0 2024-09-17 11:32:37,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=212020.0, ans=0.2 2024-09-17 11:32:41,794 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:32:44,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.min_positive, batch_count=212060.0, ans=0.05 2024-09-17 11:32:56,862 INFO [train.py:1198] (1/2) Epoch 12, batch 3250, loss[loss=0.2758, ctc_loss=0.1716, cr_loss=0.4231, attn_decoder_loss=0.278, over 29713.00 frames. ], tot_loss[loss=0.2608, ctc_loss=0.1614, cr_loss=0.4004, attn_decoder_loss=0.263, over 5799306.85 frames. ], batch size: 84, lr: 9.27e-03, grad_scale: 8.0 2024-09-17 11:33:37,922 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:33:40,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=212220.0, ans=0.07 2024-09-17 11:33:46,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=212220.0, ans=0.025 2024-09-17 11:33:51,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=212220.0, ans=0.125 2024-09-17 11:34:11,996 INFO [train.py:1198] (1/2) Epoch 12, batch 3300, loss[loss=0.2714, ctc_loss=0.1656, cr_loss=0.3911, attn_decoder_loss=0.2745, over 28288.00 frames. ], tot_loss[loss=0.2595, ctc_loss=0.16, cr_loss=0.3985, attn_decoder_loss=0.2617, over 5797049.23 frames. 
], batch size: 111, lr: 9.27e-03, grad_scale: 8.0 2024-09-17 11:34:20,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=212300.0, ans=0.0 2024-09-17 11:34:27,365 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.743e+01 9.323e+01 1.013e+02 1.133e+02 3.364e+02, threshold=2.026e+02, percent-clipped=1.0 2024-09-17 11:35:04,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=212420.0, ans=0.125 2024-09-17 11:35:17,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=212460.0, ans=0.125 2024-09-17 11:35:32,394 INFO [train.py:1198] (1/2) Epoch 12, batch 3350, loss[loss=0.2691, ctc_loss=0.1703, cr_loss=0.4182, attn_decoder_loss=0.2708, over 28795.00 frames. ], tot_loss[loss=0.2606, ctc_loss=0.1613, cr_loss=0.4004, attn_decoder_loss=0.2627, over 5772453.92 frames. ], batch size: 104, lr: 9.26e-03, grad_scale: 8.0 2024-09-17 11:35:44,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=212500.0, ans=0.09899494936611666 2024-09-17 11:36:13,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=212580.0, ans=0.0 2024-09-17 11:36:16,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=212620.0, ans=0.125 2024-09-17 11:36:18,314 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:36:28,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=212620.0, ans=0.125 2024-09-17 11:36:36,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=212660.0, ans=0.125 2024-09-17 11:36:40,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=212660.0, ans=0.125 2024-09-17 11:36:42,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=212660.0, ans=0.125 2024-09-17 11:36:48,155 INFO [train.py:1198] (1/2) Epoch 12, batch 3400, loss[loss=0.2337, ctc_loss=0.1425, cr_loss=0.3967, attn_decoder_loss=0.235, over 29365.00 frames. ], tot_loss[loss=0.2604, ctc_loss=0.1614, cr_loss=0.4002, attn_decoder_loss=0.2626, over 5765405.97 frames. ], batch size: 67, lr: 9.26e-03, grad_scale: 8.0 2024-09-17 11:37:03,324 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.571e+01 9.423e+01 1.002e+02 1.091e+02 2.670e+02, threshold=2.004e+02, percent-clipped=1.0 2024-09-17 11:37:34,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=212820.0, ans=0.1 2024-09-17 11:37:45,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=212820.0, ans=0.2 2024-09-17 11:38:04,056 INFO [train.py:1198] (1/2) Epoch 12, batch 3450, loss[loss=0.2628, ctc_loss=0.162, cr_loss=0.3858, attn_decoder_loss=0.2654, over 28190.00 frames. ], tot_loss[loss=0.2607, ctc_loss=0.1616, cr_loss=0.4005, attn_decoder_loss=0.2628, over 5773187.69 frames. 
], batch size: 111, lr: 9.26e-03, grad_scale: 8.0 2024-09-17 11:38:14,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=212900.0, ans=0.125 2024-09-17 11:38:38,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=212980.0, ans=0.125 2024-09-17 11:39:23,610 INFO [train.py:1198] (1/2) Epoch 12, batch 3500, loss[loss=0.2333, ctc_loss=0.1435, cr_loss=0.3823, attn_decoder_loss=0.2348, over 29326.00 frames. ], tot_loss[loss=0.2597, ctc_loss=0.1607, cr_loss=0.3998, attn_decoder_loss=0.2618, over 5775698.62 frames. ], batch size: 71, lr: 9.25e-03, grad_scale: 8.0 2024-09-17 11:39:25,391 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:39:28,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=213100.0, ans=0.125 2024-09-17 11:39:40,416 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.771e+01 8.906e+01 9.633e+01 1.043e+02 3.728e+02, threshold=1.927e+02, percent-clipped=3.0 2024-09-17 11:39:44,247 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.81 vs. limit=15.0 2024-09-17 11:39:45,857 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.75 vs. limit=22.5 2024-09-17 11:40:10,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=213220.0, ans=0.025 2024-09-17 11:40:10,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=213220.0, ans=0.125 2024-09-17 11:40:22,918 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.44 vs. limit=15.0 2024-09-17 11:40:38,548 INFO [train.py:1198] (1/2) Epoch 12, batch 3550, loss[loss=0.2733, ctc_loss=0.1637, cr_loss=0.4232, attn_decoder_loss=0.2761, over 29707.00 frames. ], tot_loss[loss=0.2597, ctc_loss=0.1604, cr_loss=0.3991, attn_decoder_loss=0.2619, over 5781422.21 frames. ], batch size: 89, lr: 9.25e-03, grad_scale: 8.0 2024-09-17 11:40:54,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=213340.0, ans=0.0 2024-09-17 11:40:56,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.92 vs. limit=12.0 2024-09-17 11:41:01,704 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.39 vs. 
limit=15.0
2024-09-17 11:41:03,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=213340.0, ans=0.0
2024-09-17 11:41:14,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=213380.0, ans=0.1
2024-09-17 11:41:17,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=213380.0, ans=0.125
2024-09-17 11:41:26,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=213420.0, ans=0.1
2024-09-17 11:41:37,245 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=15.0
2024-09-17 11:41:52,788 INFO [train.py:1198] (1/2) Epoch 12, batch 3600, loss[loss=0.254, ctc_loss=0.1629, cr_loss=0.4137, attn_decoder_loss=0.255, over 29520.00 frames. ], tot_loss[loss=0.2601, ctc_loss=0.1607, cr_loss=0.4003, attn_decoder_loss=0.2622, over 5790662.93 frames. ], batch size: 77, lr: 9.24e-03, grad_scale: 16.0
2024-09-17 11:42:03,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=213500.0, ans=0.2
2024-09-17 11:42:05,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=213500.0, ans=0.125
2024-09-17 11:42:10,827 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.945e+01 9.059e+01 9.779e+01 1.035e+02 3.079e+02, threshold=1.956e+02, percent-clipped=1.0
2024-09-17 11:42:15,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=213540.0, ans=0.1
2024-09-17 11:42:27,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=213580.0, ans=0.125
2024-09-17 11:42:47,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=213620.0, ans=0.0
2024-09-17 11:43:02,793 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.06 vs. limit=10.0
2024-09-17 11:43:08,008 INFO [train.py:1198] (1/2) Epoch 12, batch 3650, loss[loss=0.2626, ctc_loss=0.159, cr_loss=0.4118, attn_decoder_loss=0.2649, over 29486.00 frames. ], tot_loss[loss=0.2594, ctc_loss=0.1601, cr_loss=0.3991, attn_decoder_loss=0.2616, over 5793843.80 frames. ], batch size: 90, lr: 9.24e-03, grad_scale: 8.0
2024-09-17 11:43:30,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=213740.0, ans=0.0
2024-09-17 11:43:50,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=213780.0, ans=0.0
2024-09-17 11:43:54,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=213820.0, ans=0.1
2024-09-17 11:44:23,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=213900.0, ans=0.035
2024-09-17 11:44:24,706 INFO [train.py:1198] (1/2) Epoch 12, batch 3700, loss[loss=0.2672, ctc_loss=0.1574, cr_loss=0.3951, attn_decoder_loss=0.2706, over 29708.00 frames. ], tot_loss[loss=0.2593, ctc_loss=0.1598, cr_loss=0.3993, attn_decoder_loss=0.2615, over 5803692.55 frames. ], batch size: 84, lr: 9.23e-03, grad_scale: 8.0
2024-09-17 11:44:33,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=213900.0, ans=0.125
2024-09-17 11:44:42,608 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.674e+01 9.238e+01 9.737e+01 1.052e+02 3.934e+02, threshold=1.947e+02, percent-clipped=3.0
2024-09-17 11:44:50,959 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.51 vs. limit=22.5
2024-09-17 11:44:55,218 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.09 vs. limit=15.0
2024-09-17 11:45:05,934 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.50 vs. limit=15.0
2024-09-17 11:45:09,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=214020.0, ans=0.125
2024-09-17 11:45:12,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=214020.0, ans=0.125
2024-09-17 11:45:40,789 INFO [train.py:1198] (1/2) Epoch 12, batch 3750, loss[loss=0.2332, ctc_loss=0.1449, cr_loss=0.3689, attn_decoder_loss=0.2348, over 29340.00 frames. ], tot_loss[loss=0.259, ctc_loss=0.1598, cr_loss=0.3987, attn_decoder_loss=0.2612, over 5807747.07 frames. ], batch size: 67, lr: 9.23e-03, grad_scale: 8.0
2024-09-17 11:45:44,129 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 11:45:57,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=214140.0, ans=0.025
2024-09-17 11:45:59,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=214140.0, ans=0.125
2024-09-17 11:46:04,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=214140.0, ans=0.125
2024-09-17 11:46:09,562 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 11:46:55,356 INFO [train.py:1198] (1/2) Epoch 12, batch 3800, loss[loss=0.2722, ctc_loss=0.1624, cr_loss=0.4171, attn_decoder_loss=0.2751, over 29611.00 frames. ], tot_loss[loss=0.2591, ctc_loss=0.1602, cr_loss=0.399, attn_decoder_loss=0.2612, over 5798421.24 frames. ], batch size: 86, lr: 9.23e-03, grad_scale: 8.0
2024-09-17 11:47:08,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=214340.0, ans=0.125
2024-09-17 11:47:11,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=214340.0, ans=0.125
2024-09-17 11:47:13,097 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.095e+01 9.449e+01 1.046e+02 1.140e+02 2.045e+02, threshold=2.093e+02, percent-clipped=1.0
2024-09-17 11:47:37,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=214380.0, ans=0.125
2024-09-17 11:47:38,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=214420.0, ans=0.0
2024-09-17 11:47:39,556 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.33 vs. limit=22.5
2024-09-17 11:47:46,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=214420.0, ans=0.125
2024-09-17 11:47:57,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=214460.0, ans=0.1
2024-09-17 11:48:06,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.57 vs. limit=15.0
2024-09-17 11:48:09,978 INFO [train.py:1198] (1/2) Epoch 12, batch 3850, loss[loss=0.2794, ctc_loss=0.1724, cr_loss=0.4175, attn_decoder_loss=0.282, over 29256.00 frames. ], tot_loss[loss=0.2589, ctc_loss=0.1597, cr_loss=0.3988, attn_decoder_loss=0.261, over 5812622.27 frames. ], batch size: 100, lr: 9.22e-03, grad_scale: 8.0
2024-09-17 11:48:14,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=214500.0, ans=0.07
2024-09-17 11:48:19,130 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 11:48:20,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=214500.0, ans=0.025
2024-09-17 11:48:21,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=214500.0, ans=0.0
2024-09-17 11:48:27,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=214540.0, ans=0.1
2024-09-17 11:48:36,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=214540.0, ans=0.1
2024-09-17 11:48:40,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=214580.0, ans=0.2
2024-09-17 11:48:49,365 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.94 vs. limit=15.0
2024-09-17 11:49:06,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=214620.0, ans=0.125
2024-09-17 11:49:09,052 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.19 vs. limit=22.5
2024-09-17 11:49:09,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=214660.0, ans=0.125
2024-09-17 11:49:20,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.96 vs. limit=12.0
2024-09-17 11:49:24,513 INFO [train.py:1198] (1/2) Epoch 12, batch 3900, loss[loss=0.2616, ctc_loss=0.153, cr_loss=0.3862, attn_decoder_loss=0.2651, over 29613.00 frames. ], tot_loss[loss=0.2596, ctc_loss=0.1603, cr_loss=0.3998, attn_decoder_loss=0.2617, over 5816171.90 frames. ], batch size: 86, lr: 9.22e-03, grad_scale: 8.0
2024-09-17 11:49:42,146 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.696e+01 9.064e+01 9.520e+01 1.003e+02 3.590e+02, threshold=1.904e+02, percent-clipped=1.0
2024-09-17 11:49:46,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=214740.0, ans=0.125
2024-09-17 11:50:12,685 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.17 vs. limit=22.5
2024-09-17 11:50:13,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=214820.0, ans=0.025
2024-09-17 11:50:14,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=214820.0, ans=0.125
2024-09-17 11:50:41,386 INFO [train.py:1198] (1/2) Epoch 12, batch 3950, loss[loss=0.2694, ctc_loss=0.166, cr_loss=0.4255, attn_decoder_loss=0.2715, over 29494.00 frames. ], tot_loss[loss=0.2594, ctc_loss=0.1598, cr_loss=0.3991, attn_decoder_loss=0.2617, over 5835654.27 frames. ], batch size: 97, lr: 9.21e-03, grad_scale: 8.0
2024-09-17 11:51:20,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=214980.0, ans=0.0
2024-09-17 11:51:33,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=215020.0, ans=0.05
2024-09-17 11:51:36,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=215020.0, ans=0.2
2024-09-17 11:51:46,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=215060.0, ans=0.2
2024-09-17 11:51:54,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=215100.0, ans=0.2
2024-09-17 11:51:55,372 INFO [train.py:1198] (1/2) Epoch 12, batch 4000, loss[loss=0.2355, ctc_loss=0.1398, cr_loss=0.3703, attn_decoder_loss=0.2379, over 29524.00 frames. ], tot_loss[loss=0.2597, ctc_loss=0.1602, cr_loss=0.3992, attn_decoder_loss=0.2618, over 5812530.41 frames. ], batch size: 74, lr: 9.21e-03, grad_scale: 16.0
2024-09-17 11:52:14,272 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.967e+01 9.592e+01 1.062e+02 2.028e+02, threshold=1.918e+02, percent-clipped=1.0
2024-09-17 11:52:18,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.91 vs. limit=15.0
2024-09-17 11:52:36,175 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=15.0
2024-09-17 11:52:59,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=215260.0, ans=0.125
2024-09-17 11:53:10,007 INFO [train.py:1198] (1/2) Epoch 12, batch 4050, loss[loss=0.2956, ctc_loss=0.2162, cr_loss=0.4173, attn_decoder_loss=0.2951, over 19480.00 frames. ], tot_loss[loss=0.26, ctc_loss=0.1609, cr_loss=0.3998, attn_decoder_loss=0.2621, over 5796020.99 frames. ], batch size: 209, lr: 9.21e-03, grad_scale: 8.0
2024-09-17 11:53:54,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=215420.0, ans=0.025
2024-09-17 11:54:06,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=215420.0, ans=0.125
2024-09-17 11:54:12,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=215460.0, ans=0.125
2024-09-17 11:54:19,887 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0
2024-09-17 11:54:22,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=215500.0, ans=0.125
2024-09-17 11:54:23,756 INFO [train.py:1198] (1/2) Epoch 12, batch 4100, loss[loss=0.2677, ctc_loss=0.1634, cr_loss=0.4113, attn_decoder_loss=0.2702, over 29507.00 frames. ], tot_loss[loss=0.2597, ctc_loss=0.1607, cr_loss=0.3995, attn_decoder_loss=0.2618, over 5791253.91 frames. ], batch size: 90, lr: 9.20e-03, grad_scale: 8.0
2024-09-17 11:54:27,725 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.32 vs. limit=12.0
2024-09-17 11:54:39,998 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-17 11:54:43,943 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.986e+01 9.324e+01 9.990e+01 1.134e+02 3.141e+02, threshold=1.998e+02, percent-clipped=1.0
2024-09-17 11:54:49,134 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.50 vs. limit=22.5
2024-09-17 11:55:02,143 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.87 vs. limit=22.5
2024-09-17 11:55:05,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=215580.0, ans=0.0
2024-09-17 11:55:08,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=215620.0, ans=0.035
2024-09-17 11:55:24,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=215660.0, ans=0.125
2024-09-17 11:55:29,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=215660.0, ans=0.025
2024-09-17 11:55:39,595 INFO [train.py:1198] (1/2) Epoch 12, batch 4150, loss[loss=0.2541, ctc_loss=0.1503, cr_loss=0.3845, attn_decoder_loss=0.2571, over 29495.00 frames. ], tot_loss[loss=0.2594, ctc_loss=0.1603, cr_loss=0.3993, attn_decoder_loss=0.2615, over 5796936.90 frames. ], batch size: 77, lr: 9.20e-03, grad_scale: 8.0
2024-09-17 11:55:42,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=215700.0, ans=0.05
2024-09-17 11:55:48,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=215700.0, ans=0.125
2024-09-17 11:56:00,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=215740.0, ans=0.0
2024-09-17 11:56:00,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=215740.0, ans=0.0
2024-09-17 11:56:09,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=215780.0, ans=0.125
2024-09-17 11:56:10,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=215780.0, ans=0.125
2024-09-17 11:56:11,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=215780.0, ans=6.0
2024-09-17 11:56:53,538 INFO [train.py:1198] (1/2) Epoch 12, batch 4200, loss[loss=0.289, ctc_loss=0.1906, cr_loss=0.4692, attn_decoder_loss=0.2895, over 29495.00 frames. ], tot_loss[loss=0.2598, ctc_loss=0.1605, cr_loss=0.4, attn_decoder_loss=0.2619, over 5799446.96 frames. ], batch size: 90, lr: 9.19e-03, grad_scale: 8.0
2024-09-17 11:56:55,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=215900.0, ans=0.0
2024-09-17 11:57:12,887 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.094e+01 9.529e+01 1.014e+02 1.072e+02 1.789e+02, threshold=2.028e+02, percent-clipped=0.0
2024-09-17 11:57:31,664 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.30 vs. limit=15.0
2024-09-17 11:57:31,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.71 vs. limit=15.0
2024-09-17 11:58:07,691 INFO [train.py:1198] (1/2) Epoch 12, batch 4250, loss[loss=0.2407, ctc_loss=0.1335, cr_loss=0.3538, attn_decoder_loss=0.2448, over 29507.00 frames. ], tot_loss[loss=0.2598, ctc_loss=0.1604, cr_loss=0.3998, attn_decoder_loss=0.262, over 5805042.59 frames. ], batch size: 74, lr: 9.19e-03, grad_scale: 8.0
2024-09-17 11:58:09,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=216100.0, ans=0.0
2024-09-17 11:58:12,888 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.99 vs. limit=10.0
2024-09-17 11:58:16,575 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 11:58:20,199 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.65 vs. limit=22.5
2024-09-17 11:58:25,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=216140.0, ans=0.125
2024-09-17 11:58:32,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=216140.0, ans=0.1
2024-09-17 11:58:36,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=216140.0, ans=0.125
2024-09-17 11:58:37,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=216180.0, ans=0.125
2024-09-17 11:58:43,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=216180.0, ans=0.1
2024-09-17 11:58:56,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=216220.0, ans=0.07
2024-09-17 11:59:17,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=216260.0, ans=0.125
2024-09-17 11:59:23,261 INFO [train.py:1198] (1/2) Epoch 12, batch 4300, loss[loss=0.2718, ctc_loss=0.1624, cr_loss=0.401, attn_decoder_loss=0.275, over 29509.00 frames. ], tot_loss[loss=0.2601, ctc_loss=0.1604, cr_loss=0.4002, attn_decoder_loss=0.2623, over 5794235.36 frames. ], batch size: 87, lr: 9.18e-03, grad_scale: 8.0
2024-09-17 11:59:35,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=216300.0, ans=0.125
2024-09-17 11:59:41,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=216340.0, ans=0.025
2024-09-17 11:59:44,352 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.076e+01 9.375e+01 1.010e+02 1.083e+02 2.799e+02, threshold=2.019e+02, percent-clipped=3.0
2024-09-17 11:59:44,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=216340.0, ans=0.04949747468305833
2024-09-17 12:00:03,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=216380.0, ans=0.0
2024-09-17 12:00:10,186 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.79 vs. limit=10.0
2024-09-17 12:00:30,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=216460.0, ans=0.2
2024-09-17 12:00:33,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=216460.0, ans=0.2
2024-09-17 12:00:36,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=216500.0, ans=0.125
2024-09-17 12:00:37,270 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.06 vs. limit=10.0
2024-09-17 12:00:37,750 INFO [train.py:1198] (1/2) Epoch 12, batch 4350, loss[loss=0.2771, ctc_loss=0.1867, cr_loss=0.4129, attn_decoder_loss=0.2779, over 29493.00 frames. ], tot_loss[loss=0.2638, ctc_loss=0.1636, cr_loss=0.4051, attn_decoder_loss=0.2659, over 5797085.60 frames. ], batch size: 97, lr: 9.18e-03, grad_scale: 4.0
2024-09-17 12:00:54,907 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.55 vs. limit=12.0
2024-09-17 12:00:59,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=216540.0, ans=0.0
2024-09-17 12:00:59,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=216540.0, ans=0.07
2024-09-17 12:01:07,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=216580.0, ans=0.95
2024-09-17 12:01:30,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=216620.0, ans=0.0
2024-09-17 12:01:36,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=216660.0, ans=0.125
2024-09-17 12:01:42,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=216660.0, ans=0.125
2024-09-17 12:01:45,769 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.75 vs. limit=22.5
2024-09-17 12:01:48,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=216660.0, ans=0.1
2024-09-17 12:01:49,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=216660.0, ans=0.0
2024-09-17 12:01:51,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=216700.0, ans=0.5
2024-09-17 12:01:52,273 INFO [train.py:1198] (1/2) Epoch 12, batch 4400, loss[loss=0.2747, ctc_loss=0.1786, cr_loss=0.4298, attn_decoder_loss=0.2759, over 27373.00 frames. ], tot_loss[loss=0.2665, ctc_loss=0.1659, cr_loss=0.4079, attn_decoder_loss=0.2686, over 5766849.24 frames. ], batch size: 124, lr: 9.18e-03, grad_scale: 8.0
2024-09-17 12:02:12,849 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.351e+01 9.429e+01 9.897e+01 1.056e+02 1.811e+02, threshold=1.979e+02, percent-clipped=0.0
2024-09-17 12:02:20,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=216780.0, ans=0.0
2024-09-17 12:02:53,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=216860.0, ans=0.125
2024-09-17 12:03:06,659 INFO [train.py:1198] (1/2) Epoch 12, batch 4450, loss[loss=0.2897, ctc_loss=0.2179, cr_loss=0.4256, attn_decoder_loss=0.2882, over 19842.00 frames. ], tot_loss[loss=0.2695, ctc_loss=0.1711, cr_loss=0.4117, attn_decoder_loss=0.2712, over 5574045.14 frames. ], batch size: 209, lr: 9.17e-03, grad_scale: 8.0
2024-09-17 12:03:36,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=216940.0, ans=10.0
2024-09-17 12:03:47,556 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:03:54,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=217020.0, ans=0.1
2024-09-17 12:04:05,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=217020.0, ans=0.125
2024-09-17 12:04:14,446 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:04:20,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=217060.0, ans=0.125
2024-09-17 12:04:23,109 INFO [train.py:1198] (1/2) Epoch 12, batch 4500, loss[loss=0.275, ctc_loss=0.1942, cr_loss=0.4173, attn_decoder_loss=0.2748, over 19774.00 frames. ], tot_loss[loss=0.2727, ctc_loss=0.1774, cr_loss=0.4146, attn_decoder_loss=0.2741, over 5232449.48 frames. ], batch size: 209, lr: 9.17e-03, grad_scale: 8.0
2024-09-17 12:04:28,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=217100.0, ans=0.0
2024-09-17 12:04:45,819 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.368e+01 1.035e+02 1.137e+02 1.264e+02 3.702e+02, threshold=2.273e+02, percent-clipped=1.0
2024-09-17 12:04:52,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=217180.0, ans=0.2
2024-09-17 12:05:52,067 INFO [train.py:1198] (1/2) Epoch 13, batch 0, loss[loss=0.2327, ctc_loss=0.1355, cr_loss=0.3654, attn_decoder_loss=0.2354, over 29606.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1355, cr_loss=0.3654, attn_decoder_loss=0.2354, over 29606.00 frames. ], batch size: 73, lr: 8.81e-03, grad_scale: 16.0
2024-09-17 12:05:52,067 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-17 12:06:10,486 INFO [train.py:1230] (1/2) Epoch 13, validation: loss=0.214, ctc_loss=0.04435, cr_loss=4.652e-15, attn_decoder_loss=0.2329, over 944034.00 frames.
2024-09-17 12:06:10,487 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB
2024-09-17 12:06:10,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=217200.0, ans=0.1
2024-09-17 12:06:10,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_positive, batch_count=217200.0, ans=0.05
2024-09-17 12:06:13,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=217200.0, ans=0.0
2024-09-17 12:06:33,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=217240.0, ans=0.0
2024-09-17 12:06:40,024 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.66 vs. limit=15.0
2024-09-17 12:06:40,483 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.69 vs. limit=15.0
2024-09-17 12:06:42,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=217280.0, ans=0.07
2024-09-17 12:06:50,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=217280.0, ans=0.0
2024-09-17 12:07:18,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=217360.0, ans=0.125
2024-09-17 12:07:28,725 INFO [train.py:1198] (1/2) Epoch 13, batch 50, loss[loss=0.2308, ctc_loss=0.1392, cr_loss=0.3509, attn_decoder_loss=0.2331, over 29430.00 frames. ], tot_loss[loss=0.2619, ctc_loss=0.1626, cr_loss=0.402, attn_decoder_loss=0.264, over 1269455.44 frames. ], batch size: 70, lr: 8.80e-03, grad_scale: 8.0
2024-09-17 12:07:45,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=217440.0, ans=0.0
2024-09-17 12:07:56,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=217440.0, ans=0.035
2024-09-17 12:07:56,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=217440.0, ans=0.125
2024-09-17 12:07:59,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=217480.0, ans=0.0
2024-09-17 12:08:05,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=217480.0, ans=0.0
2024-09-17 12:08:08,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=217480.0, ans=0.125
2024-09-17 12:08:08,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=217480.0, ans=0.1
2024-09-17 12:08:19,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=217520.0, ans=0.0
2024-09-17 12:08:25,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=217520.0, ans=0.125
2024-09-17 12:08:25,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=217520.0, ans=0.125
2024-09-17 12:08:28,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=217560.0, ans=0.07
2024-09-17 12:08:30,967 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.263e+01 9.622e+01 1.023e+02 1.146e+02 3.788e+02, threshold=2.046e+02, percent-clipped=2.0
2024-09-17 12:08:31,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=217560.0, ans=0.125
2024-09-17 12:08:32,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=217560.0, ans=0.125
2024-09-17 12:08:36,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=217560.0, ans=0.125
2024-09-17 12:08:40,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=217560.0, ans=0.125
2024-09-17 12:08:45,176 INFO [train.py:1198] (1/2) Epoch 13, batch 100, loss[loss=0.2544, ctc_loss=0.1614, cr_loss=0.4145, attn_decoder_loss=0.2555, over 29537.00 frames. ], tot_loss[loss=0.2639, ctc_loss=0.1644, cr_loss=0.4056, attn_decoder_loss=0.2659, over 2253512.68 frames. ], batch size: 76, lr: 8.80e-03, grad_scale: 8.0
2024-09-17 12:08:54,646 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.77 vs. limit=15.0
2024-09-17 12:09:04,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=217640.0, ans=0.0
2024-09-17 12:09:20,929 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.02 vs. limit=15.0
2024-09-17 12:09:38,873 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.23 vs. limit=15.0
2024-09-17 12:09:39,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=217720.0, ans=0.125
2024-09-17 12:09:41,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=217720.0, ans=0.2
2024-09-17 12:09:51,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=217760.0, ans=0.1
2024-09-17 12:09:53,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=217760.0, ans=0.1
2024-09-17 12:10:01,870 INFO [train.py:1198] (1/2) Epoch 13, batch 150, loss[loss=0.2296, ctc_loss=0.1376, cr_loss=0.3644, attn_decoder_loss=0.2317, over 29447.00 frames. ], tot_loss[loss=0.2605, ctc_loss=0.1607, cr_loss=0.3998, attn_decoder_loss=0.2627, over 3048473.81 frames. ], batch size: 70, lr: 8.80e-03, grad_scale: 8.0
2024-09-17 12:10:28,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=217840.0, ans=0.0
2024-09-17 12:10:32,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=217880.0, ans=0.5
2024-09-17 12:10:44,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=217880.0, ans=0.125
2024-09-17 12:10:57,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=217920.0, ans=0.07
2024-09-17 12:11:06,256 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 8.881e+01 9.835e+01 1.094e+02 1.657e+02, threshold=1.967e+02, percent-clipped=0.0
2024-09-17 12:11:06,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=217960.0, ans=0.05
2024-09-17 12:11:10,263 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.66 vs. limit=15.0
2024-09-17 12:11:15,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=217960.0, ans=0.09899494936611666
2024-09-17 12:11:20,034 INFO [train.py:1198] (1/2) Epoch 13, batch 200, loss[loss=0.2732, ctc_loss=0.1784, cr_loss=0.4234, attn_decoder_loss=0.2743, over 27241.00 frames. ], tot_loss[loss=0.2594, ctc_loss=0.1599, cr_loss=0.3986, attn_decoder_loss=0.2616, over 3660599.48 frames. ], batch size: 124, lr: 8.79e-03, grad_scale: 8.0
2024-09-17 12:12:01,659 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.03 vs. limit=15.0
2024-09-17 12:12:03,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=218120.0, ans=0.1
2024-09-17 12:12:07,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=218120.0, ans=0.125
2024-09-17 12:12:22,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=218160.0, ans=0.0
2024-09-17 12:12:28,480 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:12:35,455 INFO [train.py:1198] (1/2) Epoch 13, batch 250, loss[loss=0.2713, ctc_loss=0.1723, cr_loss=0.4219, attn_decoder_loss=0.2729, over 29276.00 frames. ], tot_loss[loss=0.2589, ctc_loss=0.1591, cr_loss=0.3976, attn_decoder_loss=0.2612, over 4142991.08 frames. ], batch size: 100, lr: 8.79e-03, grad_scale: 8.0
2024-09-17 12:12:40,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=218200.0, ans=0.0
2024-09-17 12:12:44,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=218200.0, ans=0.025
2024-09-17 12:12:52,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=218240.0, ans=0.125
2024-09-17 12:13:03,129 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.90 vs. limit=10.0
2024-09-17 12:13:11,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=218280.0, ans=0.1
2024-09-17 12:13:17,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=218280.0, ans=0.125
2024-09-17 12:13:22,367 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:13:35,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=218320.0, ans=0.125
2024-09-17 12:13:39,919 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.019e+01 9.647e+01 1.091e+02 1.389e+02, threshold=1.929e+02, percent-clipped=0.0
2024-09-17 12:13:50,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=218360.0, ans=0.025
2024-09-17 12:13:53,952 INFO [train.py:1198] (1/2) Epoch 13, batch 300, loss[loss=0.272, ctc_loss=0.1708, cr_loss=0.4241, attn_decoder_loss=0.2739, over 29533.00 frames. ], tot_loss[loss=0.2584, ctc_loss=0.1583, cr_loss=0.3965, attn_decoder_loss=0.2607, over 4511141.91 frames. ], batch size: 92, lr: 8.78e-03, grad_scale: 8.0
2024-09-17 12:14:34,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=218480.0, ans=0.125
2024-09-17 12:14:54,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=218520.0, ans=0.0
2024-09-17 12:15:11,632 INFO [train.py:1198] (1/2) Epoch 13, batch 350, loss[loss=0.2352, ctc_loss=0.1468, cr_loss=0.355, attn_decoder_loss=0.2371, over 29727.00 frames. ], tot_loss[loss=0.2585, ctc_loss=0.158, cr_loss=0.3962, attn_decoder_loss=0.2609, over 4796830.83 frames. ], batch size: 72, lr: 8.78e-03, grad_scale: 8.0
2024-09-17 12:15:25,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=218640.0, ans=0.0
2024-09-17 12:15:34,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=218640.0, ans=0.0
2024-09-17 12:15:43,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=218680.0, ans=0.125
2024-09-17 12:16:13,452 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 9.251e+01 9.818e+01 1.107e+02 7.103e+02, threshold=1.964e+02, percent-clipped=3.0
2024-09-17 12:16:27,028 INFO [train.py:1198] (1/2) Epoch 13, batch 400, loss[loss=0.2684, ctc_loss=0.1727, cr_loss=0.4217, attn_decoder_loss=0.2696, over 29705.00 frames. ], tot_loss[loss=0.2584, ctc_loss=0.158, cr_loss=0.3964, attn_decoder_loss=0.2608, over 5025993.88 frames. ], batch size: 82, lr: 8.78e-03, grad_scale: 16.0
2024-09-17 12:16:42,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=218840.0, ans=0.025
2024-09-17 12:16:50,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=218840.0, ans=0.1
2024-09-17 12:17:00,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=218880.0, ans=0.0
2024-09-17 12:17:05,017 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.40 vs. limit=12.0
2024-09-17 12:17:07,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=218880.0, ans=0.125
2024-09-17 12:17:12,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=218880.0, ans=0.2
2024-09-17 12:17:22,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=218920.0, ans=0.025
2024-09-17 12:17:35,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=218960.0, ans=0.125
2024-09-17 12:17:45,472 INFO [train.py:1198] (1/2) Epoch 13, batch 450, loss[loss=0.2558, ctc_loss=0.1464, cr_loss=0.3739, attn_decoder_loss=0.2597, over 29694.00 frames. ], tot_loss[loss=0.2586, ctc_loss=0.1581, cr_loss=0.3966, attn_decoder_loss=0.261, over 5188179.16 frames. ], batch size: 83, lr: 8.77e-03, grad_scale: 8.0
2024-09-17 12:17:51,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=219000.0, ans=0.0
2024-09-17 12:18:09,317 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.58 vs. limit=15.0
2024-09-17 12:18:11,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=219040.0, ans=0.125
2024-09-17 12:18:11,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=219040.0, ans=0.1
2024-09-17 12:18:17,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=219080.0, ans=0.0
2024-09-17 12:18:28,769 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.81 vs. limit=15.0
2024-09-17 12:18:40,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=219120.0, ans=0.125
2024-09-17 12:18:51,587 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.792e+01 8.793e+01 9.370e+01 9.843e+01 2.913e+02, threshold=1.874e+02, percent-clipped=1.0
2024-09-17 12:18:55,373 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.88 vs. limit=12.0
2024-09-17 12:19:00,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=219160.0, ans=0.2
2024-09-17 12:19:04,118 INFO [train.py:1198] (1/2) Epoch 13, batch 500, loss[loss=0.2789, ctc_loss=0.1741, cr_loss=0.4294, attn_decoder_loss=0.281, over 29420.00 frames. ], tot_loss[loss=0.2581, ctc_loss=0.1579, cr_loss=0.3968, attn_decoder_loss=0.2605, over 5330793.16 frames. ], batch size: 94, lr: 8.77e-03, grad_scale: 8.0
2024-09-17 12:19:17,326 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.35 vs. limit=15.0
2024-09-17 12:19:38,368 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.97 vs. limit=15.0
2024-09-17 12:19:40,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=219280.0, ans=0.2
2024-09-17 12:19:51,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=219320.0, ans=0.2
2024-09-17 12:19:52,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=219320.0, ans=0.0
2024-09-17 12:20:06,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=219360.0, ans=0.1
2024-09-17 12:20:10,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=219360.0, ans=0.125
2024-09-17 12:20:19,690 INFO [train.py:1198] (1/2) Epoch 13, batch 550, loss[loss=0.2763, ctc_loss=0.1742, cr_loss=0.4368, attn_decoder_loss=0.2779, over 28715.00 frames. ], tot_loss[loss=0.258, ctc_loss=0.1578, cr_loss=0.3961, attn_decoder_loss=0.2604, over 5421569.77 frames. ], batch size: 104, lr: 8.76e-03, grad_scale: 8.0
2024-09-17 12:20:27,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=219400.0, ans=0.0
2024-09-17 12:20:33,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=219440.0, ans=0.125
2024-09-17 12:20:41,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=219440.0, ans=0.125
2024-09-17 12:21:03,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=219480.0, ans=0.0
2024-09-17 12:21:15,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.77 vs. limit=15.0
2024-09-17 12:21:26,148 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.042e+01 9.312e+01 1.008e+02 1.110e+02 1.901e+02, threshold=2.017e+02, percent-clipped=1.0
2024-09-17 12:21:29,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=219560.0, ans=0.125
2024-09-17 12:21:38,302 INFO [train.py:1198] (1/2) Epoch 13, batch 600, loss[loss=0.2726, ctc_loss=0.1699, cr_loss=0.4163, attn_decoder_loss=0.2748, over 29270.00 frames. ], tot_loss[loss=0.2586, ctc_loss=0.1582, cr_loss=0.3966, attn_decoder_loss=0.2609, over 5509688.10 frames. ], batch size: 100, lr: 8.76e-03, grad_scale: 8.0
2024-09-17 12:21:40,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.34 vs. limit=22.5
2024-09-17 12:21:41,696 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:21:56,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=219640.0, ans=0.0
2024-09-17 12:22:14,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=219680.0, ans=0.125
2024-09-17 12:22:27,190 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.72 vs. limit=22.5
2024-09-17 12:22:50,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=219760.0, ans=0.125
2024-09-17 12:22:52,055 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.58 vs. limit=15.0
2024-09-17 12:22:55,880 INFO [train.py:1198] (1/2) Epoch 13, batch 650, loss[loss=0.2651, ctc_loss=0.1598, cr_loss=0.4057, attn_decoder_loss=0.2678, over 29756.00 frames. ], tot_loss[loss=0.2578, ctc_loss=0.1574, cr_loss=0.3958, attn_decoder_loss=0.2602, over 5586851.08 frames. ], batch size: 81, lr: 8.76e-03, grad_scale: 8.0
2024-09-17 12:23:08,639 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.46 vs. limit=15.0
2024-09-17 12:23:09,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=219840.0, ans=0.1
2024-09-17 12:23:35,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=219880.0, ans=0.0
2024-09-17 12:23:44,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=219920.0, ans=0.0
2024-09-17 12:23:47,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=219920.0, ans=0.125
2024-09-17 12:23:49,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=219920.0, ans=0.0
2024-09-17 12:23:50,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=219920.0, ans=0.125
2024-09-17 12:23:59,538 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 9.144e+01 9.957e+01 1.061e+02 1.597e+02, threshold=1.991e+02, percent-clipped=0.0
2024-09-17 12:24:12,257 INFO [train.py:1198] (1/2) Epoch 13, batch 700, loss[loss=0.2372, ctc_loss=0.1323, cr_loss=0.3502, attn_decoder_loss=0.2411, over 29522.00 frames. ], tot_loss[loss=0.2582, ctc_loss=0.1575, cr_loss=0.3962, attn_decoder_loss=0.2606, over 5636808.14 frames. ], batch size: 76, lr: 8.75e-03, grad_scale: 8.0
2024-09-17 12:24:18,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=220000.0, ans=0.025
2024-09-17 12:24:18,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=220000.0, ans=0.07
2024-09-17 12:24:24,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=220000.0, ans=0.1
2024-09-17 12:24:26,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=220040.0, ans=0.025
2024-09-17 12:24:46,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=220080.0, ans=0.125
2024-09-17 12:25:13,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=220160.0, ans=0.125
2024-09-17 12:25:16,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=220160.0, ans=0.1
2024-09-17 12:25:19,065 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.08 vs. limit=22.5
2024-09-17 12:25:30,052 INFO [train.py:1198] (1/2) Epoch 13, batch 750, loss[loss=0.2643, ctc_loss=0.16, cr_loss=0.4089, attn_decoder_loss=0.2668, over 29691.00 frames. ], tot_loss[loss=0.2577, ctc_loss=0.1569, cr_loss=0.3954, attn_decoder_loss=0.2601, over 5676926.70 frames. ], batch size: 82, lr: 8.75e-03, grad_scale: 8.0
2024-09-17 12:25:34,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=220200.0, ans=0.125
2024-09-17 12:25:57,179 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:26:01,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=220280.0, ans=0.125
2024-09-17 12:26:06,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=220280.0, ans=0.125
2024-09-17 12:26:10,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=220280.0, ans=0.1
2024-09-17 12:26:20,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=220320.0, ans=0.025
2024-09-17 12:26:33,425 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.682e+01 9.372e+01 1.007e+02 1.108e+02 5.289e+02, threshold=2.013e+02, percent-clipped=1.0
2024-09-17 12:26:41,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=220360.0, ans=0.1
2024-09-17 12:26:45,670 INFO [train.py:1198] (1/2) Epoch 13, batch 800, loss[loss=0.2442, ctc_loss=0.1458, cr_loss=0.3811, attn_decoder_loss=0.2467, over 29594.00 frames. ], tot_loss[loss=0.2577, ctc_loss=0.157, cr_loss=0.3953, attn_decoder_loss=0.2601, over 5707998.15 frames. ], batch size: 73, lr: 8.74e-03, grad_scale: 16.0
2024-09-17 12:26:55,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=220400.0, ans=0.2
2024-09-17 12:27:08,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=220440.0, ans=0.0
2024-09-17 12:27:11,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=220440.0, ans=0.2
2024-09-17 12:27:35,888 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.65 vs. limit=15.0
2024-09-17 12:28:03,607 INFO [train.py:1198] (1/2) Epoch 13, batch 850, loss[loss=0.2691, ctc_loss=0.171, cr_loss=0.407, attn_decoder_loss=0.2709, over 29700.00 frames. ], tot_loss[loss=0.2573, ctc_loss=0.1567, cr_loss=0.3952, attn_decoder_loss=0.2598, over 5736145.88 frames. ], batch size: 89, lr: 8.74e-03, grad_scale: 8.0
2024-09-17 12:28:14,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=220600.0, ans=0.0
2024-09-17 12:28:22,444 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.35 vs. limit=12.0
2024-09-17 12:28:35,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=220680.0, ans=10.0
2024-09-17 12:29:07,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=220760.0, ans=0.0
2024-09-17 12:29:11,982 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.727e+01 8.923e+01 9.332e+01 1.023e+02 2.147e+02, threshold=1.866e+02, percent-clipped=2.0
2024-09-17 12:29:12,801 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.49 vs. limit=22.5
2024-09-17 12:29:19,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=220760.0, ans=0.0
2024-09-17 12:29:23,103 INFO [train.py:1198] (1/2) Epoch 13, batch 900, loss[loss=0.2386, ctc_loss=0.1353, cr_loss=0.3601, attn_decoder_loss=0.2421, over 29595.00 frames. ], tot_loss[loss=0.2576, ctc_loss=0.1568, cr_loss=0.3957, attn_decoder_loss=0.26, over 5739537.39 frames. ], batch size: 73, lr: 8.74e-03, grad_scale: 8.0
2024-09-17 12:29:26,690 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.47 vs. limit=15.0
2024-09-17 12:29:44,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=220840.0, ans=0.125
2024-09-17 12:30:01,930 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0
2024-09-17 12:30:04,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=220880.0, ans=0.125
2024-09-17 12:30:21,363 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.56 vs. limit=22.5
2024-09-17 12:30:26,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=220960.0, ans=0.0
2024-09-17 12:30:37,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=221000.0, ans=0.125
2024-09-17 12:30:38,532 INFO [train.py:1198] (1/2) Epoch 13, batch 950, loss[loss=0.2437, ctc_loss=0.1462, cr_loss=0.3814, attn_decoder_loss=0.2461, over 29515.00 frames. ], tot_loss[loss=0.258, ctc_loss=0.1575, cr_loss=0.3964, attn_decoder_loss=0.2604, over 5741610.33 frames. ], batch size: 74, lr: 8.73e-03, grad_scale: 8.0
2024-09-17 12:31:02,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=221040.0, ans=0.0
2024-09-17 12:31:25,482 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:31:46,091 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.726e+01 9.884e+01 1.087e+02 1.225e+02 3.377e+02, threshold=2.174e+02, percent-clipped=3.0
2024-09-17 12:31:56,561 INFO [train.py:1198] (1/2) Epoch 13, batch 1000, loss[loss=0.2548, ctc_loss=0.155, cr_loss=0.3895, attn_decoder_loss=0.2572, over 29500.00 frames. ], tot_loss[loss=0.2589, ctc_loss=0.1586, cr_loss=0.3976, attn_decoder_loss=0.2612, over 5734926.34 frames. ], batch size: 77, lr: 8.73e-03, grad_scale: 8.0
2024-09-17 12:32:05,308 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.36 vs. limit=15.0
2024-09-17 12:32:09,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=221200.0, ans=0.125
2024-09-17 12:32:12,716 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.31 vs. limit=15.0
2024-09-17 12:33:00,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=221360.0, ans=0.1
2024-09-17 12:33:01,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=221360.0, ans=0.125
2024-09-17 12:33:10,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=221360.0, ans=0.07
2024-09-17 12:33:14,412 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.63 vs. limit=22.5
2024-09-17 12:33:15,015 INFO [train.py:1198] (1/2) Epoch 13, batch 1050, loss[loss=0.2667, ctc_loss=0.1655, cr_loss=0.4259, attn_decoder_loss=0.2685, over 29671.00 frames. ], tot_loss[loss=0.2584, ctc_loss=0.1582, cr_loss=0.3974, attn_decoder_loss=0.2607, over 5743740.80 frames. ], batch size: 85, lr: 8.73e-03, grad_scale: 8.0
2024-09-17 12:33:21,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=221400.0, ans=22.5
2024-09-17 12:33:28,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.54 vs. limit=6.0
2024-09-17 12:33:30,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=221440.0, ans=0.125
2024-09-17 12:34:01,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=221520.0, ans=0.125
2024-09-17 12:34:02,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=221520.0, ans=15.0
2024-09-17 12:34:20,744 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.485e+01 8.817e+01 9.337e+01 1.034e+02 1.952e+02, threshold=1.867e+02, percent-clipped=0.0
2024-09-17 12:34:22,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=221560.0, ans=0.0
2024-09-17 12:34:32,053 INFO [train.py:1198] (1/2) Epoch 13, batch 1100, loss[loss=0.2646, ctc_loss=0.165, cr_loss=0.4026, attn_decoder_loss=0.2667, over 29449.00 frames. ], tot_loss[loss=0.2582, ctc_loss=0.1578, cr_loss=0.3967, attn_decoder_loss=0.2606, over 5755028.74 frames. ], batch size: 78, lr: 8.72e-03, grad_scale: 8.0
2024-09-17 12:34:32,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=221600.0, ans=0.125
2024-09-17 12:34:49,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=221640.0, ans=0.025
2024-09-17 12:35:23,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=221720.0, ans=0.1
2024-09-17 12:35:41,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=221760.0, ans=0.0
2024-09-17 12:35:49,959 INFO [train.py:1198] (1/2) Epoch 13, batch 1150, loss[loss=0.2523, ctc_loss=0.1604, cr_loss=0.4048, attn_decoder_loss=0.2535, over 29456.00 frames. ], tot_loss[loss=0.2582, ctc_loss=0.1577, cr_loss=0.3966, attn_decoder_loss=0.2605, over 5753570.47 frames. ], batch size: 78, lr: 8.72e-03, grad_scale: 8.0
2024-09-17 12:36:18,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0
2024-09-17 12:36:22,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=221880.0, ans=0.0
2024-09-17 12:36:31,275 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.65 vs. limit=12.0
2024-09-17 12:36:32,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=221880.0, ans=0.125
2024-09-17 12:36:38,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=221920.0, ans=0.025
2024-09-17 12:36:39,719 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:36:51,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=221960.0, ans=0.1
2024-09-17 12:36:57,520 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.374e+01 9.019e+01 9.917e+01 1.067e+02 1.578e+02, threshold=1.983e+02, percent-clipped=0.0
2024-09-17 12:37:07,992 INFO [train.py:1198] (1/2) Epoch 13, batch 1200, loss[loss=0.2639, ctc_loss=0.1541, cr_loss=0.3871, attn_decoder_loss=0.2675, over 29694.00 frames. ], tot_loss[loss=0.259, ctc_loss=0.1586, cr_loss=0.3981, attn_decoder_loss=0.2613, over 5746772.65 frames. ], batch size: 85, lr: 8.71e-03, grad_scale: 16.0
2024-09-17 12:37:11,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=222000.0, ans=0.125
2024-09-17 12:37:12,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=222000.0, ans=0.0
2024-09-17 12:37:37,495 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:37:39,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=222080.0, ans=0.125
2024-09-17 12:37:43,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=222080.0, ans=0.1
2024-09-17 12:37:49,917 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:37:50,445 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.87 vs. limit=6.0
2024-09-17 12:37:57,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=222120.0, ans=0.025
2024-09-17 12:38:01,188 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.96 vs. limit=15.0
2024-09-17 12:38:03,470 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:38:08,707 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.46 vs. limit=22.5
2024-09-17 12:38:09,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=222160.0, ans=0.2
2024-09-17 12:38:24,362 INFO [train.py:1198] (1/2) Epoch 13, batch 1250, loss[loss=0.2716, ctc_loss=0.1616, cr_loss=0.3988, attn_decoder_loss=0.2749, over 29503.00 frames. ], tot_loss[loss=0.2593, ctc_loss=0.1585, cr_loss=0.3988, attn_decoder_loss=0.2617, over 5773811.38 frames. ], batch size: 92, lr: 8.71e-03, grad_scale: 8.0
2024-09-17 12:38:29,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=222200.0, ans=0.125
2024-09-17 12:38:32,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=222200.0, ans=0.2
2024-09-17 12:38:35,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=222200.0, ans=0.125
2024-09-17 12:38:59,442 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.37 vs.
limit=10.0 2024-09-17 12:39:12,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=222320.0, ans=0.125 2024-09-17 12:39:33,606 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.458e+01 9.226e+01 9.923e+01 1.052e+02 2.205e+02, threshold=1.985e+02, percent-clipped=1.0 2024-09-17 12:39:42,944 INFO [train.py:1198] (1/2) Epoch 13, batch 1300, loss[loss=0.2685, ctc_loss=0.1608, cr_loss=0.4077, attn_decoder_loss=0.2714, over 28155.00 frames. ], tot_loss[loss=0.2589, ctc_loss=0.1581, cr_loss=0.3981, attn_decoder_loss=0.2612, over 5779325.95 frames. ], batch size: 111, lr: 8.71e-03, grad_scale: 8.0 2024-09-17 12:39:46,716 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.18 vs. limit=12.0 2024-09-17 12:39:56,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=222440.0, ans=0.125 2024-09-17 12:40:16,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=222480.0, ans=0.0 2024-09-17 12:40:46,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.75 vs. limit=15.0 2024-09-17 12:41:00,772 INFO [train.py:1198] (1/2) Epoch 13, batch 1350, loss[loss=0.2656, ctc_loss=0.1621, cr_loss=0.423, attn_decoder_loss=0.2677, over 29741.00 frames. ], tot_loss[loss=0.2585, ctc_loss=0.1577, cr_loss=0.3979, attn_decoder_loss=0.2609, over 5795917.90 frames. ], batch size: 81, lr: 8.70e-03, grad_scale: 8.0 2024-09-17 12:41:10,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=222600.0, ans=0.125 2024-09-17 12:41:17,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=222640.0, ans=0.0 2024-09-17 12:41:50,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=222720.0, ans=0.125 2024-09-17 12:41:54,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=222720.0, ans=0.0 2024-09-17 12:42:06,531 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.825e+01 9.390e+01 1.007e+02 1.307e+02, threshold=1.878e+02, percent-clipped=0.0 2024-09-17 12:42:15,651 INFO [train.py:1198] (1/2) Epoch 13, batch 1400, loss[loss=0.2199, ctc_loss=0.1257, cr_loss=0.3415, attn_decoder_loss=0.2227, over 29568.00 frames. ], tot_loss[loss=0.2585, ctc_loss=0.1577, cr_loss=0.3978, attn_decoder_loss=0.2609, over 5807026.61 frames. ], batch size: 69, lr: 8.70e-03, grad_scale: 8.0 2024-09-17 12:42:30,086 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.75 vs. limit=15.0 2024-09-17 12:43:02,456 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.06 vs. 
limit=15.0 2024-09-17 12:43:06,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=222920.0, ans=0.07 2024-09-17 12:43:24,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=222960.0, ans=0.0 2024-09-17 12:43:33,422 INFO [train.py:1198] (1/2) Epoch 13, batch 1450, loss[loss=0.2685, ctc_loss=0.161, cr_loss=0.4115, attn_decoder_loss=0.2713, over 29462.00 frames. ], tot_loss[loss=0.259, ctc_loss=0.158, cr_loss=0.398, attn_decoder_loss=0.2614, over 5803951.73 frames. ], batch size: 94, lr: 8.69e-03, grad_scale: 8.0 2024-09-17 12:43:48,012 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.34 vs. limit=10.0 2024-09-17 12:44:04,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=223080.0, ans=0.125 2024-09-17 12:44:04,777 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.56 vs. limit=22.5 2024-09-17 12:44:05,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=223080.0, ans=0.0 2024-09-17 12:44:06,277 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.56 vs. limit=12.0 2024-09-17 12:44:10,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=223080.0, ans=0.125 2024-09-17 12:44:20,379 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.36 vs. limit=15.0 2024-09-17 12:44:22,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=223120.0, ans=0.125 2024-09-17 12:44:38,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=223160.0, ans=0.1 2024-09-17 12:44:42,125 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 9.179e+01 9.900e+01 1.065e+02 2.201e+02, threshold=1.980e+02, percent-clipped=1.0 2024-09-17 12:44:51,610 INFO [train.py:1198] (1/2) Epoch 13, batch 1500, loss[loss=0.2593, ctc_loss=0.152, cr_loss=0.3818, attn_decoder_loss=0.2628, over 29631.00 frames. ], tot_loss[loss=0.2591, ctc_loss=0.158, cr_loss=0.3981, attn_decoder_loss=0.2615, over 5805466.86 frames. 
], batch size: 86, lr: 8.69e-03, grad_scale: 8.0 2024-09-17 12:45:02,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=223200.0, ans=0.025 2024-09-17 12:45:05,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=223240.0, ans=0.1 2024-09-17 12:45:13,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=223240.0, ans=0.2 2024-09-17 12:45:14,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=223240.0, ans=0.125 2024-09-17 12:45:19,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=223240.0, ans=0.0 2024-09-17 12:45:50,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=223320.0, ans=0.2 2024-09-17 12:45:50,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=223320.0, ans=0.125 2024-09-17 12:46:08,022 INFO [train.py:1198] (1/2) Epoch 13, batch 1550, loss[loss=0.2759, ctc_loss=0.1747, cr_loss=0.4344, attn_decoder_loss=0.2775, over 29523.00 frames. ], tot_loss[loss=0.2592, ctc_loss=0.1585, cr_loss=0.3981, attn_decoder_loss=0.2616, over 5781103.71 frames. ], batch size: 90, lr: 8.69e-03, grad_scale: 8.0 2024-09-17 12:46:23,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=223440.0, ans=0.07 2024-09-17 12:47:07,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=223520.0, ans=0.125 2024-09-17 12:47:16,425 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.020e+01 9.851e+01 1.181e+02 1.437e+02 2.605e+02, threshold=2.361e+02, percent-clipped=3.0 2024-09-17 12:47:25,506 INFO [train.py:1198] (1/2) Epoch 13, batch 1600, loss[loss=0.266, ctc_loss=0.1669, cr_loss=0.4064, attn_decoder_loss=0.268, over 29673.00 frames. ], tot_loss[loss=0.259, ctc_loss=0.1586, cr_loss=0.3975, attn_decoder_loss=0.2613, over 5763713.45 frames. ], batch size: 85, lr: 8.68e-03, grad_scale: 16.0 2024-09-17 12:47:36,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=223600.0, ans=0.125 2024-09-17 12:47:40,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=223640.0, ans=0.1 2024-09-17 12:47:42,960 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.98 vs. 
limit=15.0 2024-09-17 12:48:07,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=223680.0, ans=0.1 2024-09-17 12:48:15,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=223720.0, ans=0.0 2024-09-17 12:48:19,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=223720.0, ans=0.125 2024-09-17 12:48:24,712 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.08 vs. limit=15.0 2024-09-17 12:48:31,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.min_positive, batch_count=223760.0, ans=0.05 2024-09-17 12:48:43,155 INFO [train.py:1198] (1/2) Epoch 13, batch 1650, loss[loss=0.2702, ctc_loss=0.1654, cr_loss=0.4052, attn_decoder_loss=0.2728, over 29700.00 frames. ], tot_loss[loss=0.259, ctc_loss=0.1586, cr_loss=0.3975, attn_decoder_loss=0.2613, over 5758682.43 frames. ], batch size: 89, lr: 8.68e-03, grad_scale: 8.0 2024-09-17 12:49:06,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=223840.0, ans=0.125 2024-09-17 12:49:47,794 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.46 vs. limit=12.0 2024-09-17 12:49:51,350 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.673e+01 9.178e+01 9.964e+01 1.088e+02 2.882e+02, threshold=1.993e+02, percent-clipped=2.0 2024-09-17 12:49:53,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=223960.0, ans=0.125 2024-09-17 12:49:55,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.26 vs. limit=15.0 2024-09-17 12:50:06,240 INFO [train.py:1198] (1/2) Epoch 13, batch 1700, loss[loss=0.2273, ctc_loss=0.127, cr_loss=0.3452, attn_decoder_loss=0.2307, over 29579.00 frames. ], tot_loss[loss=0.259, ctc_loss=0.1582, cr_loss=0.3978, attn_decoder_loss=0.2613, over 5779695.44 frames. ], batch size: 69, lr: 8.68e-03, grad_scale: 8.0 2024-09-17 12:50:08,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=224000.0, ans=0.125 2024-09-17 12:50:14,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=224000.0, ans=0.025 2024-09-17 12:50:32,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=224040.0, ans=0.025 2024-09-17 12:51:00,268 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.54 vs. 
limit=15.0 2024-09-17 12:51:01,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=224120.0, ans=0.125 2024-09-17 12:51:06,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=224120.0, ans=0.2 2024-09-17 12:51:24,022 INFO [train.py:1198] (1/2) Epoch 13, batch 1750, loss[loss=0.2257, ctc_loss=0.1274, cr_loss=0.3593, attn_decoder_loss=0.2287, over 29386.00 frames. ], tot_loss[loss=0.2582, ctc_loss=0.1574, cr_loss=0.3972, attn_decoder_loss=0.2605, over 5788003.00 frames. ], batch size: 67, lr: 8.67e-03, grad_scale: 8.0 2024-09-17 12:51:24,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=224200.0, ans=0.125 2024-09-17 12:51:27,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=224200.0, ans=0.2 2024-09-17 12:51:51,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=224240.0, ans=0.2 2024-09-17 12:52:00,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=224280.0, ans=0.09899494936611666 2024-09-17 12:52:28,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=224360.0, ans=0.125 2024-09-17 12:52:29,892 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 12:52:33,954 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 8.908e+01 9.548e+01 1.035e+02 2.424e+02, threshold=1.910e+02, percent-clipped=1.0 2024-09-17 12:52:41,366 INFO [train.py:1198] (1/2) Epoch 13, batch 1800, loss[loss=0.2646, ctc_loss=0.1599, cr_loss=0.4176, attn_decoder_loss=0.2669, over 29704.00 frames. ], tot_loss[loss=0.2584, ctc_loss=0.1579, cr_loss=0.3976, attn_decoder_loss=0.2607, over 5790992.05 frames. 
], batch size: 83, lr: 8.67e-03, grad_scale: 8.0 2024-09-17 12:52:44,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=224400.0, ans=0.125 2024-09-17 12:52:46,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=224400.0, ans=0.2 2024-09-17 12:52:53,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=224400.0, ans=0.0 2024-09-17 12:53:01,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=224440.0, ans=0.1 2024-09-17 12:53:18,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=224480.0, ans=0.2 2024-09-17 12:53:22,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=224480.0, ans=0.125 2024-09-17 12:53:41,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=224560.0, ans=0.0 2024-09-17 12:53:47,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=224560.0, ans=0.1 2024-09-17 12:53:56,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=224600.0, ans=0.125 2024-09-17 12:53:57,458 INFO [train.py:1198] (1/2) Epoch 13, batch 1850, loss[loss=0.2721, ctc_loss=0.162, cr_loss=0.3828, attn_decoder_loss=0.2758, over 29638.00 frames. ], tot_loss[loss=0.2582, ctc_loss=0.1575, cr_loss=0.3967, attn_decoder_loss=0.2606, over 5796179.56 frames. ], batch size: 86, lr: 8.66e-03, grad_scale: 8.0 2024-09-17 12:54:26,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=224680.0, ans=0.125 2024-09-17 12:54:39,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=224680.0, ans=0.0 2024-09-17 12:54:40,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=224680.0, ans=0.0 2024-09-17 12:54:51,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=224720.0, ans=0.125 2024-09-17 12:55:03,830 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.97 vs. limit=15.0 2024-09-17 12:55:06,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=224760.0, ans=0.125 2024-09-17 12:55:07,361 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.966e+01 9.738e+01 1.037e+02 3.444e+02, threshold=1.948e+02, percent-clipped=2.0 2024-09-17 12:55:15,194 INFO [train.py:1198] (1/2) Epoch 13, batch 1900, loss[loss=0.2627, ctc_loss=0.1533, cr_loss=0.4052, attn_decoder_loss=0.2658, over 29706.00 frames. ], tot_loss[loss=0.2585, ctc_loss=0.1575, cr_loss=0.397, attn_decoder_loss=0.2609, over 5804219.42 frames. 
], batch size: 89, lr: 8.66e-03, grad_scale: 8.0 2024-09-17 12:55:23,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=224800.0, ans=0.05 2024-09-17 12:55:38,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=224840.0, ans=0.125 2024-09-17 12:56:03,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=224920.0, ans=0.0 2024-09-17 12:56:10,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=224920.0, ans=0.125 2024-09-17 12:56:13,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=224920.0, ans=0.2 2024-09-17 12:56:16,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=224960.0, ans=0.125 2024-09-17 12:56:18,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=224960.0, ans=0.0 2024-09-17 12:56:22,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=224960.0, ans=0.125 2024-09-17 12:56:22,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=224960.0, ans=0.125 2024-09-17 12:56:24,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=224960.0, ans=0.125 2024-09-17 12:56:27,881 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.15 vs. limit=15.0 2024-09-17 12:56:30,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=224960.0, ans=0.125 2024-09-17 12:56:33,260 INFO [train.py:1198] (1/2) Epoch 13, batch 1950, loss[loss=0.2456, ctc_loss=0.1381, cr_loss=0.3834, attn_decoder_loss=0.249, over 29447.00 frames. ], tot_loss[loss=0.2593, ctc_loss=0.1579, cr_loss=0.398, attn_decoder_loss=0.2618, over 5819304.13 frames. ], batch size: 78, lr: 8.66e-03, grad_scale: 8.0 2024-09-17 12:57:28,337 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.29 vs. limit=22.5 2024-09-17 12:57:40,939 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.542e+01 9.122e+01 9.575e+01 1.024e+02 4.346e+02, threshold=1.915e+02, percent-clipped=1.0 2024-09-17 12:57:44,863 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=22.38 vs. limit=22.5 2024-09-17 12:57:48,517 INFO [train.py:1198] (1/2) Epoch 13, batch 2000, loss[loss=0.2306, ctc_loss=0.1423, cr_loss=0.379, attn_decoder_loss=0.2319, over 29316.00 frames. ], tot_loss[loss=0.26, ctc_loss=0.1587, cr_loss=0.3987, attn_decoder_loss=0.2624, over 5797313.04 frames. ], batch size: 67, lr: 8.65e-03, grad_scale: 16.0 2024-09-17 12:57:57,236 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.31 vs. 
limit=15.0 2024-09-17 12:58:06,382 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.67 vs. limit=15.0 2024-09-17 12:58:08,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=225240.0, ans=0.1 2024-09-17 12:58:10,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=225240.0, ans=0.125 2024-09-17 12:58:35,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=225320.0, ans=0.07 2024-09-17 12:58:37,160 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.24 vs. limit=12.0 2024-09-17 12:59:06,706 INFO [train.py:1198] (1/2) Epoch 13, batch 2050, loss[loss=0.2222, ctc_loss=0.1263, cr_loss=0.3566, attn_decoder_loss=0.225, over 29418.00 frames. ], tot_loss[loss=0.2589, ctc_loss=0.1581, cr_loss=0.3973, attn_decoder_loss=0.2613, over 5788292.74 frames. ], batch size: 70, lr: 8.65e-03, grad_scale: 8.0 2024-09-17 12:59:07,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=225400.0, ans=0.125 2024-09-17 12:59:20,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=225440.0, ans=0.0 2024-09-17 12:59:25,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=225440.0, ans=0.125 2024-09-17 12:59:32,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=225440.0, ans=0.125 2024-09-17 12:59:54,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=225520.0, ans=0.125 2024-09-17 13:00:08,764 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=15.0 2024-09-17 13:00:18,379 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.810e+01 8.858e+01 9.418e+01 1.005e+02 1.765e+02, threshold=1.884e+02, percent-clipped=0.0 2024-09-17 13:00:24,866 INFO [train.py:1198] (1/2) Epoch 13, batch 2100, loss[loss=0.2552, ctc_loss=0.1526, cr_loss=0.3895, attn_decoder_loss=0.2579, over 29759.00 frames. ], tot_loss[loss=0.2583, ctc_loss=0.1573, cr_loss=0.3963, attn_decoder_loss=0.2608, over 5799132.78 frames. ], batch size: 81, lr: 8.65e-03, grad_scale: 8.0 2024-09-17 13:00:38,736 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 13:00:43,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=225640.0, ans=0.125 2024-09-17 13:00:54,245 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.23 vs. limit=10.0 2024-09-17 13:01:18,142 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.71 vs. 
limit=15.0 2024-09-17 13:01:40,012 INFO [train.py:1198] (1/2) Epoch 13, batch 2150, loss[loss=0.2649, ctc_loss=0.1624, cr_loss=0.4221, attn_decoder_loss=0.2669, over 29439.00 frames. ], tot_loss[loss=0.2575, ctc_loss=0.1565, cr_loss=0.3956, attn_decoder_loss=0.2599, over 5813543.95 frames. ], batch size: 78, lr: 8.64e-03, grad_scale: 8.0 2024-09-17 13:01:40,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=225800.0, ans=0.125 2024-09-17 13:01:59,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=225840.0, ans=0.0 2024-09-17 13:02:04,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=225840.0, ans=0.0 2024-09-17 13:02:04,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=225840.0, ans=0.1 2024-09-17 13:02:16,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=225880.0, ans=0.125 2024-09-17 13:02:19,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=225880.0, ans=0.125 2024-09-17 13:02:51,961 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.903e+01 9.039e+01 9.591e+01 1.017e+02 1.428e+02, threshold=1.918e+02, percent-clipped=0.0 2024-09-17 13:02:55,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=225960.0, ans=0.0 2024-09-17 13:02:58,186 INFO [train.py:1198] (1/2) Epoch 13, batch 2200, loss[loss=0.2731, ctc_loss=0.161, cr_loss=0.3996, attn_decoder_loss=0.2766, over 29622.00 frames. ], tot_loss[loss=0.258, ctc_loss=0.1571, cr_loss=0.3972, attn_decoder_loss=0.2604, over 5810366.39 frames. ], batch size: 86, lr: 8.64e-03, grad_scale: 8.0 2024-09-17 13:03:30,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=226080.0, ans=0.2 2024-09-17 13:03:37,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=226080.0, ans=0.0 2024-09-17 13:03:43,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=226080.0, ans=0.125 2024-09-17 13:03:49,378 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=15.0 2024-09-17 13:03:53,055 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.31 vs. limit=15.0 2024-09-17 13:04:16,309 INFO [train.py:1198] (1/2) Epoch 13, batch 2250, loss[loss=0.2555, ctc_loss=0.1452, cr_loss=0.3915, attn_decoder_loss=0.2591, over 29690.00 frames. ], tot_loss[loss=0.2575, ctc_loss=0.1565, cr_loss=0.3961, attn_decoder_loss=0.2599, over 5810566.82 frames. 
], batch size: 82, lr: 8.63e-03, grad_scale: 4.0 2024-09-17 13:04:32,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=226240.0, ans=0.125 2024-09-17 13:05:03,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=226320.0, ans=0.0 2024-09-17 13:05:05,818 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.04 vs. limit=15.0 2024-09-17 13:05:08,740 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.96 vs. limit=12.0 2024-09-17 13:05:27,515 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.643e+01 9.102e+01 9.665e+01 1.015e+02 1.637e+02, threshold=1.933e+02, percent-clipped=0.0 2024-09-17 13:05:32,603 INFO [train.py:1198] (1/2) Epoch 13, batch 2300, loss[loss=0.2418, ctc_loss=0.14, cr_loss=0.3822, attn_decoder_loss=0.2446, over 29315.00 frames. ], tot_loss[loss=0.2564, ctc_loss=0.1555, cr_loss=0.3937, attn_decoder_loss=0.2589, over 5798540.28 frames. ], batch size: 71, lr: 8.63e-03, grad_scale: 8.0 2024-09-17 13:05:37,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=226400.0, ans=0.1 2024-09-17 13:05:45,755 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=15.0 2024-09-17 13:05:51,575 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.27 vs. limit=6.0 2024-09-17 13:06:06,765 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.19 vs. limit=15.0 2024-09-17 13:06:21,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=226520.0, ans=0.125 2024-09-17 13:06:50,457 INFO [train.py:1198] (1/2) Epoch 13, batch 2350, loss[loss=0.2668, ctc_loss=0.1582, cr_loss=0.4092, attn_decoder_loss=0.2698, over 29676.00 frames. ], tot_loss[loss=0.2568, ctc_loss=0.1559, cr_loss=0.3949, attn_decoder_loss=0.2592, over 5804260.51 frames. ], batch size: 83, lr: 8.63e-03, grad_scale: 8.0 2024-09-17 13:07:01,298 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 13:07:04,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=226640.0, ans=0.1 2024-09-17 13:07:22,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=226680.0, ans=0.025 2024-09-17 13:07:23,162 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.36 vs. 
limit=22.5 2024-09-17 13:07:31,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=226680.0, ans=0.125 2024-09-17 13:07:37,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=226720.0, ans=22.5 2024-09-17 13:07:47,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=226720.0, ans=0.09899494936611666 2024-09-17 13:07:48,298 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.54 vs. limit=15.0 2024-09-17 13:07:57,210 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.82 vs. limit=12.0 2024-09-17 13:07:59,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=226760.0, ans=0.1 2024-09-17 13:08:05,465 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.113e+01 9.470e+01 1.028e+02 1.156e+02 2.779e+02, threshold=2.056e+02, percent-clipped=1.0 2024-09-17 13:08:08,453 INFO [train.py:1198] (1/2) Epoch 13, batch 2400, loss[loss=0.2542, ctc_loss=0.1546, cr_loss=0.4114, attn_decoder_loss=0.2561, over 29519.00 frames. ], tot_loss[loss=0.2573, ctc_loss=0.1563, cr_loss=0.3954, attn_decoder_loss=0.2597, over 5807708.35 frames. ], batch size: 76, lr: 8.62e-03, grad_scale: 8.0 2024-09-17 13:08:13,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=226800.0, ans=0.2 2024-09-17 13:08:13,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=226800.0, ans=0.2 2024-09-17 13:08:22,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=226840.0, ans=0.0 2024-09-17 13:09:03,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=226920.0, ans=0.125 2024-09-17 13:09:24,020 INFO [train.py:1198] (1/2) Epoch 13, batch 2450, loss[loss=0.2572, ctc_loss=0.1531, cr_loss=0.4071, attn_decoder_loss=0.2597, over 29722.00 frames. ], tot_loss[loss=0.2584, ctc_loss=0.1572, cr_loss=0.3965, attn_decoder_loss=0.2608, over 5786042.02 frames. ], batch size: 82, lr: 8.62e-03, grad_scale: 8.0 2024-09-17 13:09:24,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=227000.0, ans=0.1 2024-09-17 13:09:33,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=227000.0, ans=0.1 2024-09-17 13:09:50,673 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.77 vs. 
limit=15.0 2024-09-17 13:10:00,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=227080.0, ans=0.0 2024-09-17 13:10:28,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=227160.0, ans=0.125 2024-09-17 13:10:35,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=227160.0, ans=0.0 2024-09-17 13:10:38,635 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.760e+01 9.238e+01 9.776e+01 1.100e+02 2.445e+02, threshold=1.955e+02, percent-clipped=1.0 2024-09-17 13:10:42,075 INFO [train.py:1198] (1/2) Epoch 13, batch 2500, loss[loss=0.2744, ctc_loss=0.1698, cr_loss=0.4093, attn_decoder_loss=0.2769, over 29637.00 frames. ], tot_loss[loss=0.2583, ctc_loss=0.1571, cr_loss=0.3965, attn_decoder_loss=0.2608, over 5797056.45 frames. ], batch size: 86, lr: 8.62e-03, grad_scale: 8.0 2024-09-17 13:11:11,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=227280.0, ans=22.5 2024-09-17 13:11:14,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=227280.0, ans=0.2 2024-09-17 13:11:15,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=227280.0, ans=0.125 2024-09-17 13:11:20,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=227280.0, ans=0.0 2024-09-17 13:11:21,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=227280.0, ans=0.1 2024-09-17 13:11:27,646 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.15 vs. limit=15.0 2024-09-17 13:11:40,299 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.24 vs. limit=15.0 2024-09-17 13:11:47,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=227360.0, ans=0.2 2024-09-17 13:11:50,449 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.28 vs. limit=15.0 2024-09-17 13:12:00,454 INFO [train.py:1198] (1/2) Epoch 13, batch 2550, loss[loss=0.2309, ctc_loss=0.14, cr_loss=0.3715, attn_decoder_loss=0.2328, over 29358.00 frames. ], tot_loss[loss=0.2581, ctc_loss=0.1569, cr_loss=0.3965, attn_decoder_loss=0.2605, over 5799954.30 frames. 
], batch size: 67, lr: 8.61e-03, grad_scale: 8.0 2024-09-17 13:12:05,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=227400.0, ans=0.125 2024-09-17 13:12:17,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=227440.0, ans=0.125 2024-09-17 13:12:20,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=227440.0, ans=0.125 2024-09-17 13:12:29,725 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.64 vs. limit=22.5 2024-09-17 13:12:32,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=227480.0, ans=0.0 2024-09-17 13:13:02,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=227560.0, ans=0.2 2024-09-17 13:13:08,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=227560.0, ans=0.0 2024-09-17 13:13:13,208 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.947e+01 9.236e+01 9.928e+01 1.060e+02 5.337e+02, threshold=1.986e+02, percent-clipped=3.0 2024-09-17 13:13:16,294 INFO [train.py:1198] (1/2) Epoch 13, batch 2600, loss[loss=0.2476, ctc_loss=0.1391, cr_loss=0.374, attn_decoder_loss=0.2514, over 29462.00 frames. ], tot_loss[loss=0.2587, ctc_loss=0.1576, cr_loss=0.3977, attn_decoder_loss=0.2611, over 5796253.99 frames. ], batch size: 78, lr: 8.61e-03, grad_scale: 8.0 2024-09-17 13:13:18,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=227600.0, ans=0.1 2024-09-17 13:13:24,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=227600.0, ans=0.0 2024-09-17 13:14:23,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=227760.0, ans=0.125 2024-09-17 13:14:25,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=227760.0, ans=0.1 2024-09-17 13:14:34,056 INFO [train.py:1198] (1/2) Epoch 13, batch 2650, loss[loss=0.2785, ctc_loss=0.177, cr_loss=0.4346, attn_decoder_loss=0.2801, over 29254.00 frames. ], tot_loss[loss=0.259, ctc_loss=0.158, cr_loss=0.3984, attn_decoder_loss=0.2614, over 5801977.72 frames. ], batch size: 100, lr: 8.60e-03, grad_scale: 8.0 2024-09-17 13:14:37,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=227800.0, ans=0.1 2024-09-17 13:14:41,145 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.53 vs. 
limit=15.0 2024-09-17 13:14:57,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=227840.0, ans=0.125 2024-09-17 13:15:12,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=227880.0, ans=0.2 2024-09-17 13:15:21,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=227920.0, ans=0.04949747468305833 2024-09-17 13:15:48,539 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.882e+01 9.244e+01 9.728e+01 1.060e+02 3.050e+02, threshold=1.946e+02, percent-clipped=2.0 2024-09-17 13:15:52,088 INFO [train.py:1198] (1/2) Epoch 13, batch 2700, loss[loss=0.2702, ctc_loss=0.1657, cr_loss=0.4181, attn_decoder_loss=0.2725, over 29554.00 frames. ], tot_loss[loss=0.2597, ctc_loss=0.1588, cr_loss=0.3998, attn_decoder_loss=0.262, over 5797921.46 frames. ], batch size: 87, lr: 8.60e-03, grad_scale: 8.0 2024-09-17 13:16:07,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=228040.0, ans=0.125 2024-09-17 13:16:33,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=228080.0, ans=0.0 2024-09-17 13:16:36,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=228120.0, ans=0.2 2024-09-17 13:16:37,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=228120.0, ans=0.2 2024-09-17 13:17:07,972 INFO [train.py:1198] (1/2) Epoch 13, batch 2750, loss[loss=0.2481, ctc_loss=0.1513, cr_loss=0.3972, attn_decoder_loss=0.25, over 29508.00 frames. ], tot_loss[loss=0.2586, ctc_loss=0.1581, cr_loss=0.3986, attn_decoder_loss=0.2609, over 5796265.64 frames. ], batch size: 75, lr: 8.60e-03, grad_scale: 8.0 2024-09-17 13:17:14,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=228200.0, ans=0.125 2024-09-17 13:18:18,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=228360.0, ans=0.125 2024-09-17 13:18:23,265 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.445e+01 9.348e+01 1.004e+02 1.120e+02 2.904e+02, threshold=2.008e+02, percent-clipped=2.0 2024-09-17 13:18:26,299 INFO [train.py:1198] (1/2) Epoch 13, batch 2800, loss[loss=0.2959, ctc_loss=0.2215, cr_loss=0.4484, attn_decoder_loss=0.2942, over 20531.00 frames. ], tot_loss[loss=0.2591, ctc_loss=0.1589, cr_loss=0.3996, attn_decoder_loss=0.2613, over 5776423.32 frames. ], batch size: 209, lr: 8.59e-03, grad_scale: 16.0 2024-09-17 13:18:28,663 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=16.58 vs. limit=15.0 2024-09-17 13:18:41,968 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.84 vs. 
limit=10.0 2024-09-17 13:18:43,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=228440.0, ans=0.125 2024-09-17 13:19:14,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0 2024-09-17 13:19:21,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=228520.0, ans=0.0 2024-09-17 13:19:40,204 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.68 vs. limit=22.5 2024-09-17 13:19:44,131 INFO [train.py:1198] (1/2) Epoch 13, batch 2850, loss[loss=0.2531, ctc_loss=0.1505, cr_loss=0.3772, attn_decoder_loss=0.2561, over 29517.00 frames. ], tot_loss[loss=0.2592, ctc_loss=0.1591, cr_loss=0.3996, attn_decoder_loss=0.2615, over 5762735.04 frames. ], batch size: 77, lr: 8.59e-03, grad_scale: 8.0 2024-09-17 13:19:50,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=228600.0, ans=0.05 2024-09-17 13:19:51,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=228600.0, ans=0.04949747468305833 2024-09-17 13:20:11,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=228640.0, ans=0.125 2024-09-17 13:20:18,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=228680.0, ans=0.1 2024-09-17 13:20:27,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=228680.0, ans=0.025 2024-09-17 13:20:30,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=228720.0, ans=0.07 2024-09-17 13:20:33,407 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.12 vs. limit=22.5 2024-09-17 13:20:39,457 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.41 vs. limit=15.0 2024-09-17 13:21:00,199 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.673e+01 9.259e+01 1.060e+02 1.394e+02 3.143e+02, threshold=2.120e+02, percent-clipped=6.0 2024-09-17 13:21:00,229 INFO [train.py:1198] (1/2) Epoch 13, batch 2900, loss[loss=0.2497, ctc_loss=0.1547, cr_loss=0.3689, attn_decoder_loss=0.252, over 29428.00 frames. ], tot_loss[loss=0.2599, ctc_loss=0.1592, cr_loss=0.4007, attn_decoder_loss=0.2622, over 5787302.45 frames. ], batch size: 79, lr: 8.59e-03, grad_scale: 8.0 2024-09-17 13:21:02,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=228800.0, ans=0.1 2024-09-17 13:21:26,976 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.39 vs. 
limit=15.0 2024-09-17 13:21:40,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=228880.0, ans=0.125 2024-09-17 13:21:46,283 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 13:21:56,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.94 vs. limit=15.0 2024-09-17 13:21:56,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=228920.0, ans=0.2 2024-09-17 13:22:07,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=228960.0, ans=0.125 2024-09-17 13:22:09,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=228960.0, ans=0.2 2024-09-17 13:22:18,345 INFO [train.py:1198] (1/2) Epoch 13, batch 2950, loss[loss=0.2451, ctc_loss=0.1546, cr_loss=0.4071, attn_decoder_loss=0.2461, over 29519.00 frames. ], tot_loss[loss=0.2584, ctc_loss=0.1579, cr_loss=0.3986, attn_decoder_loss=0.2608, over 5781821.85 frames. ], batch size: 75, lr: 8.58e-03, grad_scale: 4.0 2024-09-17 13:22:36,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=229040.0, ans=0.0 2024-09-17 13:22:38,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=229040.0, ans=0.125 2024-09-17 13:22:41,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=229040.0, ans=0.0 2024-09-17 13:22:47,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=229080.0, ans=0.125 2024-09-17 13:23:25,304 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.68 vs. limit=22.5 2024-09-17 13:23:36,402 INFO [train.py:1198] (1/2) Epoch 13, batch 3000, loss[loss=0.2528, ctc_loss=0.1598, cr_loss=0.4145, attn_decoder_loss=0.2539, over 29770.00 frames. ], tot_loss[loss=0.2586, ctc_loss=0.1581, cr_loss=0.3988, attn_decoder_loss=0.2609, over 5783585.91 frames. ], batch size: 81, lr: 8.58e-03, grad_scale: 8.0 2024-09-17 13:23:36,402 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 13:23:54,825 INFO [train.py:1230] (1/2) Epoch 13, validation: loss=0.212, ctc_loss=0.04384, cr_loss=4.97e-15, attn_decoder_loss=0.2307, over 944034.00 frames. 2024-09-17 13:23:54,825 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-17 13:23:55,960 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.44 vs. 
limit=10.0 2024-09-17 13:23:56,317 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.701e+01 8.967e+01 9.683e+01 1.075e+02 2.883e+02, threshold=1.937e+02, percent-clipped=1.0 2024-09-17 13:24:07,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=229200.0, ans=0.2 2024-09-17 13:24:13,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=229240.0, ans=0.05 2024-09-17 13:24:16,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=229240.0, ans=0.125 2024-09-17 13:24:16,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=229240.0, ans=0.0 2024-09-17 13:24:30,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=229280.0, ans=0.125 2024-09-17 13:24:32,163 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.15 vs. limit=15.0 2024-09-17 13:24:39,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=229320.0, ans=0.0 2024-09-17 13:25:04,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=229360.0, ans=0.025 2024-09-17 13:25:10,619 INFO [train.py:1198] (1/2) Epoch 13, batch 3050, loss[loss=0.2543, ctc_loss=0.1617, cr_loss=0.4142, attn_decoder_loss=0.2554, over 29538.00 frames. ], tot_loss[loss=0.2592, ctc_loss=0.1584, cr_loss=0.3997, attn_decoder_loss=0.2615, over 5776780.50 frames. ], batch size: 76, lr: 8.57e-03, grad_scale: 8.0 2024-09-17 13:25:30,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=229440.0, ans=0.1 2024-09-17 13:25:39,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=229480.0, ans=0.1 2024-09-17 13:25:47,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=229480.0, ans=0.1 2024-09-17 13:25:52,589 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.78 vs. limit=15.0 2024-09-17 13:26:20,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=229560.0, ans=0.1 2024-09-17 13:26:29,230 INFO [train.py:1198] (1/2) Epoch 13, batch 3100, loss[loss=0.277, ctc_loss=0.177, cr_loss=0.4143, attn_decoder_loss=0.2789, over 29291.00 frames. ], tot_loss[loss=0.2588, ctc_loss=0.1582, cr_loss=0.3991, attn_decoder_loss=0.2612, over 5775979.73 frames. ], batch size: 100, lr: 8.57e-03, grad_scale: 8.0 2024-09-17 13:26:32,979 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.799e+01 9.409e+01 1.035e+02 1.210e+02 2.103e+02, threshold=2.070e+02, percent-clipped=1.0 2024-09-17 13:26:36,825 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.89 vs. 
limit=15.0 2024-09-17 13:26:52,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=229640.0, ans=0.125 2024-09-17 13:27:15,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=229720.0, ans=0.125 2024-09-17 13:27:36,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=229760.0, ans=0.025 2024-09-17 13:27:45,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=229800.0, ans=0.125 2024-09-17 13:27:47,052 INFO [train.py:1198] (1/2) Epoch 13, batch 3150, loss[loss=0.2732, ctc_loss=0.1623, cr_loss=0.4188, attn_decoder_loss=0.2762, over 28793.00 frames. ], tot_loss[loss=0.2586, ctc_loss=0.1578, cr_loss=0.3986, attn_decoder_loss=0.2609, over 5783162.58 frames. ], batch size: 104, lr: 8.57e-03, grad_scale: 8.0 2024-09-17 13:27:48,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=229800.0, ans=0.0 2024-09-17 13:27:54,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=229800.0, ans=0.0 2024-09-17 13:28:11,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=229840.0, ans=0.1 2024-09-17 13:28:11,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=229840.0, ans=0.0 2024-09-17 13:28:17,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=229880.0, ans=0.1 2024-09-17 13:28:27,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=229880.0, ans=0.125 2024-09-17 13:28:50,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=229960.0, ans=0.0 2024-09-17 13:28:53,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=229960.0, ans=0.125 2024-09-17 13:28:56,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=229960.0, ans=0.125 2024-09-17 13:29:02,206 INFO [train.py:1198] (1/2) Epoch 13, batch 3200, loss[loss=0.2509, ctc_loss=0.144, cr_loss=0.3818, attn_decoder_loss=0.2543, over 29779.00 frames. ], tot_loss[loss=0.2578, ctc_loss=0.1571, cr_loss=0.3972, attn_decoder_loss=0.2602, over 5794413.76 frames. ], batch size: 80, lr: 8.56e-03, grad_scale: 16.0 2024-09-17 13:29:05,035 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.957e+01 9.025e+01 9.709e+01 1.089e+02 2.819e+02, threshold=1.942e+02, percent-clipped=2.0 2024-09-17 13:29:08,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=230000.0, ans=0.125 2024-09-17 13:29:18,023 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.84 vs. 
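
The [scaling.py:214] lines that dominate this log each report one ScheduledFloat: a hyperparameter (a dropout probability, skip rate, balancer prob, scale_min, and so on) whose current value ans is looked up from batch_count. The values printed here sit on stable plateaus (0.125, 0.1, 0.2, 0.025, ...), consistent with piecewise-linear schedules that have flattened out by batch ~230k. A minimal sketch of such a schedule; the breakpoints in the example are illustrative, not the recipe's actual ones:

    class ScheduledFloat:
        """Piecewise-linear function of batch_count.

        schedule is a list of (batch_count, value) breakpoints; outside
        that range the nearest endpoint value is held constant.
        """
        def __init__(self, *schedule):
            self.schedule = sorted(schedule)

        def value(self, batch_count: float) -> float:
            pts = self.schedule
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    # e.g. a skip rate decaying 0.5 -> 0.0 over the first 4000 batches,
    # then flat: by batch_count=230000 the log would print ans=0.0.
    skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.0))
    print(skip_rate.value(230000.0))  # 0.0
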
limit=22.5 2024-09-17 13:29:37,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=230080.0, ans=0.125 2024-09-17 13:29:48,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=230120.0, ans=0.125 2024-09-17 13:30:20,633 INFO [train.py:1198] (1/2) Epoch 13, batch 3250, loss[loss=0.2632, ctc_loss=0.1568, cr_loss=0.3959, attn_decoder_loss=0.2663, over 29702.00 frames. ], tot_loss[loss=0.2581, ctc_loss=0.1573, cr_loss=0.398, attn_decoder_loss=0.2605, over 5801264.27 frames. ], batch size: 84, lr: 8.56e-03, grad_scale: 8.0 2024-09-17 13:30:22,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=230200.0, ans=0.125 2024-09-17 13:30:49,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=230240.0, ans=0.0 2024-09-17 13:30:56,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=230280.0, ans=0.125 2024-09-17 13:31:09,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=230320.0, ans=0.0 2024-09-17 13:31:21,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=230360.0, ans=0.1 2024-09-17 13:31:38,594 INFO [train.py:1198] (1/2) Epoch 13, batch 3300, loss[loss=0.2616, ctc_loss=0.1534, cr_loss=0.3779, attn_decoder_loss=0.2652, over 28584.00 frames. ], tot_loss[loss=0.2568, ctc_loss=0.1562, cr_loss=0.3962, attn_decoder_loss=0.2592, over 5799288.28 frames. ], batch size: 112, lr: 8.56e-03, grad_scale: 8.0 2024-09-17 13:31:41,817 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.916e+01 9.519e+01 1.032e+02 2.087e+02, threshold=1.904e+02, percent-clipped=1.0 2024-09-17 13:31:42,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=230400.0, ans=0.125 2024-09-17 13:31:46,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=230400.0, ans=0.0 2024-09-17 13:31:48,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=230400.0, ans=10.0 2024-09-17 13:31:55,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=230440.0, ans=0.125 2024-09-17 13:32:39,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=230560.0, ans=0.1 2024-09-17 13:32:43,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=230560.0, ans=0.125 2024-09-17 13:32:53,823 INFO [train.py:1198] (1/2) Epoch 13, batch 3350, loss[loss=0.2665, ctc_loss=0.157, cr_loss=0.3962, attn_decoder_loss=0.2698, over 28777.00 frames. ], tot_loss[loss=0.2576, ctc_loss=0.1569, cr_loss=0.3967, attn_decoder_loss=0.2599, over 5776814.24 frames. 
], batch size: 104, lr: 8.55e-03, grad_scale: 8.0 2024-09-17 13:32:58,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=230600.0, ans=0.125 2024-09-17 13:33:12,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=230640.0, ans=0.2 2024-09-17 13:33:33,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=230680.0, ans=0.125 2024-09-17 13:33:36,025 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.57 vs. limit=10.0 2024-09-17 13:33:57,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=230760.0, ans=0.04949747468305833 2024-09-17 13:34:14,314 INFO [train.py:1198] (1/2) Epoch 13, batch 3400, loss[loss=0.2253, ctc_loss=0.1331, cr_loss=0.3592, attn_decoder_loss=0.2275, over 29318.00 frames. ], tot_loss[loss=0.2576, ctc_loss=0.1566, cr_loss=0.3966, attn_decoder_loss=0.26, over 5769617.81 frames. ], batch size: 67, lr: 8.55e-03, grad_scale: 8.0 2024-09-17 13:34:16,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=230800.0, ans=0.125 2024-09-17 13:34:17,289 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.853e+01 8.984e+01 9.781e+01 1.096e+02 3.563e+02, threshold=1.956e+02, percent-clipped=2.0 2024-09-17 13:34:20,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=230800.0, ans=0.125 2024-09-17 13:34:34,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=230840.0, ans=0.025 2024-09-17 13:34:38,697 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 13:34:49,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=230880.0, ans=0.125 2024-09-17 13:34:58,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=230920.0, ans=0.125 2024-09-17 13:35:07,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=230920.0, ans=0.125 2024-09-17 13:35:21,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=230960.0, ans=0.125 2024-09-17 13:35:30,139 INFO [train.py:1198] (1/2) Epoch 13, batch 3450, loss[loss=0.2743, ctc_loss=0.1711, cr_loss=0.4289, attn_decoder_loss=0.2762, over 28261.00 frames. ], tot_loss[loss=0.2577, ctc_loss=0.1565, cr_loss=0.3974, attn_decoder_loss=0.2601, over 5777210.60 frames. 
], batch size: 111, lr: 8.55e-03, grad_scale: 8.0 2024-09-17 13:35:39,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=231000.0, ans=0.125 2024-09-17 13:35:42,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=231000.0, ans=0.2 2024-09-17 13:35:42,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=231000.0, ans=0.125 2024-09-17 13:35:54,633 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 13:36:09,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=231080.0, ans=0.0 2024-09-17 13:36:11,721 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=15.0 2024-09-17 13:36:12,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=231080.0, ans=0.0 2024-09-17 13:36:12,598 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 13:36:12,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=231080.0, ans=0.5 2024-09-17 13:36:29,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=231160.0, ans=0.125 2024-09-17 13:36:37,558 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=15.0 2024-09-17 13:36:45,949 INFO [train.py:1198] (1/2) Epoch 13, batch 3500, loss[loss=0.2466, ctc_loss=0.1541, cr_loss=0.4033, attn_decoder_loss=0.2479, over 29285.00 frames. ], tot_loss[loss=0.257, ctc_loss=0.1561, cr_loss=0.3961, attn_decoder_loss=0.2594, over 5777911.46 frames. ], batch size: 71, lr: 8.54e-03, grad_scale: 8.0 2024-09-17 13:36:46,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=231200.0, ans=0.0 2024-09-17 13:36:49,013 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.194e+01 9.091e+01 9.756e+01 1.067e+02 1.863e+02, threshold=1.951e+02, percent-clipped=0.0 2024-09-17 13:36:50,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=231200.0, ans=0.125 2024-09-17 13:36:54,868 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.07 vs. limit=10.0 2024-09-17 13:37:22,905 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.97 vs. 
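
The relationship between the four numbers in each loss record can be read off the log itself: the headline loss is reproduced, to rounding, by a weighted sum of the three component losses with weights 0.1 (CTC), 0.9 (attention decoder), and 0.02 (consistency regularization). Checking against the batch 3500 summary just above:

    # values from the batch 3500 tot_loss record above
    ctc, cr, aed = 0.1561, 0.3961, 0.2594
    loss = 0.1 * ctc + 0.9 * aed + 0.02 * cr
    print(round(loss, 4))   # 0.257, matching the logged tot_loss

    # the per-batch record fits too: 0.1*0.1541 + 0.9*0.2479 + 0.02*0.4033
    print(round(0.1 * 0.1541 + 0.9 * 0.2479 + 0.02 * 0.4033, 4))  # 0.2466

The same weights fit every loss/tot_loss pair in this section, so the printed loss is evidently this fixed linear combination of the CTC, attention-decoder, and CR terms.
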
limit=15.0 2024-09-17 13:37:28,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=231280.0, ans=0.05 2024-09-17 13:37:41,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=231320.0, ans=0.125 2024-09-17 13:37:58,522 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.77 vs. limit=15.0 2024-09-17 13:37:59,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=231400.0, ans=0.0 2024-09-17 13:38:00,813 INFO [train.py:1198] (1/2) Epoch 13, batch 3550, loss[loss=0.2689, ctc_loss=0.1649, cr_loss=0.4161, attn_decoder_loss=0.2712, over 29684.00 frames. ], tot_loss[loss=0.2569, ctc_loss=0.1559, cr_loss=0.3952, attn_decoder_loss=0.2594, over 5782942.72 frames. ], batch size: 89, lr: 8.54e-03, grad_scale: 8.0 2024-09-17 13:38:01,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=231400.0, ans=0.1 2024-09-17 13:38:12,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=231400.0, ans=0.125 2024-09-17 13:38:42,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=231480.0, ans=0.1 2024-09-17 13:38:57,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=231520.0, ans=0.025 2024-09-17 13:39:19,465 INFO [train.py:1198] (1/2) Epoch 13, batch 3600, loss[loss=0.2426, ctc_loss=0.1351, cr_loss=0.3918, attn_decoder_loss=0.2458, over 29494.00 frames. ], tot_loss[loss=0.2571, ctc_loss=0.1558, cr_loss=0.3954, attn_decoder_loss=0.2596, over 5791881.76 frames. ], batch size: 77, lr: 8.53e-03, grad_scale: 16.0 2024-09-17 13:39:24,011 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.860e+01 8.866e+01 9.672e+01 1.060e+02 2.375e+02, threshold=1.934e+02, percent-clipped=1.0 2024-09-17 13:39:25,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=231600.0, ans=0.0 2024-09-17 13:39:36,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=231640.0, ans=0.125 2024-09-17 13:39:45,827 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.85 vs. limit=15.0 2024-09-17 13:40:13,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=231720.0, ans=0.125 2024-09-17 13:40:15,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=231720.0, ans=0.125 2024-09-17 13:40:15,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=231720.0, ans=0.125 2024-09-17 13:40:17,645 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.89 vs. 
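
A [scaling.py:1024] Whitening line fires when a module's metric approaches or exceeds its limit. The metric measures how anisotropic the feature covariance has become: it is 1.0 for perfectly "white" features and grows as variance concentrates in a few directions, so a record like metric=13.78 vs. limit=15.0 means that module's whitening penalty is close to activating. One plausible form of such a metric, the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue, computable without an eigendecomposition; this is a reconstruction from the logged quantities, not a copy of scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """x: (num_frames, num_channels). Returns >= 1.0; equals 1.0 iff
        each group's covariance is a multiple of the identity."""
        num_frames, num_channels = x.shape
        assert num_channels % num_groups == 0
        c = num_channels // num_groups
        x = x.reshape(num_frames, num_groups, c).transpose(0, 1)
        covar = torch.matmul(x.transpose(1, 2), x) / num_frames
        # sum of squared eigenvalues == squared Frobenius norm of covar,
        # sum of eigenvalues == trace, so:
        eig_sq_mean = (covar ** 2).sum(dim=(1, 2)) / c
        eig_mean = covar.diagonal(dim1=1, dim2=2).sum(dim=1) / c
        return (eig_sq_mean / eig_mean ** 2).mean().item()
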
limit=22.5 2024-09-17 13:40:34,312 INFO [train.py:1198] (1/2) Epoch 13, batch 3650, loss[loss=0.2853, ctc_loss=0.1759, cr_loss=0.4362, attn_decoder_loss=0.2878, over 29488.00 frames. ], tot_loss[loss=0.2566, ctc_loss=0.1554, cr_loss=0.3948, attn_decoder_loss=0.2591, over 5794427.62 frames. ], batch size: 90, lr: 8.53e-03, grad_scale: 8.0 2024-09-17 13:40:41,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=231800.0, ans=0.2 2024-09-17 13:40:46,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=231800.0, ans=0.015 2024-09-17 13:40:50,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=231840.0, ans=0.0 2024-09-17 13:40:52,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=231840.0, ans=0.125 2024-09-17 13:40:52,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=231840.0, ans=0.1 2024-09-17 13:41:02,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=231880.0, ans=0.025 2024-09-17 13:41:07,944 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.39 vs. limit=10.0 2024-09-17 13:41:11,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=231880.0, ans=0.125 2024-09-17 13:41:22,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=231920.0, ans=0.125 2024-09-17 13:41:30,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=231920.0, ans=0.125 2024-09-17 13:41:38,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=231960.0, ans=0.125 2024-09-17 13:41:48,930 INFO [train.py:1198] (1/2) Epoch 13, batch 3700, loss[loss=0.2513, ctc_loss=0.1473, cr_loss=0.3715, attn_decoder_loss=0.2546, over 29705.00 frames. ], tot_loss[loss=0.2569, ctc_loss=0.1555, cr_loss=0.3956, attn_decoder_loss=0.2594, over 5804158.65 frames. ], batch size: 84, lr: 8.53e-03, grad_scale: 8.0 2024-09-17 13:41:53,428 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.974e+01 9.152e+01 9.843e+01 1.065e+02 3.437e+02, threshold=1.969e+02, percent-clipped=3.0 2024-09-17 13:41:54,209 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.60 vs. limit=22.5 2024-09-17 13:42:02,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=232040.0, ans=0.0 2024-09-17 13:42:14,993 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.30 vs. limit=22.5 2024-09-17 13:42:19,350 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.13 vs. limit=15.0 2024-09-17 13:42:27,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.95 vs. 
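
Because every scaling.py record follows a fixed key=value format, the trajectories buried in this log are easy to recover offline. A small parser for the ScheduledFloat and Whitening lines shown here; the file name train.log is a placeholder, and records that wrap across physical lines in this dump (a trailing "vs." with the limit on the next line) would need the file re-joined first:

    import re
    from collections import defaultdict

    sched = re.compile(
        r"ScheduledFloat: name=(\S+), batch_count=([\d.]+), ans=([\d.e+-]+)")
    whiten = re.compile(
        r"Whitening: name=(\S+), .* metric=([\d.]+) vs. limit=([\d.]+)")

    sched_series = defaultdict(list)   # name -> [(batch_count, ans), ...]
    whiten_series = defaultdict(list)  # name -> [(metric, limit), ...]

    with open("train.log") as f:
        for line in f:
            if (m := sched.search(line)):
                sched_series[m.group(1)].append(
                    (float(m.group(2)), float(m.group(3))))
            elif (m := whiten.search(line)):
                whiten_series[m.group(1)].append(
                    (float(m.group(2)), float(m.group(3))))

    # e.g. how close each whitened module runs to its limit:
    for name, vals in whiten_series.items():
        print(name, max(metric / limit for metric, limit in vals))
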
limit=15.0 2024-09-17 13:42:29,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=232080.0, ans=0.125 2024-09-17 13:42:37,392 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.37 vs. limit=15.0 2024-09-17 13:42:40,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=232120.0, ans=0.015 2024-09-17 13:42:42,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=232120.0, ans=0.125 2024-09-17 13:42:45,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=232120.0, ans=0.0 2024-09-17 13:42:47,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.05 vs. limit=10.0 2024-09-17 13:42:52,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=232160.0, ans=0.125 2024-09-17 13:42:55,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=232160.0, ans=0.1 2024-09-17 13:43:03,015 INFO [train.py:1198] (1/2) Epoch 13, batch 3750, loss[loss=0.2239, ctc_loss=0.1375, cr_loss=0.3467, attn_decoder_loss=0.2258, over 29375.00 frames. ], tot_loss[loss=0.2565, ctc_loss=0.1553, cr_loss=0.3949, attn_decoder_loss=0.259, over 5807208.90 frames. ], batch size: 67, lr: 8.52e-03, grad_scale: 4.0 2024-09-17 13:43:03,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=232200.0, ans=0.125 2024-09-17 13:43:26,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=232240.0, ans=0.125 2024-09-17 13:43:28,986 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.80 vs. limit=12.0 2024-09-17 13:43:32,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=232280.0, ans=0.1 2024-09-17 13:43:38,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=232280.0, ans=0.125 2024-09-17 13:43:58,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=232320.0, ans=0.0 2024-09-17 13:44:03,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.55 vs. limit=10.0 2024-09-17 13:44:17,437 INFO [train.py:1198] (1/2) Epoch 13, batch 3800, loss[loss=0.2567, ctc_loss=0.1447, cr_loss=0.3688, attn_decoder_loss=0.261, over 29638.00 frames. ], tot_loss[loss=0.2562, ctc_loss=0.1551, cr_loss=0.3945, attn_decoder_loss=0.2587, over 5799024.71 frames. 
], batch size: 86, lr: 8.52e-03, grad_scale: 8.0 2024-09-17 13:44:17,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=232400.0, ans=0.0 2024-09-17 13:44:23,390 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 9.154e+01 9.685e+01 1.039e+02 2.233e+02, threshold=1.937e+02, percent-clipped=1.0 2024-09-17 13:44:25,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=232400.0, ans=0.0 2024-09-17 13:45:12,519 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.26 vs. limit=10.0 2024-09-17 13:45:14,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=232520.0, ans=0.125 2024-09-17 13:45:35,422 INFO [train.py:1198] (1/2) Epoch 13, batch 3850, loss[loss=0.2731, ctc_loss=0.1709, cr_loss=0.4175, attn_decoder_loss=0.2751, over 29264.00 frames. ], tot_loss[loss=0.2562, ctc_loss=0.1551, cr_loss=0.3946, attn_decoder_loss=0.2587, over 5813917.37 frames. ], batch size: 100, lr: 8.52e-03, grad_scale: 4.0 2024-09-17 13:45:40,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=232600.0, ans=0.0 2024-09-17 13:46:08,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=232680.0, ans=0.125 2024-09-17 13:46:19,697 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.08 vs. limit=15.0 2024-09-17 13:46:35,446 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-17 13:46:44,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=232760.0, ans=0.0 2024-09-17 13:46:45,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=232760.0, ans=0.125 2024-09-17 13:46:50,400 INFO [train.py:1198] (1/2) Epoch 13, batch 3900, loss[loss=0.2712, ctc_loss=0.1652, cr_loss=0.4077, attn_decoder_loss=0.2739, over 29651.00 frames. ], tot_loss[loss=0.257, ctc_loss=0.1558, cr_loss=0.3958, attn_decoder_loss=0.2595, over 5817632.56 frames. ], batch size: 86, lr: 8.51e-03, grad_scale: 8.0 2024-09-17 13:46:57,792 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.883e+01 8.946e+01 9.477e+01 1.034e+02 1.292e+02, threshold=1.895e+02, percent-clipped=0.0 2024-09-17 13:47:11,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=232840.0, ans=0.0 2024-09-17 13:47:26,647 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.97 vs. limit=15.0 2024-09-17 13:47:30,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=232880.0, ans=0.0 2024-09-17 13:47:34,444 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.60 vs. 
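
The [scaling.py:1120] WithLoss lines report an auxiliary penalty attached to the self-attention weights; loss-sum=0.000e+00 throughout this section means the attached penalty is currently zero, which is why these are informational rather than warnings. The mechanism can be sketched as an autograd identity that smuggles an extra scalar loss into the backward pass; this is a reconstruction of the idea, not the actual scaling.py code, and the unconditional print is for illustration only:

    import torch

    class WithLoss(torch.autograd.Function):
        """Returns x unchanged, but routes gradient 1.0 to aux_loss,
        so the auxiliary penalty is minimized with the main objective."""
        @staticmethod
        def forward(ctx, x, aux_loss, name):
            ctx.save_for_backward(aux_loss)
            print(f"WithLoss: name={name}, loss-sum={float(aux_loss):.3e}")
            return x

        @staticmethod
        def backward(ctx, x_grad):
            (aux_loss,) = ctx.saved_tensors
            return x_grad, torch.ones_like(aux_loss), None

    # hypothetical usage on attention weights, with some penalty term:
    # attn = WithLoss.apply(attn, attn_penalty, "self_attn_weights")
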
limit=15.0 2024-09-17 13:47:36,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=232920.0, ans=0.025 2024-09-17 13:47:36,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=232920.0, ans=0.125 2024-09-17 13:47:42,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=232920.0, ans=0.2 2024-09-17 13:47:56,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.44 vs. limit=15.0 2024-09-17 13:48:04,563 INFO [train.py:1198] (1/2) Epoch 13, batch 3950, loss[loss=0.2697, ctc_loss=0.1632, cr_loss=0.4068, attn_decoder_loss=0.2724, over 29490.00 frames. ], tot_loss[loss=0.2568, ctc_loss=0.155, cr_loss=0.3949, attn_decoder_loss=0.2593, over 5836878.46 frames. ], batch size: 97, lr: 8.51e-03, grad_scale: 8.0 2024-09-17 13:48:04,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=233000.0, ans=0.125 2024-09-17 13:48:31,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=233040.0, ans=0.1 2024-09-17 13:48:36,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.13 vs. limit=22.5 2024-09-17 13:48:40,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=233080.0, ans=0.125 2024-09-17 13:49:03,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=233160.0, ans=10.0 2024-09-17 13:49:12,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=233160.0, ans=0.125 2024-09-17 13:49:15,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=233160.0, ans=0.125 2024-09-17 13:49:18,346 INFO [train.py:1198] (1/2) Epoch 13, batch 4000, loss[loss=0.2494, ctc_loss=0.1425, cr_loss=0.3638, attn_decoder_loss=0.2532, over 29511.00 frames. ], tot_loss[loss=0.2569, ctc_loss=0.1553, cr_loss=0.3947, attn_decoder_loss=0.2594, over 5814717.45 frames. ], batch size: 74, lr: 8.51e-03, grad_scale: 16.0 2024-09-17 13:49:25,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=233200.0, ans=0.2 2024-09-17 13:49:26,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.65 vs. 
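
The grad_scale field in the batch summaries (4.0, 8.0, 16.0, then back down) is the AMP loss scale: it doubles after a run of overflow-free steps and halves when an fp16 overflow is detected, which is consistent with the default dynamics of torch.cuda.amp.GradScaler. A minimal training-step sketch, with model, optimizer, and batch as placeholders and the constructor arguments chosen only to mirror the pattern seen above:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=4.0,       # matches the smallest grad_scale logged above
        growth_factor=2.0,    # doubles the scale ...
        backoff_factor=0.5,   # ... and halves it, giving 4 -> 8 -> 16 -> 8
        growth_interval=2000,
    )

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(batch)
        scaler.scale(loss).backward()  # backward on the scaled loss
        scaler.step(optimizer)         # unscales grads; skips step on inf/nan
        scaler.update()                # adjusts grad_scale for the next step
        return loss.detach()
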
limit=15.0 2024-09-17 13:49:27,095 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.753e+01 9.222e+01 9.816e+01 1.053e+02 2.750e+02, threshold=1.963e+02, percent-clipped=1.0 2024-09-17 13:49:34,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=233240.0, ans=0.2 2024-09-17 13:49:38,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=233240.0, ans=0.0 2024-09-17 13:49:43,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=233240.0, ans=0.1 2024-09-17 13:49:45,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=233240.0, ans=0.125 2024-09-17 13:49:48,565 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-09-17 13:49:49,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=233280.0, ans=0.125 2024-09-17 13:50:10,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=233320.0, ans=0.125 2024-09-17 13:50:21,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=233360.0, ans=0.0 2024-09-17 13:50:35,262 INFO [train.py:1198] (1/2) Epoch 13, batch 4050, loss[loss=0.292, ctc_loss=0.2017, cr_loss=0.4208, attn_decoder_loss=0.2926, over 20917.00 frames. ], tot_loss[loss=0.257, ctc_loss=0.1555, cr_loss=0.3944, attn_decoder_loss=0.2595, over 5798183.91 frames. ], batch size: 209, lr: 8.50e-03, grad_scale: 8.0 2024-09-17 13:50:35,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=233400.0, ans=0.125 2024-09-17 13:50:37,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=233400.0, ans=0.05 2024-09-17 13:50:40,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=233400.0, ans=0.2 2024-09-17 13:50:58,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=233440.0, ans=0.025 2024-09-17 13:51:12,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=233480.0, ans=0.0 2024-09-17 13:51:14,237 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.01 vs. limit=15.0 2024-09-17 13:51:16,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=233480.0, ans=0.125 2024-09-17 13:51:21,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=233520.0, ans=0.1 2024-09-17 13:51:49,529 INFO [train.py:1198] (1/2) Epoch 13, batch 4100, loss[loss=0.2768, ctc_loss=0.1707, cr_loss=0.4321, attn_decoder_loss=0.279, over 29481.00 frames. ], tot_loss[loss=0.2573, ctc_loss=0.1559, cr_loss=0.3954, attn_decoder_loss=0.2598, over 5792884.52 frames. 
], batch size: 90, lr: 8.50e-03, grad_scale: 8.0 2024-09-17 13:51:49,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=233600.0, ans=0.025 2024-09-17 13:51:51,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=233600.0, ans=0.125 2024-09-17 13:51:55,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=233600.0, ans=0.125 2024-09-17 13:51:59,186 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=15.0 2024-09-17 13:51:59,586 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.947e+01 9.234e+01 9.794e+01 1.124e+02 2.298e+02, threshold=1.959e+02, percent-clipped=3.0 2024-09-17 13:52:04,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=233640.0, ans=0.0 2024-09-17 13:52:12,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=233640.0, ans=0.125 2024-09-17 13:52:16,650 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.56 vs. limit=22.5 2024-09-17 13:52:32,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=233720.0, ans=0.125 2024-09-17 13:53:02,943 INFO [train.py:1198] (1/2) Epoch 13, batch 4150, loss[loss=0.2451, ctc_loss=0.1432, cr_loss=0.3796, attn_decoder_loss=0.2479, over 29498.00 frames. ], tot_loss[loss=0.2567, ctc_loss=0.1557, cr_loss=0.3946, attn_decoder_loss=0.2592, over 5798252.46 frames. ], batch size: 77, lr: 8.49e-03, grad_scale: 8.0 2024-09-17 13:53:04,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=233800.0, ans=0.125 2024-09-17 13:53:24,573 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.51 vs. limit=15.0 2024-09-17 13:53:33,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=233880.0, ans=0.125 2024-09-17 13:53:36,217 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.71 vs. limit=15.0 2024-09-17 13:54:18,795 INFO [train.py:1198] (1/2) Epoch 13, batch 4200, loss[loss=0.2799, ctc_loss=0.179, cr_loss=0.4499, attn_decoder_loss=0.2811, over 29527.00 frames. ], tot_loss[loss=0.2573, ctc_loss=0.1558, cr_loss=0.3952, attn_decoder_loss=0.2598, over 5799943.19 frames. 
], batch size: 90, lr: 8.49e-03, grad_scale: 8.0 2024-09-17 13:54:25,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=234000.0, ans=0.1 2024-09-17 13:54:30,792 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.689e+01 8.618e+01 9.139e+01 9.691e+01 3.040e+02, threshold=1.828e+02, percent-clipped=1.0 2024-09-17 13:55:01,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=234120.0, ans=0.125 2024-09-17 13:55:08,193 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.63 vs. limit=15.0 2024-09-17 13:55:12,736 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.53 vs. limit=6.0 2024-09-17 13:55:13,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=234120.0, ans=0.125 2024-09-17 13:55:19,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=234160.0, ans=0.2 2024-09-17 13:55:22,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=234160.0, ans=0.025 2024-09-17 13:55:22,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=234160.0, ans=0.125 2024-09-17 13:55:32,367 INFO [train.py:1198] (1/2) Epoch 13, batch 4250, loss[loss=0.2451, ctc_loss=0.1457, cr_loss=0.3833, attn_decoder_loss=0.2477, over 29497.00 frames. ], tot_loss[loss=0.2576, ctc_loss=0.1556, cr_loss=0.3947, attn_decoder_loss=0.2602, over 5805455.50 frames. ], batch size: 74, lr: 8.49e-03, grad_scale: 8.0 2024-09-17 13:55:53,895 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.72 vs. limit=22.5 2024-09-17 13:55:57,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=234240.0, ans=0.0 2024-09-17 13:56:01,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=234280.0, ans=0.125 2024-09-17 13:56:02,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=234280.0, ans=0.1 2024-09-17 13:56:19,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=234320.0, ans=0.0 2024-09-17 13:56:32,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=234360.0, ans=0.0 2024-09-17 13:56:35,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=234360.0, ans=0.125 2024-09-17 13:56:46,320 INFO [train.py:1198] (1/2) Epoch 13, batch 4300, loss[loss=0.2754, ctc_loss=0.1689, cr_loss=0.423, attn_decoder_loss=0.2778, over 29530.00 frames. ], tot_loss[loss=0.2581, ctc_loss=0.156, cr_loss=0.3951, attn_decoder_loss=0.2606, over 5793931.08 frames. 
], batch size: 87, lr: 8.48e-03, grad_scale: 8.0 2024-09-17 13:56:46,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=234400.0, ans=0.1 2024-09-17 13:56:47,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.66 vs. limit=6.0 2024-09-17 13:56:58,270 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.092e+01 9.409e+01 9.956e+01 1.092e+02 6.321e+02, threshold=1.991e+02, percent-clipped=4.0 2024-09-17 13:57:35,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=234520.0, ans=0.125 2024-09-17 13:57:55,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=234560.0, ans=0.0 2024-09-17 13:58:02,350 INFO [train.py:1198] (1/2) Epoch 13, batch 4350, loss[loss=0.2697, ctc_loss=0.1585, cr_loss=0.3909, attn_decoder_loss=0.2734, over 29450.00 frames. ], tot_loss[loss=0.2618, ctc_loss=0.1591, cr_loss=0.4011, attn_decoder_loss=0.2643, over 5796384.45 frames. ], batch size: 97, lr: 8.48e-03, grad_scale: 8.0 2024-09-17 13:58:04,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=234600.0, ans=0.025 2024-09-17 13:58:14,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=234600.0, ans=0.1 2024-09-17 13:58:26,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=234640.0, ans=0.125 2024-09-17 13:58:42,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.51 vs. limit=15.0 2024-09-17 13:58:43,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=234680.0, ans=0.025 2024-09-17 13:58:49,844 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.87 vs. limit=10.0 2024-09-17 13:58:56,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=234720.0, ans=0.125 2024-09-17 13:58:56,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=234720.0, ans=0.1 2024-09-17 13:58:56,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=234720.0, ans=0.2 2024-09-17 13:59:15,467 INFO [train.py:1198] (1/2) Epoch 13, batch 4400, loss[loss=0.2755, ctc_loss=0.1799, cr_loss=0.4252, attn_decoder_loss=0.2767, over 27251.00 frames. ], tot_loss[loss=0.2644, ctc_loss=0.1614, cr_loss=0.4049, attn_decoder_loss=0.2668, over 5765108.21 frames. 
], batch size: 124, lr: 8.48e-03, grad_scale: 16.0 2024-09-17 13:59:15,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=234800.0, ans=0.125 2024-09-17 13:59:24,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=234800.0, ans=0.125 2024-09-17 13:59:28,466 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.453e+01 9.581e+01 9.987e+01 1.106e+02 2.626e+02, threshold=1.997e+02, percent-clipped=1.0 2024-09-17 13:59:56,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=234880.0, ans=0.125 2024-09-17 14:00:01,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=234920.0, ans=0.125 2024-09-17 14:00:18,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=234960.0, ans=0.125 2024-09-17 14:00:23,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=234960.0, ans=0.125 2024-09-17 14:00:26,621 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.79 vs. limit=15.0 2024-09-17 14:00:30,054 INFO [train.py:1198] (1/2) Epoch 13, batch 4450, loss[loss=0.2811, ctc_loss=0.2031, cr_loss=0.4236, attn_decoder_loss=0.2804, over 20540.00 frames. ], tot_loss[loss=0.2675, ctc_loss=0.1667, cr_loss=0.4093, attn_decoder_loss=0.2696, over 5572204.63 frames. ], batch size: 209, lr: 8.47e-03, grad_scale: 8.0 2024-09-17 14:00:36,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=235000.0, ans=0.0 2024-09-17 14:00:37,004 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.55 vs. limit=15.0 2024-09-17 14:00:46,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=235040.0, ans=0.0 2024-09-17 14:01:00,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=235080.0, ans=15.0 2024-09-17 14:01:35,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=235160.0, ans=0.125 2024-09-17 14:01:40,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=235160.0, ans=0.0 2024-09-17 14:01:46,500 INFO [train.py:1198] (1/2) Epoch 13, batch 4500, loss[loss=0.2753, ctc_loss=0.1838, cr_loss=0.3928, attn_decoder_loss=0.2767, over 19724.00 frames. ], tot_loss[loss=0.2705, ctc_loss=0.1725, cr_loss=0.4107, attn_decoder_loss=0.2722, over 5232207.86 frames. ], batch size: 209, lr: 8.47e-03, grad_scale: 8.0 2024-09-17 14:01:48,794 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.28 vs. 
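
The tot_loss[...] aggregate is a frame-weighted running average rather than a cumulative sum: its "over N frames" count rises and falls with the composition of recent batches, dropping here from ~5.77M to ~5.23M frames as the end-of-epoch records with 209-utterance batches arrive. That behavior is consistent with an exponentially-decaying frame-weighted tracker spanning roughly the last couple hundred batches; a sketch under that assumption, with the decay constant a guess rather than the recipe's actual value:

    class FrameWeightedLoss:
        """Running loss average weighted by frame count, with exponential
        forgetting so old batches fade out of the aggregate."""
        def __init__(self, decay=1.0 - 1.0 / 200):
            self.decay = decay
            self.loss_sum = 0.0   # decayed sum of loss * frames
            self.frames = 0.0     # decayed sum of frames

        def update(self, loss: float, num_frames: float):
            self.loss_sum = self.loss_sum * self.decay + loss * num_frames
            self.frames = self.frames * self.decay + num_frames

        def value(self) -> float:
            # printed as: tot_loss[loss=..., over {frames} frames.]
            return self.loss_sum / self.frames
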
limit=22.5 2024-09-17 14:02:00,041 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.936e+01 1.022e+02 1.119e+02 1.227e+02 3.439e+02, threshold=2.238e+02, percent-clipped=3.0 2024-09-17 14:02:07,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=235240.0, ans=0.125 2024-09-17 14:02:16,964 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.77 vs. limit=15.0 2024-09-17 14:02:18,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=235280.0, ans=0.125 2024-09-17 14:03:16,453 INFO [train.py:1198] (1/2) Epoch 14, batch 0, loss[loss=0.2354, ctc_loss=0.1311, cr_loss=0.3475, attn_decoder_loss=0.2392, over 29611.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1311, cr_loss=0.3475, attn_decoder_loss=0.2392, over 29611.00 frames. ], batch size: 73, lr: 8.16e-03, grad_scale: 16.0 2024-09-17 14:03:16,454 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 14:03:34,823 INFO [train.py:1230] (1/2) Epoch 14, validation: loss=0.2137, ctc_loss=0.04354, cr_loss=5.325e-15, attn_decoder_loss=0.2326, over 944034.00 frames. 2024-09-17 14:03:34,823 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-17 14:03:40,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.45 vs. limit=15.0 2024-09-17 14:03:42,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=235300.0, ans=0.0 2024-09-17 14:03:42,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=235300.0, ans=0.0 2024-09-17 14:03:44,399 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.68 vs. limit=15.0 2024-09-17 14:03:47,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=235300.0, ans=0.0 2024-09-17 14:03:59,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=235340.0, ans=0.0 2024-09-17 14:04:19,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=235380.0, ans=0.125 2024-09-17 14:04:26,280 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.77 vs. limit=6.0 2024-09-17 14:04:27,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=235420.0, ans=0.1 2024-09-17 14:04:41,152 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.39 vs. limit=12.0 2024-09-17 14:04:46,199 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.99 vs. 
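
The "Maximum memory allocated" line printed after each validation pass tracks the high-water mark of the CUDA caching allocator; the value is stable at 52672MB across epochs 13 and 14, i.e. the footprint is no longer growing. The wording suggests a direct query of PyTorch's allocator statistics, along these lines:

    import torch

    def log_max_memory(device: torch.device) -> None:
        mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"Maximum memory allocated so far is {mb}MB")
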
limit=15.0 2024-09-17 14:04:50,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=235460.0, ans=0.0 2024-09-17 14:04:52,730 INFO [train.py:1198] (1/2) Epoch 14, batch 50, loss[loss=0.2249, ctc_loss=0.1302, cr_loss=0.3356, attn_decoder_loss=0.2279, over 29416.00 frames. ], tot_loss[loss=0.2586, ctc_loss=0.1584, cr_loss=0.398, attn_decoder_loss=0.2609, over 1267789.34 frames. ], batch size: 70, lr: 8.16e-03, grad_scale: 8.0 2024-09-17 14:05:05,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=235500.0, ans=0.125 2024-09-17 14:05:09,776 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:05:17,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=235540.0, ans=0.125 2024-09-17 14:05:27,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=235580.0, ans=0.025 2024-09-17 14:05:40,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=235620.0, ans=0.0 2024-09-17 14:05:45,792 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.961e+01 9.193e+01 1.002e+02 1.099e+02 2.018e+02, threshold=2.003e+02, percent-clipped=0.0 2024-09-17 14:05:50,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=235620.0, ans=0.0 2024-09-17 14:05:56,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=235660.0, ans=0.125 2024-09-17 14:05:57,583 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.74 vs. limit=15.0 2024-09-17 14:06:04,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=235660.0, ans=0.125 2024-09-17 14:06:04,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=235660.0, ans=0.125 2024-09-17 14:06:08,489 INFO [train.py:1198] (1/2) Epoch 14, batch 100, loss[loss=0.2587, ctc_loss=0.1601, cr_loss=0.43, attn_decoder_loss=0.2601, over 29531.00 frames. ], tot_loss[loss=0.2608, ctc_loss=0.1597, cr_loss=0.4002, attn_decoder_loss=0.2631, over 2252593.29 frames. ], batch size: 76, lr: 8.15e-03, grad_scale: 8.0 2024-09-17 14:06:29,272 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.12 vs. 
limit=15.0 2024-09-17 14:06:36,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=235740.0, ans=0.0 2024-09-17 14:06:42,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=235780.0, ans=0.125 2024-09-17 14:06:45,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=235780.0, ans=0.07 2024-09-17 14:06:50,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=235780.0, ans=0.2 2024-09-17 14:06:57,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=235820.0, ans=0.0 2024-09-17 14:07:04,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=235820.0, ans=0.125 2024-09-17 14:07:07,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=235820.0, ans=0.125 2024-09-17 14:07:25,372 INFO [train.py:1198] (1/2) Epoch 14, batch 150, loss[loss=0.2375, ctc_loss=0.1443, cr_loss=0.3841, attn_decoder_loss=0.2393, over 29426.00 frames. ], tot_loss[loss=0.2584, ctc_loss=0.1569, cr_loss=0.3974, attn_decoder_loss=0.2608, over 3046777.01 frames. ], batch size: 70, lr: 8.15e-03, grad_scale: 8.0 2024-09-17 14:07:33,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=235900.0, ans=0.125 2024-09-17 14:08:05,147 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0 2024-09-17 14:08:09,290 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.40 vs. limit=15.0 2024-09-17 14:08:20,476 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.745e+01 9.181e+01 9.587e+01 1.009e+02 1.798e+02, threshold=1.917e+02, percent-clipped=0.0 2024-09-17 14:08:26,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=236060.0, ans=0.0 2024-09-17 14:08:32,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=236060.0, ans=0.125 2024-09-17 14:08:32,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=236060.0, ans=0.125 2024-09-17 14:08:37,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=236060.0, ans=0.04949747468305833 2024-09-17 14:08:43,261 INFO [train.py:1198] (1/2) Epoch 14, batch 200, loss[loss=0.2762, ctc_loss=0.1709, cr_loss=0.4294, attn_decoder_loss=0.2784, over 27289.00 frames. ], tot_loss[loss=0.2574, ctc_loss=0.1559, cr_loss=0.397, attn_decoder_loss=0.2599, over 3658736.72 frames. 
], batch size: 124, lr: 8.15e-03, grad_scale: 8.0 2024-09-17 14:09:12,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=236180.0, ans=0.125 2024-09-17 14:09:16,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=236180.0, ans=0.0 2024-09-17 14:09:58,994 INFO [train.py:1198] (1/2) Epoch 14, batch 250, loss[loss=0.2733, ctc_loss=0.1692, cr_loss=0.4231, attn_decoder_loss=0.2755, over 29255.00 frames. ], tot_loss[loss=0.2573, ctc_loss=0.1556, cr_loss=0.3959, attn_decoder_loss=0.2598, over 4139859.98 frames. ], batch size: 100, lr: 8.14e-03, grad_scale: 8.0 2024-09-17 14:10:00,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=236300.0, ans=0.2 2024-09-17 14:10:20,672 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:10:41,894 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.50 vs. limit=15.0 2024-09-17 14:10:54,471 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.415e+01 8.995e+01 9.389e+01 1.000e+02 1.684e+02, threshold=1.878e+02, percent-clipped=0.0 2024-09-17 14:11:13,227 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.48 vs. limit=6.0 2024-09-17 14:11:17,011 INFO [train.py:1198] (1/2) Epoch 14, batch 300, loss[loss=0.2795, ctc_loss=0.1734, cr_loss=0.4128, attn_decoder_loss=0.2821, over 29515.00 frames. ], tot_loss[loss=0.2569, ctc_loss=0.1548, cr_loss=0.3952, attn_decoder_loss=0.2594, over 4508262.32 frames. ], batch size: 92, lr: 8.14e-03, grad_scale: 8.0 2024-09-17 14:11:18,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=236500.0, ans=0.125 2024-09-17 14:11:26,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=236500.0, ans=0.125 2024-09-17 14:12:05,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=236620.0, ans=0.0 2024-09-17 14:12:25,546 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=15.0 2024-09-17 14:12:33,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=236700.0, ans=0.125 2024-09-17 14:12:35,072 INFO [train.py:1198] (1/2) Epoch 14, batch 350, loss[loss=0.227, ctc_loss=0.1252, cr_loss=0.3476, attn_decoder_loss=0.2306, over 29309.00 frames. ], tot_loss[loss=0.2576, ctc_loss=0.1555, cr_loss=0.3969, attn_decoder_loss=0.2601, over 4794081.55 frames. 
], batch size: 71, lr: 8.14e-03, grad_scale: 8.0 2024-09-17 14:12:44,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=236700.0, ans=0.125 2024-09-17 14:12:47,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=236700.0, ans=0.0 2024-09-17 14:13:08,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=236780.0, ans=0.125 2024-09-17 14:13:08,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=236780.0, ans=0.025 2024-09-17 14:13:28,291 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 8.698e+01 9.344e+01 1.025e+02 1.871e+02, threshold=1.869e+02, percent-clipped=0.0 2024-09-17 14:13:40,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=236860.0, ans=0.95 2024-09-17 14:13:40,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=236860.0, ans=0.0 2024-09-17 14:13:48,738 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.46 vs. limit=22.5 2024-09-17 14:13:49,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=236900.0, ans=0.0 2024-09-17 14:13:49,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=236900.0, ans=0.125 2024-09-17 14:13:50,804 INFO [train.py:1198] (1/2) Epoch 14, batch 400, loss[loss=0.2606, ctc_loss=0.1537, cr_loss=0.3935, attn_decoder_loss=0.2637, over 29700.00 frames. ], tot_loss[loss=0.2572, ctc_loss=0.1553, cr_loss=0.3961, attn_decoder_loss=0.2597, over 5024802.06 frames. ], batch size: 82, lr: 8.13e-03, grad_scale: 16.0 2024-09-17 14:13:54,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=236900.0, ans=0.0 2024-09-17 14:14:01,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=236900.0, ans=0.025 2024-09-17 14:14:03,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=236900.0, ans=0.5 2024-09-17 14:14:15,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=236940.0, ans=0.0 2024-09-17 14:14:44,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=237020.0, ans=0.125 2024-09-17 14:15:08,974 INFO [train.py:1198] (1/2) Epoch 14, batch 450, loss[loss=0.2657, ctc_loss=0.1569, cr_loss=0.3967, attn_decoder_loss=0.2689, over 29672.00 frames. ], tot_loss[loss=0.257, ctc_loss=0.1549, cr_loss=0.3956, attn_decoder_loss=0.2595, over 5185929.56 frames. 
], batch size: 83, lr: 8.13e-03, grad_scale: 8.0 2024-09-17 14:15:24,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=237140.0, ans=0.125 2024-09-17 14:15:27,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=237140.0, ans=0.5 2024-09-17 14:15:33,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=237140.0, ans=0.125 2024-09-17 14:15:55,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=237220.0, ans=0.1 2024-09-17 14:15:58,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=237220.0, ans=0.2 2024-09-17 14:16:04,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=237220.0, ans=0.125 2024-09-17 14:16:05,940 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 8.900e+01 9.763e+01 1.081e+02 1.650e+02, threshold=1.953e+02, percent-clipped=0.0 2024-09-17 14:16:18,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=237260.0, ans=0.0 2024-09-17 14:16:27,064 INFO [train.py:1198] (1/2) Epoch 14, batch 500, loss[loss=0.2695, ctc_loss=0.1618, cr_loss=0.3953, attn_decoder_loss=0.2727, over 29372.00 frames. ], tot_loss[loss=0.2563, ctc_loss=0.1543, cr_loss=0.3954, attn_decoder_loss=0.2588, over 5328985.44 frames. ], batch size: 94, lr: 8.13e-03, grad_scale: 8.0 2024-09-17 14:16:31,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=237300.0, ans=0.0 2024-09-17 14:16:35,731 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.85 vs. limit=15.0 2024-09-17 14:16:36,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=237300.0, ans=0.1 2024-09-17 14:16:39,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=237300.0, ans=0.0 2024-09-17 14:16:48,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=237340.0, ans=0.0 2024-09-17 14:17:06,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=237380.0, ans=0.0 2024-09-17 14:17:42,705 INFO [train.py:1198] (1/2) Epoch 14, batch 550, loss[loss=0.2554, ctc_loss=0.1402, cr_loss=0.3644, attn_decoder_loss=0.2601, over 28877.00 frames. ], tot_loss[loss=0.2564, ctc_loss=0.1546, cr_loss=0.3954, attn_decoder_loss=0.2589, over 5421250.49 frames. 
], batch size: 104, lr: 8.12e-03, grad_scale: 8.0 2024-09-17 14:17:47,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=237500.0, ans=0.125 2024-09-17 14:18:13,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=237580.0, ans=0.0 2024-09-17 14:18:40,101 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.566e+01 8.963e+01 9.623e+01 1.012e+02 2.800e+02, threshold=1.925e+02, percent-clipped=3.0 2024-09-17 14:18:40,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=237620.0, ans=0.0 2024-09-17 14:18:57,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=237660.0, ans=0.125 2024-09-17 14:18:58,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=237660.0, ans=0.1 2024-09-17 14:19:00,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=237700.0, ans=0.125 2024-09-17 14:19:01,495 INFO [train.py:1198] (1/2) Epoch 14, batch 600, loss[loss=0.2778, ctc_loss=0.1772, cr_loss=0.4299, attn_decoder_loss=0.2795, over 29246.00 frames. ], tot_loss[loss=0.2568, ctc_loss=0.1547, cr_loss=0.3954, attn_decoder_loss=0.2593, over 5507649.85 frames. ], batch size: 100, lr: 8.12e-03, grad_scale: 8.0 2024-09-17 14:19:19,841 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:19:22,141 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=22.5 2024-09-17 14:19:32,367 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.07 vs. limit=15.0 2024-09-17 14:19:34,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=237780.0, ans=0.04949747468305833 2024-09-17 14:19:40,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=237780.0, ans=0.1 2024-09-17 14:19:40,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=237780.0, ans=0.125 2024-09-17 14:19:57,212 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.61 vs. limit=12.0 2024-09-17 14:20:06,051 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.06 vs. limit=15.0 2024-09-17 14:20:19,157 INFO [train.py:1198] (1/2) Epoch 14, batch 650, loss[loss=0.2565, ctc_loss=0.1584, cr_loss=0.3895, attn_decoder_loss=0.2587, over 29771.00 frames. ], tot_loss[loss=0.256, ctc_loss=0.1537, cr_loss=0.3938, attn_decoder_loss=0.2586, over 5585572.59 frames. 
], batch size: 81, lr: 8.12e-03, grad_scale: 8.0
2024-09-17 14:20:51,324 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 14:20:57,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=237980.0, ans=0.07
2024-09-17 14:21:11,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=238020.0, ans=0.0
2024-09-17 14:21:13,715 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 8.771e+01 9.255e+01 1.013e+02 1.766e+02, threshold=1.851e+02, percent-clipped=0.0
2024-09-17 14:21:24,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=238060.0, ans=0.05
2024-09-17 14:21:34,763 INFO [train.py:1198] (1/2) Epoch 14, batch 700, loss[loss=0.2504, ctc_loss=0.1567, cr_loss=0.4038, attn_decoder_loss=0.2518, over 29519.00 frames. ], tot_loss[loss=0.2565, ctc_loss=0.154, cr_loss=0.3943, attn_decoder_loss=0.2591, over 5637000.16 frames. ], batch size: 76, lr: 8.11e-03, grad_scale: 8.0
2024-09-17 14:21:44,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=238100.0, ans=0.125
2024-09-17 14:22:08,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=238180.0, ans=0.0
2024-09-17 14:22:09,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=238180.0, ans=0.1
2024-09-17 14:22:13,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.76 vs. limit=15.0
2024-09-17 14:22:37,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=238260.0, ans=0.125
2024-09-17 14:22:52,567 INFO [train.py:1198] (1/2) Epoch 14, batch 750, loss[loss=0.2664, ctc_loss=0.1619, cr_loss=0.4176, attn_decoder_loss=0.2687, over 29723.00 frames. ], tot_loss[loss=0.2561, ctc_loss=0.154, cr_loss=0.3944, attn_decoder_loss=0.2587, over 5677255.98 frames. ], batch size: 82, lr: 8.11e-03, grad_scale: 8.0
2024-09-17 14:23:07,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=238340.0, ans=0.2
2024-09-17 14:23:12,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=238340.0, ans=0.1
2024-09-17 14:23:24,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=238380.0, ans=0.1
2024-09-17 14:23:32,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=238380.0, ans=0.125
2024-09-17 14:23:35,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=238380.0, ans=0.125
2024-09-17 14:23:40,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=238420.0, ans=0.0
2024-09-17 14:23:42,173 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0
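The Whitening entries above compare a per-module statistic ("metric") against a module-specific limit (e.g. metric=3.12 vs. limit=6.0 on the whiten_keys entry just logged). A minimal sketch of one plausible form of that statistic follows: an eigenvalue-spread measure of the channel covariance that equals 1.0 for perfectly white features. The helper name whitening_metric and the exact formula are assumptions for illustration, not the verbatim scaling.py implementation.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # Spread measure of the per-group channel covariance C of x:
        # n * trace(C @ C) / trace(C)**2 equals 1.0 when C is a multiple of
        # the identity ("white" features) and grows as variance concentrates
        # in a few directions.
        num_channels = x.shape[-1]
        n = num_channels // num_groups
        x = x.reshape(-1, num_groups, n).transpose(0, 1)   # (groups, frames, n)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / x.shape[1]           # (groups, n, n)
        tr = cov.diagonal(dim1=1, dim2=2).sum(-1)          # trace(C)
        tr2 = (cov * cov).sum(dim=(1, 2))                  # trace(C @ C), C symmetric
        return (n * tr2 / tr.clamp(min=1e-20) ** 2).mean()

On this reading, a line such as "metric=11.76 vs. limit=15.0" records that the measured spread was still below the point at which the module would start pushing its activations back toward whiteness.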
2024-09-17 14:23:47,555 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.84 vs. limit=15.0
2024-09-17 14:23:48,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=238420.0, ans=0.1
2024-09-17 14:23:49,573 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 9.200e+01 9.849e+01 1.104e+02 2.206e+02, threshold=1.970e+02, percent-clipped=2.0
2024-09-17 14:24:07,529 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.33 vs. limit=15.0
2024-09-17 14:24:10,897 INFO [train.py:1198] (1/2) Epoch 14, batch 800, loss[loss=0.2295, ctc_loss=0.1296, cr_loss=0.3453, attn_decoder_loss=0.2329, over 29589.00 frames. ], tot_loss[loss=0.2562, ctc_loss=0.1543, cr_loss=0.395, attn_decoder_loss=0.2588, over 5708390.29 frames. ], batch size: 73, lr: 8.11e-03, grad_scale: 16.0
2024-09-17 14:24:42,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=238580.0, ans=0.1
2024-09-17 14:24:50,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=238580.0, ans=0.125
2024-09-17 14:25:07,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=238620.0, ans=0.125
2024-09-17 14:25:13,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=238660.0, ans=0.1
2024-09-17 14:25:20,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=238660.0, ans=0.125
2024-09-17 14:25:23,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer_ff3.min_abs, batch_count=238660.0, ans=0.2
2024-09-17 14:25:26,162 INFO [train.py:1198] (1/2) Epoch 14, batch 850, loss[loss=0.2781, ctc_loss=0.1821, cr_loss=0.437, attn_decoder_loss=0.2791, over 29715.00 frames. ], tot_loss[loss=0.2557, ctc_loss=0.1536, cr_loss=0.3938, attn_decoder_loss=0.2583, over 5737663.69 frames. ], batch size: 89, lr: 8.10e-03, grad_scale: 8.0
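The per-batch summaries from train.py:1198 log three component losses (ctc_loss, cr_loss, attn_decoder_loss) alongside a combined loss. The printed totals are consistent with a fixed weighted sum using weights 0.1, 0.02 and 0.9 respectively; these weights are inferred from the logged tuples themselves, and the function below is an illustrative reconstruction, not the train.py code.

    def combined_loss(ctc: float, cr: float, attn_decoder: float) -> float:
        # Weights inferred by fitting the logged (loss, ctc, cr, attn) tuples.
        return 0.1 * ctc + 0.02 * cr + 0.9 * attn_decoder

    # Check against the "batch 850" tot_loss just above:
    # 0.1 * 0.1536 + 0.02 * 0.3938 + 0.9 * 0.2583 = 0.2557  (logged: loss=0.2557)
    assert abs(combined_loss(0.1536, 0.3938, 0.2583) - 0.2557) < 5e-4

The same weights reproduce the other tot_loss values in this section to the printed precision.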
2024-09-17 14:25:41,515 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 14:25:42,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=238740.0, ans=0.1
2024-09-17 14:25:42,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=238740.0, ans=0.125
2024-09-17 14:25:50,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=238740.0, ans=0.1
2024-09-17 14:25:58,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=238780.0, ans=0.07
2024-09-17 14:26:01,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=238780.0, ans=10.0
2024-09-17 14:26:02,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=238780.0, ans=0.125
2024-09-17 14:26:20,083 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.10 vs. limit=15.0
2024-09-17 14:26:22,025 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 9.039e+01 9.635e+01 1.057e+02 1.739e+02, threshold=1.927e+02, percent-clipped=0.0
2024-09-17 14:26:22,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=238820.0, ans=0.125
2024-09-17 14:26:27,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=238860.0, ans=0.1
2024-09-17 14:26:43,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=238900.0, ans=0.0
2024-09-17 14:26:44,217 INFO [train.py:1198] (1/2) Epoch 14, batch 900, loss[loss=0.2371, ctc_loss=0.1405, cr_loss=0.3659, attn_decoder_loss=0.2397, over 29629.00 frames. ], tot_loss[loss=0.2564, ctc_loss=0.1545, cr_loss=0.395, attn_decoder_loss=0.259, over 5741993.99 frames. ], batch size: 73, lr: 8.10e-03, grad_scale: 8.0
2024-09-17 14:26:44,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=238900.0, ans=0.125
2024-09-17 14:26:52,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=238900.0, ans=0.04949747468305833
2024-09-17 14:26:53,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=238900.0, ans=0.025
2024-09-17 14:27:05,746 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 14:27:08,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=238940.0, ans=0.125
2024-09-17 14:27:09,072 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.88 vs. limit=22.5
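The ScheduledFloat entries that dominate this log print the current value ("ans") of a hyperparameter scheduled as a function of batch_count: skip rates, balancer probabilities, dropout values and similar. A minimal sketch of such a piecewise-linear schedule follows, under the assumption that this is what scaling.py implements; the class name ScheduledFloatSketch and the example schedule points are illustrative only.

    import bisect

    class ScheduledFloatSketch:
        # A float-valued hyperparameter interpolated piecewise-linearly in
        # batch_count, held constant beyond the first/last schedule point.
        def __init__(self, *points):              # points: (batch_count, value)
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            xs = [x for x, _ in self.points]
            i = bisect.bisect_right(xs, batch_count)
            if i == 0:
                return self.points[0][1]
            if i == len(self.points):
                return self.points[-1][1]
            (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a skip rate decaying from 0.5 to 0.0 over the first 20k batches
    # would read ans=0.0 at the batch_count values (~238k) logged here:
    skip_rate = ScheduledFloatSketch((0.0, 0.5), (20000.0, 0.0))
    assert skip_rate.value(238740.0) == 0.0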
2024-09-17 14:27:21,768 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0
2024-09-17 14:27:28,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=239020.0, ans=0.125
2024-09-17 14:27:59,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=239060.0, ans=0.1
2024-09-17 14:28:01,790 INFO [train.py:1198] (1/2) Epoch 14, batch 950, loss[loss=0.2358, ctc_loss=0.1403, cr_loss=0.3648, attn_decoder_loss=0.2382, over 29493.00 frames. ], tot_loss[loss=0.2565, ctc_loss=0.1543, cr_loss=0.3948, attn_decoder_loss=0.2591, over 5742391.10 frames. ], batch size: 74, lr: 8.10e-03, grad_scale: 8.0
2024-09-17 14:28:23,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=239140.0, ans=0.025
2024-09-17 14:28:26,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=239140.0, ans=0.2
2024-09-17 14:28:35,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=239180.0, ans=0.125
2024-09-17 14:28:57,638 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=6.51 vs. limit=15.0
2024-09-17 14:28:58,290 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.677e+01 9.217e+01 9.958e+01 1.123e+02 9.034e+02, threshold=1.992e+02, percent-clipped=2.0
2024-09-17 14:29:17,707 INFO [train.py:1198] (1/2) Epoch 14, batch 1000, loss[loss=0.262, ctc_loss=0.1625, cr_loss=0.4037, attn_decoder_loss=0.2641, over 29493.00 frames. ], tot_loss[loss=0.2572, ctc_loss=0.1552, cr_loss=0.3961, attn_decoder_loss=0.2597, over 5737531.41 frames. ], batch size: 77, lr: 8.09e-03, grad_scale: 8.0
2024-09-17 14:29:19,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=239300.0, ans=0.125
2024-09-17 14:29:24,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=239300.0, ans=0.125
2024-09-17 14:29:26,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=239300.0, ans=0.125
2024-09-17 14:29:29,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=239300.0, ans=0.125
2024-09-17 14:29:55,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=239380.0, ans=0.125
2024-09-17 14:30:03,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=239420.0, ans=0.0
2024-09-17 14:30:18,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=239460.0, ans=0.125
2024-09-17 14:30:35,801 INFO [train.py:1198] (1/2) Epoch 14, batch 1050, loss[loss=0.2641, ctc_loss=0.1585, cr_loss=0.4161, attn_decoder_loss=0.2666, over 29668.00 frames. ], tot_loss[loss=0.2563, ctc_loss=0.1546, cr_loss=0.3948, attn_decoder_loss=0.2589, over 5745785.83 frames. ], batch size: 85, lr: 8.09e-03, grad_scale: 8.0
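The optim.py:487 warnings summarize the recent distribution of gradient norms as five order statistics (min, 25%, 50%, 75%, max) plus a clipping threshold. The printed numbers fit threshold = Clipping_scale * median: on the warning above, 2.0 * 9.958e+01 = 1.992e+02, exactly the printed threshold, and the max of 9.034e+02 is the kind of outlier such clipping suppresses. A sketch under that inferred rule; the helper name clipping_stats and the idea of a sliding window are assumptions.

    import torch

    def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
        # grad_norms: gradient norms from a window of recent batches.
        q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]                    # 2.0 * median
        percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
        return q, threshold, percent_clipped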
2024-09-17 14:30:42,734 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.02 vs. limit=15.0
2024-09-17 14:30:53,520 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.76 vs. limit=22.5
2024-09-17 14:30:55,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=239540.0, ans=0.0
2024-09-17 14:31:19,584 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.53 vs. limit=15.0
2024-09-17 14:31:20,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=239620.0, ans=0.125
2024-09-17 14:31:24,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=239620.0, ans=0.1
2024-09-17 14:31:26,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=239620.0, ans=0.0
2024-09-17 14:31:34,119 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 8.789e+01 9.469e+01 1.013e+02 1.494e+02, threshold=1.894e+02, percent-clipped=0.0
2024-09-17 14:31:53,884 INFO [train.py:1198] (1/2) Epoch 14, batch 1100, loss[loss=0.2533, ctc_loss=0.1528, cr_loss=0.3952, attn_decoder_loss=0.2557, over 29457.00 frames. ], tot_loss[loss=0.256, ctc_loss=0.1542, cr_loss=0.3943, attn_decoder_loss=0.2585, over 5756991.47 frames. ], batch size: 78, lr: 8.09e-03, grad_scale: 8.0
2024-09-17 14:32:07,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=239740.0, ans=0.2
2024-09-17 14:33:09,740 INFO [train.py:1198] (1/2) Epoch 14, batch 1150, loss[loss=0.2486, ctc_loss=0.1431, cr_loss=0.3781, attn_decoder_loss=0.2519, over 29455.00 frames. ], tot_loss[loss=0.2561, ctc_loss=0.1543, cr_loss=0.3945, attn_decoder_loss=0.2586, over 5754838.57 frames. ], batch size: 78, lr: 8.08e-03, grad_scale: 8.0
2024-09-17 14:33:42,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=239980.0, ans=0.125
2024-09-17 14:34:13,793 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.548e+01 9.029e+01 9.820e+01 1.050e+02 2.109e+02, threshold=1.964e+02, percent-clipped=1.0
2024-09-17 14:34:23,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=240060.0, ans=0.125
2024-09-17 14:34:26,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=240060.0, ans=0.0
2024-09-17 14:34:32,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=240060.0, ans=15.0
2024-09-17 14:34:36,162 INFO [train.py:1198] (1/2) Epoch 14, batch 1200, loss[loss=0.2573, ctc_loss=0.1473, cr_loss=0.3868, attn_decoder_loss=0.261, over 29671.00 frames. ], tot_loss[loss=0.257, ctc_loss=0.1551, cr_loss=0.3955, attn_decoder_loss=0.2595, over 5745099.65 frames. ], batch size: 85, lr: 8.08e-03, grad_scale: 16.0
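The grad_scale figure in these summaries moves over time: 8.0 for most of the epoch, 16.0 at batch 1200 above, and back to 8.0 by batch 1250 below. That pattern matches dynamic AMP loss scaling, where the scale is doubled after a run of overflow-free fp16 steps and halved whenever an inf/nan gradient appears. A sketch using the standard torch.cuda.amp API; the growth_interval value is illustrative, since the run's actual setting is not visible in this log.

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=8.0,        # the value most of these lines report
        growth_factor=2.0,     # 8.0 -> 16.0, as at batch 1200
        backoff_factor=0.5,    # 16.0 -> 8.0 after an overflowing step
        growth_interval=2000,  # illustrative; actual setting not logged here
    )
    # Typical use per optimizer step (sketch):
    #   with torch.cuda.amp.autocast(dtype=torch.float16):
    #       loss = compute_loss(batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()
    #   grad_scale = scaler.get_scale()   # the number logged per batch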
2024-09-17 14:35:18,277 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.06 vs. limit=15.0
2024-09-17 14:35:30,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=240220.0, ans=0.2
2024-09-17 14:35:39,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=240260.0, ans=0.125
2024-09-17 14:35:41,067 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 14:35:41,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=240260.0, ans=0.0
2024-09-17 14:35:54,248 INFO [train.py:1198] (1/2) Epoch 14, batch 1250, loss[loss=0.2693, ctc_loss=0.1698, cr_loss=0.4292, attn_decoder_loss=0.2708, over 29562.00 frames. ], tot_loss[loss=0.2574, ctc_loss=0.1552, cr_loss=0.3961, attn_decoder_loss=0.26, over 5773554.34 frames. ], batch size: 92, lr: 8.08e-03, grad_scale: 8.0
2024-09-17 14:36:11,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=240340.0, ans=0.0
2024-09-17 14:36:27,496 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.66 vs. limit=15.0
2024-09-17 14:36:40,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=240420.0, ans=0.1
2024-09-17 14:36:52,178 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.786e+01 9.275e+01 9.951e+01 3.249e+02, threshold=1.855e+02, percent-clipped=3.0
2024-09-17 14:37:04,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=240460.0, ans=0.125
2024-09-17 14:37:09,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=240500.0, ans=0.95
2024-09-17 14:37:10,334 INFO [train.py:1198] (1/2) Epoch 14, batch 1300, loss[loss=0.266, ctc_loss=0.1651, cr_loss=0.3872, attn_decoder_loss=0.2686, over 28210.00 frames. ], tot_loss[loss=0.2566, ctc_loss=0.1544, cr_loss=0.3951, attn_decoder_loss=0.2592, over 5777094.82 frames. ], batch size: 111, lr: 8.07e-03, grad_scale: 8.0
2024-09-17 14:37:27,278 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 14:37:54,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=240620.0, ans=0.09899494936611666
2024-09-17 14:38:04,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=240620.0, ans=0.1
2024-09-17 14:38:24,437 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 14:38:25,776 INFO [train.py:1198] (1/2) Epoch 14, batch 1350, loss[loss=0.2595, ctc_loss=0.1573, cr_loss=0.4095, attn_decoder_loss=0.2617, over 29762.00 frames. ], tot_loss[loss=0.2563, ctc_loss=0.1538, cr_loss=0.3951, attn_decoder_loss=0.2589, over 5794203.67 frames.
], batch size: 81, lr: 8.07e-03, grad_scale: 8.0 2024-09-17 14:38:28,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=240700.0, ans=0.5 2024-09-17 14:39:09,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=240780.0, ans=0.125 2024-09-17 14:39:17,200 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.29 vs. limit=22.5 2024-09-17 14:39:27,570 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.834e+01 9.288e+01 9.876e+01 1.389e+02, threshold=1.858e+02, percent-clipped=0.0 2024-09-17 14:39:33,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=240860.0, ans=0.0 2024-09-17 14:39:34,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=240860.0, ans=0.0 2024-09-17 14:39:45,916 INFO [train.py:1198] (1/2) Epoch 14, batch 1400, loss[loss=0.2255, ctc_loss=0.134, cr_loss=0.3677, attn_decoder_loss=0.2275, over 29576.00 frames. ], tot_loss[loss=0.2561, ctc_loss=0.1538, cr_loss=0.3946, attn_decoder_loss=0.2587, over 5805692.97 frames. ], batch size: 69, lr: 8.07e-03, grad_scale: 8.0 2024-09-17 14:40:18,421 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.09 vs. limit=22.5 2024-09-17 14:40:34,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=241020.0, ans=0.0 2024-09-17 14:40:57,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=241060.0, ans=0.125 2024-09-17 14:41:00,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.78 vs. limit=12.0 2024-09-17 14:41:01,383 INFO [train.py:1198] (1/2) Epoch 14, batch 1450, loss[loss=0.2767, ctc_loss=0.172, cr_loss=0.4248, attn_decoder_loss=0.2789, over 29438.00 frames. ], tot_loss[loss=0.2567, ctc_loss=0.1543, cr_loss=0.3959, attn_decoder_loss=0.2593, over 5803338.15 frames. ], batch size: 94, lr: 8.06e-03, grad_scale: 8.0 2024-09-17 14:41:06,750 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.55 vs. 
limit=15.0 2024-09-17 14:41:43,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=241180.0, ans=0.0 2024-09-17 14:41:58,395 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.760e+01 9.187e+01 9.748e+01 1.026e+02 3.155e+02, threshold=1.950e+02, percent-clipped=2.0 2024-09-17 14:42:03,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=241260.0, ans=0.125 2024-09-17 14:42:12,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=241260.0, ans=0.2 2024-09-17 14:42:12,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=241260.0, ans=0.125 2024-09-17 14:42:16,801 INFO [train.py:1198] (1/2) Epoch 14, batch 1500, loss[loss=0.2671, ctc_loss=0.1578, cr_loss=0.4009, attn_decoder_loss=0.2703, over 29656.00 frames. ], tot_loss[loss=0.2569, ctc_loss=0.1542, cr_loss=0.3955, attn_decoder_loss=0.2595, over 5803771.76 frames. ], batch size: 86, lr: 8.06e-03, grad_scale: 8.0 2024-09-17 14:42:39,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=241340.0, ans=0.0 2024-09-17 14:42:41,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=22.5 2024-09-17 14:42:42,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=241340.0, ans=0.0 2024-09-17 14:42:43,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=241340.0, ans=0.125 2024-09-17 14:42:48,122 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:42:56,426 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.90 vs. limit=15.0 2024-09-17 14:43:00,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=241380.0, ans=0.125 2024-09-17 14:43:07,331 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0 2024-09-17 14:43:11,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=241420.0, ans=0.125 2024-09-17 14:43:13,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=241420.0, ans=0.125 2024-09-17 14:43:37,274 INFO [train.py:1198] (1/2) Epoch 14, batch 1550, loss[loss=0.2698, ctc_loss=0.1628, cr_loss=0.426, attn_decoder_loss=0.2722, over 29499.00 frames. ], tot_loss[loss=0.2568, ctc_loss=0.1543, cr_loss=0.3951, attn_decoder_loss=0.2594, over 5781158.44 frames. ], batch size: 90, lr: 8.06e-03, grad_scale: 8.0 2024-09-17 14:44:11,393 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.88 vs. 
limit=15.0 2024-09-17 14:44:24,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=241620.0, ans=0.125 2024-09-17 14:44:34,866 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.788e+01 9.004e+01 9.910e+01 1.078e+02 4.071e+02, threshold=1.982e+02, percent-clipped=2.0 2024-09-17 14:44:53,195 INFO [train.py:1198] (1/2) Epoch 14, batch 1600, loss[loss=0.2654, ctc_loss=0.1569, cr_loss=0.3874, attn_decoder_loss=0.2689, over 29667.00 frames. ], tot_loss[loss=0.2567, ctc_loss=0.1548, cr_loss=0.3953, attn_decoder_loss=0.2593, over 5764821.75 frames. ], batch size: 85, lr: 8.05e-03, grad_scale: 16.0 2024-09-17 14:45:07,946 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=22.5 2024-09-17 14:45:23,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=241780.0, ans=0.125 2024-09-17 14:45:29,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=241780.0, ans=0.025 2024-09-17 14:45:31,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=241780.0, ans=0.2 2024-09-17 14:45:33,059 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:45:40,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=241820.0, ans=0.125 2024-09-17 14:45:59,447 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.69 vs. limit=15.0 2024-09-17 14:46:07,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=241900.0, ans=0.025 2024-09-17 14:46:08,676 INFO [train.py:1198] (1/2) Epoch 14, batch 1650, loss[loss=0.2798, ctc_loss=0.1706, cr_loss=0.4113, attn_decoder_loss=0.2828, over 29732.00 frames. ], tot_loss[loss=0.2568, ctc_loss=0.1548, cr_loss=0.395, attn_decoder_loss=0.2593, over 5758585.08 frames. ], batch size: 89, lr: 8.05e-03, grad_scale: 8.0 2024-09-17 14:46:53,036 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.43 vs. limit=15.0 2024-09-17 14:47:12,402 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.670e+01 8.773e+01 9.391e+01 1.036e+02 1.444e+02, threshold=1.878e+02, percent-clipped=0.0 2024-09-17 14:47:17,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=242060.0, ans=0.125 2024-09-17 14:47:26,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=242060.0, ans=0.02 2024-09-17 14:47:27,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=242100.0, ans=0.125 2024-09-17 14:47:28,929 INFO [train.py:1198] (1/2) Epoch 14, batch 1700, loss[loss=0.2284, ctc_loss=0.1259, cr_loss=0.3505, attn_decoder_loss=0.232, over 29610.00 frames. ], tot_loss[loss=0.2563, ctc_loss=0.1541, cr_loss=0.3938, attn_decoder_loss=0.2589, over 5780581.18 frames. 
], batch size: 69, lr: 8.05e-03, grad_scale: 8.0 2024-09-17 14:47:58,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=242180.0, ans=0.125 2024-09-17 14:48:02,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=242180.0, ans=0.04949747468305833 2024-09-17 14:48:13,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=242220.0, ans=0.125 2024-09-17 14:48:26,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=242220.0, ans=0.125 2024-09-17 14:48:29,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=242260.0, ans=0.125 2024-09-17 14:48:32,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=242260.0, ans=0.125 2024-09-17 14:48:43,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=242300.0, ans=0.0 2024-09-17 14:48:44,855 INFO [train.py:1198] (1/2) Epoch 14, batch 1750, loss[loss=0.2299, ctc_loss=0.1304, cr_loss=0.3569, attn_decoder_loss=0.2331, over 29388.00 frames. ], tot_loss[loss=0.256, ctc_loss=0.1538, cr_loss=0.394, attn_decoder_loss=0.2586, over 5789017.42 frames. ], batch size: 67, lr: 8.04e-03, grad_scale: 8.0 2024-09-17 14:48:51,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=242300.0, ans=0.125 2024-09-17 14:49:00,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=242340.0, ans=0.125 2024-09-17 14:49:03,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=242340.0, ans=0.0 2024-09-17 14:49:19,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=242380.0, ans=0.0 2024-09-17 14:49:44,145 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.412e+01 8.812e+01 9.337e+01 1.025e+02 2.569e+02, threshold=1.867e+02, percent-clipped=1.0 2024-09-17 14:49:45,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=242460.0, ans=0.1 2024-09-17 14:49:48,435 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=14.41 vs. limit=15.0 2024-09-17 14:49:54,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=242460.0, ans=0.0 2024-09-17 14:50:00,703 INFO [train.py:1198] (1/2) Epoch 14, batch 1800, loss[loss=0.262, ctc_loss=0.1524, cr_loss=0.4132, attn_decoder_loss=0.2649, over 29675.00 frames. ], tot_loss[loss=0.2561, ctc_loss=0.1539, cr_loss=0.3945, attn_decoder_loss=0.2587, over 5790796.23 frames. 
], batch size: 83, lr: 8.04e-03, grad_scale: 8.0 2024-09-17 14:50:07,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=242500.0, ans=10.0 2024-09-17 14:50:33,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=242580.0, ans=0.125 2024-09-17 14:50:45,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=242580.0, ans=0.125 2024-09-17 14:50:51,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=242620.0, ans=0.025 2024-09-17 14:50:59,371 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.17 vs. limit=15.0 2024-09-17 14:51:20,990 INFO [train.py:1198] (1/2) Epoch 14, batch 1850, loss[loss=0.2589, ctc_loss=0.1481, cr_loss=0.3885, attn_decoder_loss=0.2626, over 29654.00 frames. ], tot_loss[loss=0.2559, ctc_loss=0.1534, cr_loss=0.3939, attn_decoder_loss=0.2585, over 5797314.36 frames. ], batch size: 86, lr: 8.04e-03, grad_scale: 8.0 2024-09-17 14:51:26,060 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:51:27,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=242700.0, ans=0.0 2024-09-17 14:51:36,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=242740.0, ans=0.05 2024-09-17 14:52:01,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=242780.0, ans=10.0 2024-09-17 14:52:19,865 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.993e+01 9.601e+01 1.027e+02 2.401e+02, threshold=1.920e+02, percent-clipped=1.0 2024-09-17 14:52:29,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=242860.0, ans=0.05 2024-09-17 14:52:32,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=242860.0, ans=0.0 2024-09-17 14:52:36,268 INFO [train.py:1198] (1/2) Epoch 14, batch 1900, loss[loss=0.2794, ctc_loss=0.167, cr_loss=0.4393, attn_decoder_loss=0.2821, over 29701.00 frames. ], tot_loss[loss=0.2564, ctc_loss=0.1537, cr_loss=0.3948, attn_decoder_loss=0.259, over 5805717.65 frames. 
], batch size: 89, lr: 8.03e-03, grad_scale: 8.0 2024-09-17 14:52:51,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=242940.0, ans=0.125 2024-09-17 14:53:08,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=242980.0, ans=0.1 2024-09-17 14:53:08,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=242980.0, ans=0.025 2024-09-17 14:53:14,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=242980.0, ans=0.025 2024-09-17 14:53:35,262 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.44 vs. limit=15.0 2024-09-17 14:53:36,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=243060.0, ans=0.125 2024-09-17 14:53:46,206 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.46 vs. limit=15.0 2024-09-17 14:53:49,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=243060.0, ans=0.025 2024-09-17 14:53:51,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=243100.0, ans=0.125 2024-09-17 14:53:52,713 INFO [train.py:1198] (1/2) Epoch 14, batch 1950, loss[loss=0.2494, ctc_loss=0.1491, cr_loss=0.3832, attn_decoder_loss=0.252, over 29452.00 frames. ], tot_loss[loss=0.2577, ctc_loss=0.1545, cr_loss=0.397, attn_decoder_loss=0.2604, over 5819876.15 frames. ], batch size: 78, lr: 8.03e-03, grad_scale: 8.0 2024-09-17 14:53:53,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=243100.0, ans=15.0 2024-09-17 14:54:08,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=243140.0, ans=0.125 2024-09-17 14:54:12,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=243140.0, ans=0.07 2024-09-17 14:54:57,845 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.855e+01 9.181e+01 9.574e+01 1.007e+02 1.903e+02, threshold=1.915e+02, percent-clipped=0.0 2024-09-17 14:55:02,101 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.34 vs. limit=15.0 2024-09-17 14:55:13,060 INFO [train.py:1198] (1/2) Epoch 14, batch 2000, loss[loss=0.2361, ctc_loss=0.1465, cr_loss=0.408, attn_decoder_loss=0.237, over 29354.00 frames. ], tot_loss[loss=0.2579, ctc_loss=0.1549, cr_loss=0.3976, attn_decoder_loss=0.2605, over 5796931.29 frames. 
], batch size: 67, lr: 8.03e-03, grad_scale: 8.0 2024-09-17 14:55:30,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=243340.0, ans=10.0 2024-09-17 14:55:58,952 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:56:07,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=243420.0, ans=0.125 2024-09-17 14:56:29,005 INFO [train.py:1198] (1/2) Epoch 14, batch 2050, loss[loss=0.2312, ctc_loss=0.1285, cr_loss=0.3479, attn_decoder_loss=0.2349, over 29419.00 frames. ], tot_loss[loss=0.2572, ctc_loss=0.1547, cr_loss=0.3966, attn_decoder_loss=0.2598, over 5788485.26 frames. ], batch size: 70, lr: 8.02e-03, grad_scale: 8.0 2024-09-17 14:56:36,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=243500.0, ans=0.125 2024-09-17 14:56:54,112 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.02 vs. limit=15.0 2024-09-17 14:57:04,977 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.45 vs. limit=15.0 2024-09-17 14:57:12,084 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.47 vs. limit=22.5 2024-09-17 14:57:19,959 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.14 vs. limit=10.0 2024-09-17 14:57:25,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=243620.0, ans=0.125 2024-09-17 14:57:29,372 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.444e+01 8.883e+01 9.401e+01 1.013e+02 1.488e+02, threshold=1.880e+02, percent-clipped=0.0 2024-09-17 14:57:44,649 INFO [train.py:1198] (1/2) Epoch 14, batch 2100, loss[loss=0.26, ctc_loss=0.1554, cr_loss=0.413, attn_decoder_loss=0.2625, over 29784.00 frames. ], tot_loss[loss=0.256, ctc_loss=0.1537, cr_loss=0.3949, attn_decoder_loss=0.2585, over 5801368.28 frames. ], batch size: 81, lr: 8.02e-03, grad_scale: 8.0 2024-09-17 14:57:44,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=243700.0, ans=0.125 2024-09-17 14:57:55,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=243700.0, ans=0.125 2024-09-17 14:58:03,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=243740.0, ans=10.0 2024-09-17 14:58:36,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=243820.0, ans=0.125 2024-09-17 14:58:46,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=243820.0, ans=0.05 2024-09-17 14:58:59,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.29 vs. 
limit=22.5 2024-09-17 14:59:00,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=243860.0, ans=0.0 2024-09-17 14:59:04,626 INFO [train.py:1198] (1/2) Epoch 14, batch 2150, loss[loss=0.2495, ctc_loss=0.1452, cr_loss=0.3914, attn_decoder_loss=0.2524, over 29436.00 frames. ], tot_loss[loss=0.255, ctc_loss=0.1525, cr_loss=0.3931, attn_decoder_loss=0.2577, over 5816845.44 frames. ], batch size: 78, lr: 8.02e-03, grad_scale: 8.0 2024-09-17 14:59:08,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=243900.0, ans=0.1 2024-09-17 14:59:17,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=243900.0, ans=0.125 2024-09-17 14:59:43,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=243980.0, ans=0.125 2024-09-17 14:59:51,629 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.19 vs. limit=15.0 2024-09-17 15:00:05,496 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.710e+01 8.957e+01 9.631e+01 1.031e+02 4.379e+02, threshold=1.926e+02, percent-clipped=1.0 2024-09-17 15:00:20,630 INFO [train.py:1198] (1/2) Epoch 14, batch 2200, loss[loss=0.2639, ctc_loss=0.1571, cr_loss=0.4135, attn_decoder_loss=0.2666, over 29621.00 frames. ], tot_loss[loss=0.2551, ctc_loss=0.1524, cr_loss=0.3928, attn_decoder_loss=0.2578, over 5812772.20 frames. ], batch size: 86, lr: 8.01e-03, grad_scale: 8.0 2024-09-17 15:00:23,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=244100.0, ans=22.5 2024-09-17 15:00:27,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=244100.0, ans=0.2 2024-09-17 15:01:03,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=244180.0, ans=0.125 2024-09-17 15:01:12,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.93 vs. limit=15.0 2024-09-17 15:01:13,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=244220.0, ans=12.0 2024-09-17 15:01:24,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=244260.0, ans=0.0 2024-09-17 15:01:36,358 INFO [train.py:1198] (1/2) Epoch 14, batch 2250, loss[loss=0.2642, ctc_loss=0.1523, cr_loss=0.3995, attn_decoder_loss=0.2677, over 29715.00 frames. ], tot_loss[loss=0.2554, ctc_loss=0.1526, cr_loss=0.3928, attn_decoder_loss=0.2581, over 5812857.08 frames. ], batch size: 82, lr: 8.01e-03, grad_scale: 8.0 2024-09-17 15:01:37,209 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=9.96 vs. limit=12.0 2024-09-17 15:01:38,856 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.37 vs. 
limit=15.0 2024-09-17 15:01:39,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=244300.0, ans=0.125 2024-09-17 15:01:44,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=244300.0, ans=0.0 2024-09-17 15:01:48,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=244300.0, ans=0.125 2024-09-17 15:01:51,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=244340.0, ans=0.125 2024-09-17 15:01:54,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=244340.0, ans=0.025 2024-09-17 15:01:55,518 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=22.5 2024-09-17 15:02:09,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=244380.0, ans=0.125 2024-09-17 15:02:13,141 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.92 vs. limit=12.0 2024-09-17 15:02:31,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=244420.0, ans=0.125 2024-09-17 15:02:38,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=244420.0, ans=0.0 2024-09-17 15:02:40,757 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.786e+01 8.724e+01 9.348e+01 1.021e+02 5.677e+02, threshold=1.870e+02, percent-clipped=2.0 2024-09-17 15:02:56,094 INFO [train.py:1198] (1/2) Epoch 14, batch 2300, loss[loss=0.2356, ctc_loss=0.1421, cr_loss=0.3679, attn_decoder_loss=0.2378, over 29350.00 frames. ], tot_loss[loss=0.2545, ctc_loss=0.1521, cr_loss=0.3918, attn_decoder_loss=0.2572, over 5799164.61 frames. ], batch size: 71, lr: 8.01e-03, grad_scale: 8.0 2024-09-17 15:03:05,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=244500.0, ans=0.0 2024-09-17 15:03:13,305 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.74 vs. limit=12.0 2024-09-17 15:03:27,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=244580.0, ans=0.1 2024-09-17 15:03:46,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=244620.0, ans=0.125 2024-09-17 15:03:47,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=244620.0, ans=0.5 2024-09-17 15:03:59,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=244660.0, ans=0.125 2024-09-17 15:04:00,437 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.05 vs. 
2024-09-17 15:04:11,745 INFO [train.py:1198] (1/2) Epoch 14, batch 2350, loss[loss=0.2635, ctc_loss=0.1668, cr_loss=0.4215, attn_decoder_loss=0.2648, over 29685.00 frames. ], tot_loss[loss=0.2549, ctc_loss=0.1527, cr_loss=0.3935, attn_decoder_loss=0.2575, over 5805381.94 frames. ], batch size: 83, lr: 8.00e-03, grad_scale: 8.0
2024-09-17 15:04:16,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=244700.0, ans=0.0
2024-09-17 15:04:19,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=244700.0, ans=0.1
2024-09-17 15:04:19,825 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0
2024-09-17 15:04:35,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=244740.0, ans=0.0
2024-09-17 15:04:36,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=244740.0, ans=0.1
2024-09-17 15:04:40,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=244780.0, ans=0.125
2024-09-17 15:04:43,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=244780.0, ans=0.07
2024-09-17 15:04:50,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=244780.0, ans=0.1
2024-09-17 15:04:57,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=244820.0, ans=0.035
2024-09-17 15:05:05,523 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.42 vs. limit=15.0
2024-09-17 15:05:12,275 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 8.941e+01 9.524e+01 1.022e+02 1.702e+02, threshold=1.905e+02, percent-clipped=0.0
2024-09-17 15:05:27,597 INFO [train.py:1198] (1/2) Epoch 14, batch 2400, loss[loss=0.2381, ctc_loss=0.1487, cr_loss=0.383, attn_decoder_loss=0.2395, over 29538.00 frames. ], tot_loss[loss=0.2552, ctc_loss=0.153, cr_loss=0.3942, attn_decoder_loss=0.2578, over 5809382.44 frames. ], batch size: 76, lr: 8.00e-03, grad_scale: 16.0
2024-09-17 15:05:42,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=244940.0, ans=0.0
2024-09-17 15:05:43,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=244940.0, ans=0.025
2024-09-17 15:06:14,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=245020.0, ans=0.125
2024-09-17 15:06:45,772 INFO [train.py:1198] (1/2) Epoch 14, batch 2450, loss[loss=0.257, ctc_loss=0.1513, cr_loss=0.384, attn_decoder_loss=0.2602, over 29727.00 frames. ], tot_loss[loss=0.2562, ctc_loss=0.154, cr_loss=0.3949, attn_decoder_loss=0.2588, over 5785889.42 frames. ], batch size: 82, lr: 8.00e-03, grad_scale: 4.0
2024-09-17 15:06:46,721 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.73 vs. limit=22.5
2024-09-17 15:06:47,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=245100.0, ans=0.125
2024-09-17 15:07:18,435 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.76 vs. limit=15.0
2024-09-17 15:07:49,610 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.704e+01 8.888e+01 9.584e+01 1.028e+02 5.136e+02, threshold=1.917e+02, percent-clipped=2.0
2024-09-17 15:08:01,733 INFO [train.py:1198] (1/2) Epoch 14, batch 2500, loss[loss=0.271, ctc_loss=0.1658, cr_loss=0.4228, attn_decoder_loss=0.2733, over 29663.00 frames. ], tot_loss[loss=0.2564, ctc_loss=0.1542, cr_loss=0.3957, attn_decoder_loss=0.2589, over 5795428.61 frames. ], batch size: 86, lr: 7.99e-03, grad_scale: 8.0
2024-09-17 15:08:33,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=245380.0, ans=0.1
2024-09-17 15:08:59,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=245420.0, ans=0.125
2024-09-17 15:09:18,073 INFO [train.py:1198] (1/2) Epoch 14, batch 2550, loss[loss=0.2334, ctc_loss=0.137, cr_loss=0.3653, attn_decoder_loss=0.236, over 29349.00 frames. ], tot_loss[loss=0.2562, ctc_loss=0.1538, cr_loss=0.395, attn_decoder_loss=0.2588, over 5798068.04 frames. ], batch size: 67, lr: 7.99e-03, grad_scale: 8.0
2024-09-17 15:09:18,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=245500.0, ans=0.0
2024-09-17 15:09:40,016 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.18 vs. limit=15.0
2024-09-17 15:10:00,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=245580.0, ans=0.125
2024-09-17 15:10:06,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=245620.0, ans=0.125
2024-09-17 15:10:27,971 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.823e+01 9.211e+01 1.016e+02 2.509e+02, threshold=1.842e+02, percent-clipped=1.0
2024-09-17 15:10:35,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=245660.0, ans=0.0
2024-09-17 15:10:38,540 INFO [train.py:1198] (1/2) Epoch 14, batch 2600, loss[loss=0.2492, ctc_loss=0.1447, cr_loss=0.3808, attn_decoder_loss=0.2523, over 29446.00 frames. ], tot_loss[loss=0.2564, ctc_loss=0.1539, cr_loss=0.3952, attn_decoder_loss=0.2591, over 5794084.86 frames. ], batch size: 78, lr: 7.99e-03, grad_scale: 8.0
2024-09-17 15:10:47,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=245700.0, ans=0.125
2024-09-17 15:10:50,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=245700.0, ans=0.125
2024-09-17 15:10:51,119 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0
2024-09-17 15:11:01,948 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.43 vs. limit=15.0
2024-09-17 15:11:07,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=245780.0, ans=0.125
2024-09-17 15:11:11,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=245780.0, ans=0.0
2024-09-17 15:11:11,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=245780.0, ans=0.0
2024-09-17 15:11:16,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=245780.0, ans=0.1
2024-09-17 15:11:24,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=245820.0, ans=0.125
2024-09-17 15:11:51,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=245860.0, ans=0.0
2024-09-17 15:11:54,250 INFO [train.py:1198] (1/2) Epoch 14, batch 2650, loss[loss=0.2638, ctc_loss=0.1608, cr_loss=0.3981, attn_decoder_loss=0.2664, over 29280.00 frames. ], tot_loss[loss=0.2567, ctc_loss=0.1538, cr_loss=0.3956, attn_decoder_loss=0.2593, over 5801624.64 frames. ], batch size: 100, lr: 7.98e-03, grad_scale: 8.0
2024-09-17 15:12:01,479 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.99 vs. limit=15.0
2024-09-17 15:12:14,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=245940.0, ans=0.95
2024-09-17 15:12:15,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=245940.0, ans=10.0
2024-09-17 15:12:21,171 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.16 vs. limit=22.5
2024-09-17 15:12:27,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=245980.0, ans=0.2
2024-09-17 15:12:59,468 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.821e+01 9.405e+01 9.920e+01 1.834e+02, threshold=1.881e+02, percent-clipped=0.0
2024-09-17 15:13:10,174 INFO [train.py:1198] (1/2) Epoch 14, batch 2700, loss[loss=0.2673, ctc_loss=0.1683, cr_loss=0.4212, attn_decoder_loss=0.2689, over 29547.00 frames. ], tot_loss[loss=0.2568, ctc_loss=0.154, cr_loss=0.3957, attn_decoder_loss=0.2595, over 5797499.20 frames. ], batch size: 87, lr: 7.98e-03, grad_scale: 8.0
2024-09-17 15:13:16,370 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 15:13:20,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=246100.0, ans=0.125
2024-09-17 15:13:30,036 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 15:13:34,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=246140.0, ans=0.04949747468305833
2024-09-17 15:13:51,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=246180.0, ans=0.125
2024-09-17 15:13:52,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=246180.0, ans=0.125
2024-09-17 15:14:08,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=246220.0, ans=0.04949747468305833
2024-09-17 15:14:10,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=246220.0, ans=0.0
2024-09-17 15:14:18,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=246260.0, ans=0.125
2024-09-17 15:14:27,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=246260.0, ans=0.125
2024-09-17 15:14:30,686 INFO [train.py:1198] (1/2) Epoch 14, batch 2750, loss[loss=0.2499, ctc_loss=0.1432, cr_loss=0.3919, attn_decoder_loss=0.2531, over 29497.00 frames. ], tot_loss[loss=0.2558, ctc_loss=0.1531, cr_loss=0.3942, attn_decoder_loss=0.2585, over 5796110.93 frames. ], batch size: 75, lr: 7.98e-03, grad_scale: 8.0
2024-09-17 15:14:31,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=246300.0, ans=0.0
2024-09-17 15:14:43,641 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.15 vs. limit=15.0
2024-09-17 15:15:01,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=246380.0, ans=0.0
2024-09-17 15:15:04,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=246380.0, ans=0.2
2024-09-17 15:15:06,168 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.64 vs. limit=15.0
2024-09-17 15:15:19,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=246420.0, ans=0.125
2024-09-17 15:15:28,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=246420.0, ans=0.125
2024-09-17 15:15:36,354 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.348e+01 8.822e+01 9.426e+01 1.011e+02 2.167e+02, threshold=1.885e+02, percent-clipped=1.0
2024-09-17 15:15:47,142 INFO [train.py:1198] (1/2) Epoch 14, batch 2800, loss[loss=0.2877, ctc_loss=0.203, cr_loss=0.4451, attn_decoder_loss=0.2872, over 20013.00 frames. ], tot_loss[loss=0.2559, ctc_loss=0.1535, cr_loss=0.3941, attn_decoder_loss=0.2585, over 5777997.30 frames. ], batch size: 210, lr: 7.97e-03, grad_scale: 16.0
2024-09-17 15:15:48,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=246500.0, ans=0.125
2024-09-17 15:15:51,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=246500.0, ans=0.09899494936611666
2024-09-17 15:17:02,336 INFO [train.py:1198] (1/2) Epoch 14, batch 2850, loss[loss=0.2554, ctc_loss=0.151, cr_loss=0.3863, attn_decoder_loss=0.2584, over 29519.00 frames. ], tot_loss[loss=0.2564, ctc_loss=0.1538, cr_loss=0.394, attn_decoder_loss=0.2591, over 5762706.05 frames. ], batch size: 77, lr: 7.97e-03, grad_scale: 8.0
2024-09-17 15:17:08,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=246700.0, ans=0.125
2024-09-17 15:17:17,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=246740.0, ans=0.125
2024-09-17 15:17:28,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=246740.0, ans=0.1
2024-09-17 15:17:28,997 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.96 vs. limit=15.0
2024-09-17 15:17:49,404 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.69 vs. limit=10.0
2024-09-17 15:17:49,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0
2024-09-17 15:17:54,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=246820.0, ans=0.0
2024-09-17 15:18:12,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.83 vs. limit=22.5
2024-09-17 15:18:13,427 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.023e+01 9.108e+01 9.602e+01 1.037e+02 1.624e+02, threshold=1.920e+02, percent-clipped=0.0
2024-09-17 15:18:21,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=246900.0, ans=0.125
2024-09-17 15:18:22,633 INFO [train.py:1198] (1/2) Epoch 14, batch 2900, loss[loss=0.2504, ctc_loss=0.1522, cr_loss=0.4052, attn_decoder_loss=0.2524, over 29440.00 frames. ], tot_loss[loss=0.2572, ctc_loss=0.1542, cr_loss=0.3947, attn_decoder_loss=0.2598, over 5788110.99 frames. ], batch size: 79, lr: 7.97e-03, grad_scale: 8.0
2024-09-17 15:18:58,604 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.91 vs. limit=15.0
2024-09-17 15:19:20,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=247020.0, ans=0.125
2024-09-17 15:19:29,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=247060.0, ans=0.0
2024-09-17 15:19:38,653 INFO [train.py:1198] (1/2) Epoch 14, batch 2950, loss[loss=0.2502, ctc_loss=0.1481, cr_loss=0.3945, attn_decoder_loss=0.2528, over 29519.00 frames. ], tot_loss[loss=0.2558, ctc_loss=0.1531, cr_loss=0.3927, attn_decoder_loss=0.2585, over 5781320.48 frames. ], batch size: 75, lr: 7.97e-03, grad_scale: 8.0
2024-09-17 15:19:45,485 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=10.27 vs. limit=15.0
2024-09-17 15:19:57,338 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 15:19:57,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=247140.0, ans=0.025
2024-09-17 15:20:15,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=247180.0, ans=0.125
2024-09-17 15:20:15,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=247180.0, ans=0.2
2024-09-17 15:20:17,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=247180.0, ans=0.125
2024-09-17 15:20:31,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=247220.0, ans=0.125
2024-09-17 15:20:46,134 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.963e+01 9.607e+01 1.034e+02 4.390e+02, threshold=1.921e+02, percent-clipped=3.0
2024-09-17 15:20:55,396 INFO [train.py:1198] (1/2) Epoch 14, batch 3000, loss[loss=0.2543, ctc_loss=0.144, cr_loss=0.3597, attn_decoder_loss=0.2585, over 29754.00 frames. ], tot_loss[loss=0.2559, ctc_loss=0.1531, cr_loss=0.3935, attn_decoder_loss=0.2585, over 5782601.35 frames. ], batch size: 81, lr: 7.96e-03, grad_scale: 8.0
2024-09-17 15:20:55,396 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-17 15:21:13,887 INFO [train.py:1230] (1/2) Epoch 14, validation: loss=0.212, ctc_loss=0.04343, cr_loss=5.03e-15, attn_decoder_loss=0.2308, over 944034.00 frames.
2024-09-17 15:21:13,887 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB
2024-09-17 15:21:28,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=247340.0, ans=0.0
2024-09-17 15:21:28,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=247340.0, ans=0.125
2024-09-17 15:21:40,045 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0
2024-09-17 15:22:34,537 INFO [train.py:1198] (1/2) Epoch 14, batch 3050, loss[loss=0.2463, ctc_loss=0.1425, cr_loss=0.3789, attn_decoder_loss=0.2494, over 29542.00 frames. ], tot_loss[loss=0.2568, ctc_loss=0.154, cr_loss=0.3955, attn_decoder_loss=0.2595, over 5776991.26 frames. ], batch size: 76, lr: 7.96e-03, grad_scale: 8.0
2024-09-17 15:22:55,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=247540.0, ans=0.1
2024-09-17 15:22:56,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=247540.0, ans=0.125
2024-09-17 15:23:08,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=15.0
2024-09-17 15:23:12,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=247580.0, ans=0.0
2024-09-17 15:23:40,607 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.297e+01 9.179e+01 9.620e+01 1.029e+02 1.592e+02, threshold=1.924e+02, percent-clipped=0.0
2024-09-17 15:23:49,652 INFO [train.py:1198] (1/2) Epoch 14, batch 3100, loss[loss=0.2735, ctc_loss=0.1657, cr_loss=0.404, attn_decoder_loss=0.2765, over 29298.00 frames. ], tot_loss[loss=0.2565, ctc_loss=0.1541, cr_loss=0.3949, attn_decoder_loss=0.2592, over 5777127.31 frames. ], batch size: 100, lr: 7.96e-03, grad_scale: 8.0
2024-09-17 15:23:53,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.55 vs. limit=15.0
2024-09-17 15:24:03,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=247740.0, ans=0.1
2024-09-17 15:24:15,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=247740.0, ans=0.125
2024-09-17 15:24:20,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=247780.0, ans=10.0
2024-09-17 15:24:41,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=247820.0, ans=0.2
2024-09-17 15:24:49,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=247860.0, ans=0.1
2024-09-17 15:24:55,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=247860.0, ans=0.0
2024-09-17 15:25:01,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=247860.0, ans=0.07
2024-09-17 15:25:05,570 INFO [train.py:1198] (1/2) Epoch 14, batch 3150, loss[loss=0.2663, ctc_loss=0.1605, cr_loss=0.4141, attn_decoder_loss=0.2688, over 28830.00 frames. ], tot_loss[loss=0.2563, ctc_loss=0.1535, cr_loss=0.3942, attn_decoder_loss=0.259, over 5783249.21 frames. ], batch size: 104, lr: 7.95e-03, grad_scale: 4.0
2024-09-17 15:25:35,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=247940.0, ans=0.025
2024-09-17 15:26:17,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=248060.0, ans=0.125
2024-09-17 15:26:18,269 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.880e+01 9.083e+01 9.655e+01 1.039e+02 2.253e+02, threshold=1.931e+02, percent-clipped=1.0
2024-09-17 15:26:25,935 INFO [train.py:1198] (1/2) Epoch 14, batch 3200, loss[loss=0.2494, ctc_loss=0.1466, cr_loss=0.3945, attn_decoder_loss=0.252, over 29425.00 frames. ], tot_loss[loss=0.2556, ctc_loss=0.1529, cr_loss=0.3929, attn_decoder_loss=0.2583, over 5794016.81 frames. ], batch size: 79, lr: 7.95e-03, grad_scale: 8.0
2024-09-17 15:26:30,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=248100.0, ans=0.125
2024-09-17 15:26:47,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=248140.0, ans=0.2
2024-09-17 15:26:56,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=248180.0, ans=0.0
2024-09-17 15:27:22,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=248220.0, ans=0.0
2024-09-17 15:27:42,196 INFO [train.py:1198] (1/2) Epoch 14, batch 3250, loss[loss=0.2784, ctc_loss=0.1702, cr_loss=0.4278, attn_decoder_loss=0.2809, over 29697.00 frames. ], tot_loss[loss=0.2561, ctc_loss=0.1531, cr_loss=0.3938, attn_decoder_loss=0.2588, over 5800128.76 frames. ], batch size: 84, lr: 7.95e-03, grad_scale: 8.0
2024-09-17 15:28:31,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=248420.0, ans=10.0
2024-09-17 15:28:49,497 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.916e+01 9.075e+01 9.653e+01 1.031e+02 3.050e+02, threshold=1.931e+02, percent-clipped=1.0
2024-09-17 15:28:57,462 INFO [train.py:1198] (1/2) Epoch 14, batch 3300, loss[loss=0.2674, ctc_loss=0.1645, cr_loss=0.4142, attn_decoder_loss=0.2696, over 28503.00 frames. ], tot_loss[loss=0.2553, ctc_loss=0.1528, cr_loss=0.3936, attn_decoder_loss=0.258, over 5798173.19 frames. ], batch size: 112, lr: 7.94e-03, grad_scale: 8.0
2024-09-17 15:29:23,712 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=8.79 vs. limit=15.0
2024-09-17 15:29:26,831 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.31 vs. limit=15.0
2024-09-17 15:29:31,157 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.69 vs. limit=15.0
2024-09-17 15:29:33,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=248580.0, ans=0.125
2024-09-17 15:29:39,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=248580.0, ans=0.125
2024-09-17 15:29:39,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=248580.0, ans=0.1
2024-09-17 15:30:03,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=248660.0, ans=0.2
2024-09-17 15:30:09,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=248660.0, ans=0.125
2024-09-17 15:30:16,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=248700.0, ans=0.125
2024-09-17 15:30:17,822 INFO [train.py:1198] (1/2) Epoch 14, batch 3350, loss[loss=0.2643, ctc_loss=0.1494, cr_loss=0.3591, attn_decoder_loss=0.2691, over 28736.00 frames. ], tot_loss[loss=0.2559, ctc_loss=0.1533, cr_loss=0.3936, attn_decoder_loss=0.2586, over 5774849.12 frames. ], batch size: 104, lr: 7.94e-03, grad_scale: 8.0
2024-09-17 15:30:41,228 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.17 vs. limit=22.5
2024-09-17 15:30:41,273 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.16 vs. limit=15.0
2024-09-17 15:30:51,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=248780.0, ans=0.125
2024-09-17 15:30:58,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=248780.0, ans=0.5
2024-09-17 15:31:00,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=248780.0, ans=0.0
2024-09-17 15:31:02,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=248820.0, ans=0.0
2024-09-17 15:31:04,594 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. limit=6.0
2024-09-17 15:31:05,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=248820.0, ans=0.2
2024-09-17 15:31:11,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=248820.0, ans=0.2
2024-09-17 15:31:13,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=248820.0, ans=0.2
2024-09-17 15:31:26,420 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.815e+01 9.128e+01 9.625e+01 1.037e+02 1.571e+02, threshold=1.925e+02, percent-clipped=0.0
2024-09-17 15:31:34,158 INFO [train.py:1198] (1/2) Epoch 14, batch 3400, loss[loss=0.2333, ctc_loss=0.1319, cr_loss=0.3728, attn_decoder_loss=0.2363, over 29369.00 frames. ], tot_loss[loss=0.2561, ctc_loss=0.1534, cr_loss=0.3933, attn_decoder_loss=0.2587, over 5768234.91 frames. ], batch size: 67, lr: 7.94e-03, grad_scale: 8.0
2024-09-17 15:31:34,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=248900.0, ans=0.125
2024-09-17 15:31:34,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=248900.0, ans=0.125
2024-09-17 15:32:25,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=249020.0, ans=0.125
2024-09-17 15:32:26,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=249020.0, ans=0.125
2024-09-17 15:32:29,235 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 15:32:30,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=249020.0, ans=0.0
2024-09-17 15:32:32,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=249020.0, ans=0.125
2024-09-17 15:32:39,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=249060.0, ans=0.125
2024-09-17 15:32:41,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=249060.0, ans=0.0
2024-09-17 15:32:42,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=249060.0, ans=0.0
2024-09-17 15:32:47,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=249060.0, ans=0.125
2024-09-17 15:32:50,243 INFO [train.py:1198] (1/2) Epoch 14, batch 3450, loss[loss=0.2667, ctc_loss=0.1581, cr_loss=0.4019, attn_decoder_loss=0.2699, over 28409.00 frames. ], tot_loss[loss=0.2565, ctc_loss=0.1536, cr_loss=0.394, attn_decoder_loss=0.2592, over 5775614.99 frames. ], batch size: 111, lr: 7.93e-03, grad_scale: 8.0
2024-09-17 15:33:05,290 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.94 vs. limit=22.5
2024-09-17 15:33:12,864 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.10 vs. limit=15.0
2024-09-17 15:33:29,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=249180.0, ans=0.125
2024-09-17 15:33:38,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=249220.0, ans=0.125
2024-09-17 15:33:38,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=249220.0, ans=0.2
2024-09-17 15:34:03,111 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.438e+01 8.913e+01 9.467e+01 9.956e+01 4.435e+02, threshold=1.893e+02, percent-clipped=2.0
2024-09-17 15:34:06,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=249260.0, ans=0.0
2024-09-17 15:34:08,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=249260.0, ans=0.2
2024-09-17 15:34:10,848 INFO [train.py:1198] (1/2) Epoch 14, batch 3500, loss[loss=0.2282, ctc_loss=0.129, cr_loss=0.3532, attn_decoder_loss=0.2314, over 29316.00 frames. ], tot_loss[loss=0.256, ctc_loss=0.1534, cr_loss=0.3941, attn_decoder_loss=0.2586, over 5777243.24 frames. ], batch size: 71, lr: 7.93e-03, grad_scale: 8.0
2024-09-17 15:34:11,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.49 vs. limit=6.0
2024-09-17 15:34:23,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=249300.0, ans=0.0
2024-09-17 15:34:42,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=249380.0, ans=0.1
2024-09-17 15:34:47,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=249380.0, ans=0.1
2024-09-17 15:34:47,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=249380.0, ans=0.2
2024-09-17 15:34:56,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=249420.0, ans=0.1
2024-09-17 15:34:57,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=249420.0, ans=0.2
2024-09-17 15:35:12,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=249460.0, ans=0.125
2024-09-17 15:35:16,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=249460.0, ans=0.1
2024-09-17 15:35:21,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=249460.0, ans=0.125
2024-09-17 15:35:24,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=249500.0, ans=0.125
2024-09-17 15:35:25,499 INFO [train.py:1198] (1/2) Epoch 14, batch 3550, loss[loss=0.268, ctc_loss=0.1577, cr_loss=0.4012, attn_decoder_loss=0.2714, over 29704.00 frames. ], tot_loss[loss=0.2557, ctc_loss=0.1528, cr_loss=0.3934, attn_decoder_loss=0.2584, over 5782868.08 frames. ], batch size: 89, lr: 7.93e-03, grad_scale: 8.0
2024-09-17 15:35:27,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=249500.0, ans=0.125
2024-09-17 15:36:20,293 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.15 vs. limit=22.5
2024-09-17 15:36:24,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=249660.0, ans=0.1
2024-09-17 15:36:27,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=249660.0, ans=0.0
2024-09-17 15:36:33,057 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.867e+01 9.428e+01 1.003e+02 3.029e+02, threshold=1.886e+02, percent-clipped=1.0
2024-09-17 15:36:39,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=249700.0, ans=0.125
2024-09-17 15:36:40,442 INFO [train.py:1198] (1/2) Epoch 14, batch 3600, loss[loss=0.2429, ctc_loss=0.1454, cr_loss=0.3692, attn_decoder_loss=0.2455, over 29521.00 frames. ], tot_loss[loss=0.2558, ctc_loss=0.1531, cr_loss=0.3937, attn_decoder_loss=0.2584, over 5793121.65 frames. ], batch size: 77, lr: 7.92e-03, grad_scale: 16.0
2024-09-17 15:36:43,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=249700.0, ans=0.035
2024-09-17 15:36:48,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=249700.0, ans=0.0
2024-09-17 15:37:01,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=249740.0, ans=0.125
2024-09-17 15:37:21,948 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.29 vs. limit=10.0
2024-09-17 15:37:27,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=249820.0, ans=0.07
2024-09-17 15:37:28,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=249820.0, ans=0.125
2024-09-17 15:37:46,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=249860.0, ans=0.125
2024-09-17 15:37:53,067 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0
2024-09-17 15:37:55,263 INFO [train.py:1198] (1/2) Epoch 14, batch 3650, loss[loss=0.2655, ctc_loss=0.1656, cr_loss=0.3917, attn_decoder_loss=0.2679, over 29530.00 frames. ], tot_loss[loss=0.255, ctc_loss=0.152, cr_loss=0.3925, attn_decoder_loss=0.2577, over 5795341.63 frames. ], batch size: 90, lr: 7.92e-03, grad_scale: 8.0
2024-09-17 15:37:56,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=249900.0, ans=0.0
2024-09-17 15:38:05,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=249900.0, ans=0.025
2024-09-17 15:38:13,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=249940.0, ans=0.2
2024-09-17 15:38:16,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=249940.0, ans=0.125
2024-09-17 15:38:39,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=249980.0, ans=0.95
2024-09-17 15:38:46,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=250020.0, ans=0.1
2024-09-17 15:38:50,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.63 vs. limit=6.0
2024-09-17 15:38:58,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=250060.0, ans=0.125
2024-09-17 15:39:01,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=250060.0, ans=0.2
2024-09-17 15:39:01,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=250060.0, ans=0.0
2024-09-17 15:39:05,888 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.411e+01 8.885e+01 9.531e+01 1.024e+02 1.907e+02, threshold=1.906e+02, percent-clipped=1.0
2024-09-17 15:39:09,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=250060.0, ans=0.125
2024-09-17 15:39:11,991 INFO [train.py:1198] (1/2) Epoch 14, batch 3700, loss[loss=0.2693, ctc_loss=0.1652, cr_loss=0.3945, attn_decoder_loss=0.272, over 29708.00 frames. ], tot_loss[loss=0.2549, ctc_loss=0.1519, cr_loss=0.3921, attn_decoder_loss=0.2576, over 5805218.93 frames. ], batch size: 84, lr: 7.92e-03, grad_scale: 8.0
2024-09-17 15:39:16,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=250100.0, ans=0.0
2024-09-17 15:39:42,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=250180.0, ans=0.125
2024-09-17 15:40:03,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=250220.0, ans=0.125
2024-09-17 15:40:16,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=250260.0, ans=0.2
2024-09-17 15:40:17,143 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.43 vs. limit=15.0
2024-09-17 15:40:28,253 INFO [train.py:1198] (1/2) Epoch 14, batch 3750, loss[loss=0.2229, ctc_loss=0.127, cr_loss=0.338, attn_decoder_loss=0.2261, over 29341.00 frames. ], tot_loss[loss=0.2549, ctc_loss=0.152, cr_loss=0.3923, attn_decoder_loss=0.2577, over 5809359.21 frames. ], batch size: 67, lr: 7.92e-03, grad_scale: 8.0
2024-09-17 15:40:35,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=250300.0, ans=0.2
2024-09-17 15:40:37,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=250300.0, ans=0.125
2024-09-17 15:40:43,786 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.63 vs. limit=22.5
2024-09-17 15:41:02,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=250380.0, ans=0.025
2024-09-17 15:41:06,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=250380.0, ans=0.1
2024-09-17 15:41:06,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=250380.0, ans=0.0
2024-09-17 15:41:09,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=250380.0, ans=0.125
2024-09-17 15:41:15,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=250420.0, ans=0.125
2024-09-17 15:41:25,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.70 vs. limit=15.0
2024-09-17 15:41:37,294 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.832e+01 9.041e+01 9.799e+01 1.080e+02 3.062e+02, threshold=1.960e+02, percent-clipped=2.0
2024-09-17 15:41:43,359 INFO [train.py:1198] (1/2) Epoch 14, batch 3800, loss[loss=0.2578, ctc_loss=0.1528, cr_loss=0.393, attn_decoder_loss=0.2607, over 29635.00 frames. ], tot_loss[loss=0.2547, ctc_loss=0.1519, cr_loss=0.3917, attn_decoder_loss=0.2575, over 5799053.55 frames. ], batch size: 86, lr: 7.91e-03, grad_scale: 8.0
2024-09-17 15:42:10,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=250540.0, ans=0.1
2024-09-17 15:42:19,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=250580.0, ans=0.0
2024-09-17 15:42:55,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=250660.0, ans=0.125
2024-09-17 15:42:57,825 INFO [train.py:1198] (1/2) Epoch 14, batch 3850, loss[loss=0.2677, ctc_loss=0.1547, cr_loss=0.3975, attn_decoder_loss=0.2714, over 29338.00 frames. ], tot_loss[loss=0.2546, ctc_loss=0.1513, cr_loss=0.3911, attn_decoder_loss=0.2574, over 5811447.40 frames. ], batch size: 100, lr: 7.91e-03, grad_scale: 8.0
2024-09-17 15:43:11,967 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.82 vs. limit=15.0
2024-09-17 15:43:20,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=250740.0, ans=0.125
2024-09-17 15:44:06,765 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.879e+01 9.449e+01 1.016e+02 1.639e+02, threshold=1.890e+02, percent-clipped=0.0
2024-09-17 15:44:14,350 INFO [train.py:1198] (1/2) Epoch 14, batch 3900, loss[loss=0.2582, ctc_loss=0.1484, cr_loss=0.3882, attn_decoder_loss=0.2617, over 29619.00 frames. ], tot_loss[loss=0.2553, ctc_loss=0.1518, cr_loss=0.3922, attn_decoder_loss=0.2581, over 5816368.22 frames. ], batch size: 86, lr: 7.91e-03, grad_scale: 8.0
2024-09-17 15:44:15,193 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.55 vs. limit=6.0
2024-09-17 15:44:22,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=250900.0, ans=0.1
2024-09-17 15:44:46,757 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.87 vs. limit=15.0
2024-09-17 15:44:49,174 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.35 vs. limit=15.0
2024-09-17 15:44:53,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=250980.0, ans=0.0
2024-09-17 15:45:02,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=251020.0, ans=0.0
2024-09-17 15:45:27,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=251100.0, ans=0.125
2024-09-17 15:45:28,471 INFO [train.py:1198] (1/2) Epoch 14, batch 3950, loss[loss=0.2596, ctc_loss=0.1554, cr_loss=0.3885, attn_decoder_loss=0.2626, over 29487.00 frames. ], tot_loss[loss=0.255, ctc_loss=0.1512, cr_loss=0.3919, attn_decoder_loss=0.2578, over 5835835.01 frames. ], batch size: 97, lr: 7.90e-03, grad_scale: 8.0
2024-09-17 15:45:40,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=251100.0, ans=0.0
2024-09-17 15:45:40,809 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 15:45:50,415 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.88 vs. limit=22.5
2024-09-17 15:45:58,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=251180.0, ans=0.1
2024-09-17 15:46:02,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=251180.0, ans=0.025
2024-09-17 15:46:38,427 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 8.707e+01 9.297e+01 1.013e+02 1.953e+02, threshold=1.859e+02, percent-clipped=1.0
2024-09-17 15:46:39,245 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.21 vs. limit=15.0
2024-09-17 15:46:44,340 INFO [train.py:1198] (1/2) Epoch 14, batch 4000, loss[loss=0.2377, ctc_loss=0.1411, cr_loss=0.3561, attn_decoder_loss=0.2406, over 29512.00 frames. ], tot_loss[loss=0.2553, ctc_loss=0.1518, cr_loss=0.3921, attn_decoder_loss=0.2581, over 5814096.12 frames. ], batch size: 74, lr: 7.90e-03, grad_scale: 16.0
2024-09-17 15:47:00,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=251340.0, ans=0.2
2024-09-17 15:47:04,253 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.46 vs. limit=10.0
2024-09-17 15:47:12,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=251380.0, ans=0.125
2024-09-17 15:47:17,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=251380.0, ans=0.2
2024-09-17 15:47:33,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=251420.0, ans=0.125
2024-09-17 15:47:42,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=251460.0, ans=0.0
2024-09-17 15:47:54,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=251460.0, ans=0.125
2024-09-17 15:47:57,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=251500.0, ans=0.0
2024-09-17 15:47:57,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=251500.0, ans=0.0
2024-09-17 15:47:58,628 INFO [train.py:1198] (1/2) Epoch 14, batch 4050, loss[loss=0.2845, ctc_loss=0.2078, cr_loss=0.4404, attn_decoder_loss=0.2832, over 20062.00 frames. ], tot_loss[loss=0.2555, ctc_loss=0.1522, cr_loss=0.3927, attn_decoder_loss=0.2583, over 5796555.82 frames. ], batch size: 210, lr: 7.90e-03, grad_scale: 8.0
2024-09-17 15:48:04,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=251500.0, ans=0.125
2024-09-17 15:48:09,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=251500.0, ans=0.125
2024-09-17 15:48:12,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.42 vs. limit=15.0
2024-09-17 15:48:26,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=251580.0, ans=0.125
2024-09-17 15:48:34,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=251580.0, ans=10.0
2024-09-17 15:48:51,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=251620.0, ans=0.0
2024-09-17 15:49:06,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=251660.0, ans=0.125
2024-09-17 15:49:08,784 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.922e+01 9.172e+01 9.805e+01 1.134e+02 3.956e+02, threshold=1.961e+02, percent-clipped=2.0
2024-09-17 15:49:13,321 INFO [train.py:1198] (1/2) Epoch 14, batch 4100, loss[loss=0.283, ctc_loss=0.1824, cr_loss=0.4385, attn_decoder_loss=0.2844, over 29493.00 frames. ], tot_loss[loss=0.256, ctc_loss=0.1528, cr_loss=0.3938, attn_decoder_loss=0.2588, over 5792167.55 frames. ], batch size: 90, lr: 7.89e-03, grad_scale: 8.0
2024-09-17 15:49:15,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=251700.0, ans=0.125
2024-09-17 15:49:16,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=251700.0, ans=0.1
2024-09-17 15:49:56,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=251820.0, ans=0.125
2024-09-17 15:50:05,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=251820.0, ans=0.125
2024-09-17 15:50:10,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=251820.0, ans=0.125
2024-09-17 15:50:24,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=251860.0, ans=0.125
2024-09-17 15:50:28,375 INFO [train.py:1198] (1/2) Epoch 14, batch 4150, loss[loss=0.2417, ctc_loss=0.1345, cr_loss=0.374, attn_decoder_loss=0.2453, over 29527.00 frames. ], tot_loss[loss=0.2554, ctc_loss=0.1523, cr_loss=0.3926, attn_decoder_loss=0.2581, over 5797673.04 frames. ], batch size: 77, lr: 7.89e-03, grad_scale: 8.0
2024-09-17 15:50:52,744 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.67 vs. limit=22.5
2024-09-17 15:50:54,134 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.14 vs. limit=15.0
2024-09-17 15:51:04,527 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.20 vs. limit=15.0
2024-09-17 15:51:11,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=252020.0, ans=0.2
2024-09-17 15:51:37,954 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.595e+01 8.795e+01 9.407e+01 9.935e+01 3.892e+02, threshold=1.881e+02, percent-clipped=3.0
2024-09-17 15:51:42,412 INFO [train.py:1198] (1/2) Epoch 14, batch 4200, loss[loss=0.2759, ctc_loss=0.1762, cr_loss=0.4425, attn_decoder_loss=0.2772, over 29490.00 frames. ], tot_loss[loss=0.2556, ctc_loss=0.1527, cr_loss=0.3936, attn_decoder_loss=0.2583, over 5799196.79 frames. ], batch size: 90, lr: 7.89e-03, grad_scale: 8.0
2024-09-17 15:51:47,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=252100.0, ans=0.125
2024-09-17 15:51:48,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=252100.0, ans=0.125
2024-09-17 15:52:10,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=252180.0, ans=0.0
2024-09-17 15:52:20,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=252180.0, ans=0.125
2024-09-17 15:52:42,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=252260.0, ans=0.125
2024-09-17 15:52:56,847 INFO [train.py:1198] (1/2) Epoch 14, batch 4250, loss[loss=0.2395, ctc_loss=0.1373, cr_loss=0.366, attn_decoder_loss=0.2428, over 29502.00 frames. ], tot_loss[loss=0.2554, ctc_loss=0.152, cr_loss=0.3925, attn_decoder_loss=0.2582, over 5804568.01 frames. ], batch size: 74, lr: 7.88e-03, grad_scale: 8.0
2024-09-17 15:53:06,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.44 vs. limit=12.0
2024-09-17 15:53:19,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=252340.0, ans=0.0
2024-09-17 15:53:31,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=252380.0, ans=0.0
2024-09-17 15:53:35,430 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 15:53:57,055 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 15:54:06,785 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.779e+01 8.931e+01 9.548e+01 1.031e+02 6.441e+02, threshold=1.910e+02, percent-clipped=3.0
2024-09-17 15:54:11,269 INFO [train.py:1198] (1/2) Epoch 14, batch 4300, loss[loss=0.2535, ctc_loss=0.147, cr_loss=0.4072, attn_decoder_loss=0.2563, over 29517.00 frames. ], tot_loss[loss=0.2557, ctc_loss=0.1524, cr_loss=0.3923, attn_decoder_loss=0.2584, over 5793844.48 frames. ], batch size: 87, lr: 7.88e-03, grad_scale: 8.0
2024-09-17 15:54:20,952 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.61 vs. limit=22.5
2024-09-17 15:54:30,185 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.07 vs. limit=15.0
limit=15.0 2024-09-17 15:54:54,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=252620.0, ans=0.125 2024-09-17 15:54:56,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=252620.0, ans=0.125 2024-09-17 15:55:13,490 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.09 vs. limit=22.5 2024-09-17 15:55:25,800 INFO [train.py:1198] (1/2) Epoch 14, batch 4350, loss[loss=0.2696, ctc_loss=0.1577, cr_loss=0.404, attn_decoder_loss=0.273, over 29505.00 frames. ], tot_loss[loss=0.2593, ctc_loss=0.1555, cr_loss=0.3985, attn_decoder_loss=0.262, over 5796952.08 frames. ], batch size: 97, lr: 7.88e-03, grad_scale: 8.0 2024-09-17 15:55:33,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=252700.0, ans=0.1 2024-09-17 15:56:05,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=252780.0, ans=0.1 2024-09-17 15:56:21,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=252820.0, ans=0.0 2024-09-17 15:56:34,333 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 15:56:35,453 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.459e+01 9.298e+01 9.711e+01 1.038e+02 2.895e+02, threshold=1.942e+02, percent-clipped=1.0 2024-09-17 15:56:39,912 INFO [train.py:1198] (1/2) Epoch 14, batch 4400, loss[loss=0.2748, ctc_loss=0.1763, cr_loss=0.4342, attn_decoder_loss=0.2761, over 27244.00 frames. ], tot_loss[loss=0.2615, ctc_loss=0.1573, cr_loss=0.4015, attn_decoder_loss=0.2641, over 5766382.29 frames. ], batch size: 124, lr: 7.87e-03, grad_scale: 16.0 2024-09-17 15:56:52,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=252900.0, ans=0.125 2024-09-17 15:56:53,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=252940.0, ans=0.1 2024-09-17 15:56:53,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=252940.0, ans=0.0 2024-09-17 15:57:02,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=252940.0, ans=0.05 2024-09-17 15:57:02,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=252940.0, ans=0.125 2024-09-17 15:57:04,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=252940.0, ans=0.125 2024-09-17 15:57:50,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=253060.0, ans=0.05 2024-09-17 15:57:55,063 INFO [train.py:1198] (1/2) Epoch 14, batch 4450, loss[loss=0.2806, ctc_loss=0.1916, cr_loss=0.4312, attn_decoder_loss=0.2809, over 20402.00 frames. ], tot_loss[loss=0.2648, ctc_loss=0.1628, cr_loss=0.4065, attn_decoder_loss=0.2671, over 5576675.08 frames. 
], batch size: 211, lr: 7.87e-03, grad_scale: 8.0 2024-09-17 15:58:22,714 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.37 vs. limit=15.0 2024-09-17 15:59:08,778 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.421e+01 1.004e+02 1.135e+02 1.248e+02 2.199e+02, threshold=2.271e+02, percent-clipped=1.0 2024-09-17 15:59:10,360 INFO [train.py:1198] (1/2) Epoch 14, batch 4500, loss[loss=0.2791, ctc_loss=0.1893, cr_loss=0.4018, attn_decoder_loss=0.2801, over 20752.00 frames. ], tot_loss[loss=0.2681, ctc_loss=0.1687, cr_loss=0.4087, attn_decoder_loss=0.2701, over 5237993.98 frames. ], batch size: 209, lr: 7.87e-03, grad_scale: 8.0 2024-09-17 16:00:38,577 INFO [train.py:1198] (1/2) Epoch 15, batch 0, loss[loss=0.2446, ctc_loss=0.1369, cr_loss=0.3715, attn_decoder_loss=0.2483, over 29589.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1369, cr_loss=0.3715, attn_decoder_loss=0.2483, over 29589.00 frames. ], batch size: 73, lr: 7.60e-03, grad_scale: 16.0 2024-09-17 16:00:38,577 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 16:00:56,922 INFO [train.py:1230] (1/2) Epoch 15, validation: loss=0.2128, ctc_loss=0.04201, cr_loss=5.567e-15, attn_decoder_loss=0.2317, over 944034.00 frames. 2024-09-17 16:00:56,922 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-17 16:00:58,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=253400.0, ans=0.125 2024-09-17 16:01:02,296 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.66 vs. limit=22.5 2024-09-17 16:01:19,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=253440.0, ans=0.125 2024-09-17 16:01:41,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=253520.0, ans=0.2 2024-09-17 16:01:44,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=253520.0, ans=0.0 2024-09-17 16:01:48,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=253520.0, ans=0.125 2024-09-17 16:02:15,219 INFO [train.py:1198] (1/2) Epoch 15, batch 50, loss[loss=0.2359, ctc_loss=0.1389, cr_loss=0.3804, attn_decoder_loss=0.2383, over 29381.00 frames. ], tot_loss[loss=0.2589, ctc_loss=0.1567, cr_loss=0.4048, attn_decoder_loss=0.2613, over 1268339.51 frames. ], batch size: 70, lr: 7.60e-03, grad_scale: 8.0 2024-09-17 16:02:20,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.17 vs. 
limit=22.5 2024-09-17 16:02:21,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=253600.0, ans=0.125 2024-09-17 16:02:26,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=253600.0, ans=0.0 2024-09-17 16:02:42,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=253640.0, ans=0.07 2024-09-17 16:02:51,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=253680.0, ans=0.0 2024-09-17 16:02:51,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=253680.0, ans=0.2 2024-09-17 16:02:53,176 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.523e+01 9.844e+01 1.055e+02 1.171e+02 3.873e+02, threshold=2.109e+02, percent-clipped=1.0 2024-09-17 16:03:07,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=253720.0, ans=0.0 2024-09-17 16:03:12,262 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.90 vs. limit=22.5 2024-09-17 16:03:29,546 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.55 vs. limit=15.0 2024-09-17 16:03:33,153 INFO [train.py:1198] (1/2) Epoch 15, batch 100, loss[loss=0.2471, ctc_loss=0.1368, cr_loss=0.384, attn_decoder_loss=0.2509, over 29540.00 frames. ], tot_loss[loss=0.2592, ctc_loss=0.156, cr_loss=0.4015, attn_decoder_loss=0.2617, over 2250973.94 frames. ], batch size: 76, lr: 7.59e-03, grad_scale: 8.0 2024-09-17 16:03:35,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=253800.0, ans=0.1 2024-09-17 16:03:46,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=253840.0, ans=0.2 2024-09-17 16:04:01,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=253880.0, ans=0.2 2024-09-17 16:04:23,234 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.49 vs. limit=15.0 2024-09-17 16:04:40,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=253960.0, ans=0.2 2024-09-17 16:04:40,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=253960.0, ans=0.1 2024-09-17 16:04:47,890 INFO [train.py:1198] (1/2) Epoch 15, batch 150, loss[loss=0.2224, ctc_loss=0.1101, cr_loss=0.3166, attn_decoder_loss=0.2278, over 29446.00 frames. ], tot_loss[loss=0.2568, ctc_loss=0.153, cr_loss=0.3966, attn_decoder_loss=0.2595, over 3045158.45 frames. ], batch size: 70, lr: 7.59e-03, grad_scale: 8.0 2024-09-17 16:04:48,813 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.74 vs. 
limit=22.5 2024-09-17 16:05:21,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=254080.0, ans=0.025 2024-09-17 16:05:25,781 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.856e+01 8.837e+01 9.448e+01 1.022e+02 1.353e+02, threshold=1.890e+02, percent-clipped=0.0 2024-09-17 16:05:35,753 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.96 vs. limit=15.0 2024-09-17 16:05:43,275 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.23 vs. limit=22.5 2024-09-17 16:05:48,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=254160.0, ans=0.125 2024-09-17 16:06:03,417 INFO [train.py:1198] (1/2) Epoch 15, batch 200, loss[loss=0.2714, ctc_loss=0.1704, cr_loss=0.4239, attn_decoder_loss=0.2732, over 27024.00 frames. ], tot_loss[loss=0.2556, ctc_loss=0.1519, cr_loss=0.3947, attn_decoder_loss=0.2584, over 3657014.15 frames. ], batch size: 124, lr: 7.59e-03, grad_scale: 8.0 2024-09-17 16:06:33,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=254240.0, ans=0.025 2024-09-17 16:07:19,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=254360.0, ans=10.0 2024-09-17 16:07:24,603 INFO [train.py:1198] (1/2) Epoch 15, batch 250, loss[loss=0.2765, ctc_loss=0.1637, cr_loss=0.4339, attn_decoder_loss=0.2794, over 29213.00 frames. ], tot_loss[loss=0.2553, ctc_loss=0.1516, cr_loss=0.3941, attn_decoder_loss=0.2581, over 4137760.04 frames. ], batch size: 100, lr: 7.58e-03, grad_scale: 8.0 2024-09-17 16:07:25,325 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.52 vs. limit=22.5 2024-09-17 16:07:26,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=254400.0, ans=10.0 2024-09-17 16:07:27,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=254400.0, ans=0.125 2024-09-17 16:07:49,269 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.15 vs. limit=15.0 2024-09-17 16:07:53,431 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:07:59,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=254480.0, ans=0.2 2024-09-17 16:08:02,186 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.396e+01 8.896e+01 9.283e+01 1.022e+02 2.095e+02, threshold=1.857e+02, percent-clipped=1.0 2024-09-17 16:08:24,297 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.33 vs. limit=15.0 2024-09-17 16:08:40,123 INFO [train.py:1198] (1/2) Epoch 15, batch 300, loss[loss=0.2738, ctc_loss=0.1591, cr_loss=0.4185, attn_decoder_loss=0.2772, over 29546.00 frames. 
], tot_loss[loss=0.2542, ctc_loss=0.1504, cr_loss=0.3914, attn_decoder_loss=0.257, over 4507794.15 frames. ], batch size: 92, lr: 7.58e-03, grad_scale: 8.0 2024-09-17 16:08:53,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=254640.0, ans=0.125 2024-09-17 16:09:09,670 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.16 vs. limit=15.0 2024-09-17 16:09:39,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=254760.0, ans=0.125 2024-09-17 16:09:44,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=254760.0, ans=0.125 2024-09-17 16:09:48,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=254760.0, ans=0.125 2024-09-17 16:09:51,168 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.20 vs. limit=22.5 2024-09-17 16:09:56,011 INFO [train.py:1198] (1/2) Epoch 15, batch 350, loss[loss=0.2308, ctc_loss=0.126, cr_loss=0.3433, attn_decoder_loss=0.2348, over 29329.00 frames. ], tot_loss[loss=0.2545, ctc_loss=0.1506, cr_loss=0.3922, attn_decoder_loss=0.2573, over 4793021.07 frames. ], batch size: 71, lr: 7.58e-03, grad_scale: 8.0 2024-09-17 16:09:56,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=254800.0, ans=0.2 2024-09-17 16:09:56,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=254800.0, ans=0.125 2024-09-17 16:09:56,652 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.39 vs. limit=15.0 2024-09-17 16:10:04,250 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.99 vs. limit=22.5 2024-09-17 16:10:16,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=254840.0, ans=0.025 2024-09-17 16:10:18,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=254840.0, ans=0.2 2024-09-17 16:10:19,032 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.19 vs. limit=22.5 2024-09-17 16:10:31,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=254880.0, ans=0.0 2024-09-17 16:10:35,868 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.547e+01 8.706e+01 9.394e+01 1.041e+02 2.813e+02, threshold=1.879e+02, percent-clipped=1.0 2024-09-17 16:10:59,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=254960.0, ans=0.1 2024-09-17 16:11:16,017 INFO [train.py:1198] (1/2) Epoch 15, batch 400, loss[loss=0.2572, ctc_loss=0.1455, cr_loss=0.3901, attn_decoder_loss=0.261, over 29711.00 frames. ], tot_loss[loss=0.2545, ctc_loss=0.1509, cr_loss=0.3917, attn_decoder_loss=0.2573, over 5022670.65 frames. 
], batch size: 82, lr: 7.58e-03, grad_scale: 16.0 2024-09-17 16:11:17,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=255000.0, ans=0.2 2024-09-17 16:11:22,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=255000.0, ans=0.125 2024-09-17 16:11:39,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=255040.0, ans=0.0 2024-09-17 16:11:43,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=255040.0, ans=0.0 2024-09-17 16:12:14,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=255120.0, ans=0.0 2024-09-17 16:12:23,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=255160.0, ans=0.0 2024-09-17 16:12:25,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.66 vs. limit=15.0 2024-09-17 16:12:32,459 INFO [train.py:1198] (1/2) Epoch 15, batch 450, loss[loss=0.2592, ctc_loss=0.1513, cr_loss=0.3954, attn_decoder_loss=0.2624, over 29689.00 frames. ], tot_loss[loss=0.2545, ctc_loss=0.1509, cr_loss=0.3917, attn_decoder_loss=0.2573, over 5186122.55 frames. ], batch size: 83, lr: 7.57e-03, grad_scale: 8.0 2024-09-17 16:12:40,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=255200.0, ans=0.125 2024-09-17 16:12:41,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=255200.0, ans=0.025 2024-09-17 16:13:11,919 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.489e+01 8.843e+01 9.415e+01 1.015e+02 2.907e+02, threshold=1.883e+02, percent-clipped=1.0 2024-09-17 16:13:12,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=255280.0, ans=0.125 2024-09-17 16:13:15,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=255280.0, ans=0.125 2024-09-17 16:13:21,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=255320.0, ans=0.07 2024-09-17 16:13:21,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=255320.0, ans=0.125 2024-09-17 16:13:24,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=255320.0, ans=0.025 2024-09-17 16:13:30,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=255320.0, ans=0.0 2024-09-17 16:13:32,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=255360.0, ans=0.0 2024-09-17 16:13:36,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=255360.0, ans=0.125 2024-09-17 16:13:42,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=255360.0, ans=0.025 
2024-09-17 16:13:48,521 INFO [train.py:1198] (1/2) Epoch 15, batch 500, loss[loss=0.2791, ctc_loss=0.1735, cr_loss=0.4351, attn_decoder_loss=0.2812, over 29480.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1503, cr_loss=0.3911, attn_decoder_loss=0.2568, over 5329229.41 frames. ], batch size: 94, lr: 7.57e-03, grad_scale: 8.0
2024-09-17 16:13:51,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.max_positive, batch_count=255400.0, ans=0.95
2024-09-17 16:13:56,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=255400.0, ans=0.1
2024-09-17 16:14:13,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=255440.0, ans=0.1
2024-09-17 16:14:16,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=255440.0, ans=0.125
2024-09-17 16:14:27,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=255480.0, ans=0.0
2024-09-17 16:14:29,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=255480.0, ans=0.0
2024-09-17 16:14:50,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=255560.0, ans=0.025
2024-09-17 16:14:51,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=255560.0, ans=0.04949747468305833
2024-09-17 16:15:08,804 INFO [train.py:1198] (1/2) Epoch 15, batch 550, loss[loss=0.2701, ctc_loss=0.157, cr_loss=0.4097, attn_decoder_loss=0.2736, over 28804.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1505, cr_loss=0.3908, attn_decoder_loss=0.2568, over 5421395.06 frames. ], batch size: 104, lr: 7.57e-03, grad_scale: 8.0
2024-09-17 16:15:13,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=255600.0, ans=0.125
2024-09-17 16:15:26,473 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.64 vs. limit=12.0
2024-09-17 16:15:34,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=255640.0, ans=0.125
2024-09-17 16:15:42,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=255680.0, ans=0.0
2024-09-17 16:15:48,136 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.773e+01 8.993e+01 9.917e+01 1.076e+02 7.641e+02, threshold=1.983e+02, percent-clipped=4.0
2024-09-17 16:16:08,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=255760.0, ans=0.125
2024-09-17 16:16:24,501 INFO [train.py:1198] (1/2) Epoch 15, batch 600, loss[loss=0.269, ctc_loss=0.1601, cr_loss=0.4161, attn_decoder_loss=0.2719, over 29280.00 frames. ], tot_loss[loss=0.2544, ctc_loss=0.1509, cr_loss=0.3922, attn_decoder_loss=0.2572, over 5507696.85 frames. ], batch size: 100, lr: 7.56e-03, grad_scale: 8.0
2024-09-17 16:16:31,772 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.23 vs. limit=15.0
2024-09-17 16:16:35,043 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.96 vs. limit=15.0
2024-09-17 16:16:35,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=255800.0, ans=0.125
2024-09-17 16:16:42,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.89 vs. limit=15.0
2024-09-17 16:16:53,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=255880.0, ans=0.2
2024-09-17 16:17:21,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=255920.0, ans=0.125
2024-09-17 16:17:23,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=255960.0, ans=0.125
2024-09-17 16:17:28,901 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.22 vs. limit=15.0
2024-09-17 16:17:47,270 INFO [train.py:1198] (1/2) Epoch 15, batch 650, loss[loss=0.2567, ctc_loss=0.1516, cr_loss=0.4045, attn_decoder_loss=0.2594, over 29750.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.15, cr_loss=0.3912, attn_decoder_loss=0.2566, over 5584347.18 frames. ], batch size: 81, lr: 7.56e-03, grad_scale: 8.0
2024-09-17 16:17:47,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=256000.0, ans=0.0
2024-09-17 16:17:59,010 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.43 vs. limit=15.0
2024-09-17 16:18:04,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=256040.0, ans=0.125
2024-09-17 16:18:26,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=256080.0, ans=0.125
2024-09-17 16:18:28,869 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.348e+01 8.547e+01 9.070e+01 9.577e+01 1.264e+02, threshold=1.814e+02, percent-clipped=0.0
2024-09-17 16:18:30,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=256080.0, ans=0.0
2024-09-17 16:18:31,442 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.41 vs. limit=15.0
2024-09-17 16:18:34,409 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.27 vs. limit=10.0
2024-09-17 16:18:34,530 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0
2024-09-17 16:18:36,271 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0
2024-09-17 16:18:48,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=256160.0, ans=0.0
2024-09-17 16:18:56,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=256160.0, ans=0.1
2024-09-17 16:18:59,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=256160.0, ans=0.04949747468305833
2024-09-17 16:19:07,282 INFO [train.py:1198] (1/2) Epoch 15, batch 700, loss[loss=0.2394, ctc_loss=0.1419, cr_loss=0.381, attn_decoder_loss=0.2418, over 29540.00 frames. ], tot_loss[loss=0.2542, ctc_loss=0.1502, cr_loss=0.3917, attn_decoder_loss=0.257, over 5635143.59 frames. ], batch size: 76, lr: 7.56e-03, grad_scale: 8.0
2024-09-17 16:19:41,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=256280.0, ans=0.95
2024-09-17 16:19:47,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=256280.0, ans=0.125
2024-09-17 16:20:04,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=256320.0, ans=0.125
2024-09-17 16:20:14,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=256360.0, ans=0.1
2024-09-17 16:20:23,532 INFO [train.py:1198] (1/2) Epoch 15, batch 750, loss[loss=0.2553, ctc_loss=0.1525, cr_loss=0.4208, attn_decoder_loss=0.2574, over 29688.00 frames. ], tot_loss[loss=0.2539, ctc_loss=0.1499, cr_loss=0.3909, attn_decoder_loss=0.2568, over 5674362.80 frames. ], batch size: 82, lr: 7.55e-03, grad_scale: 8.0
2024-09-17 16:20:26,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=256400.0, ans=0.125
2024-09-17 16:20:32,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=256400.0, ans=0.1
2024-09-17 16:20:38,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=256440.0, ans=0.125
2024-09-17 16:20:59,384 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.41 vs. limit=22.5
2024-09-17 16:21:02,920 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.520e+01 8.739e+01 9.295e+01 9.820e+01 3.813e+02, threshold=1.859e+02, percent-clipped=2.0
2024-09-17 16:21:06,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=256480.0, ans=0.1
2024-09-17 16:21:09,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=256520.0, ans=0.2
2024-09-17 16:21:12,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=256520.0, ans=0.125
2024-09-17 16:21:18,882 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.25 vs. limit=22.5
2024-09-17 16:21:39,091 INFO [train.py:1198] (1/2) Epoch 15, batch 800, loss[loss=0.236, ctc_loss=0.1324, cr_loss=0.3702, attn_decoder_loss=0.2393, over 29584.00 frames. ], tot_loss[loss=0.2539, ctc_loss=0.15, cr_loss=0.3917, attn_decoder_loss=0.2567, over 5705241.91 frames. ], batch size: 73, lr: 7.55e-03, grad_scale: 16.0
2024-09-17 16:22:04,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=256640.0, ans=0.0
2024-09-17 16:22:08,524 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.67 vs. limit=15.0
2024-09-17 16:22:26,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=256720.0, ans=0.0
2024-09-17 16:22:32,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=256720.0, ans=0.2
2024-09-17 16:22:40,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=256760.0, ans=0.2
2024-09-17 16:22:43,868 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.39 vs. limit=15.0
2024-09-17 16:22:49,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=256760.0, ans=0.0
2024-09-17 16:22:49,430 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-17 16:22:49,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=256760.0, ans=0.125
2024-09-17 16:22:56,871 INFO [train.py:1198] (1/2) Epoch 15, batch 850, loss[loss=0.2719, ctc_loss=0.1687, cr_loss=0.4223, attn_decoder_loss=0.274, over 29695.00 frames. ], tot_loss[loss=0.2539, ctc_loss=0.15, cr_loss=0.3916, attn_decoder_loss=0.2568, over 5735395.85 frames. ], batch size: 89, lr: 7.55e-03, grad_scale: 8.0
2024-09-17 16:23:08,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=256800.0, ans=0.0
2024-09-17 16:23:11,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=256800.0, ans=0.125
2024-09-17 16:23:38,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=256880.0, ans=0.125
2024-09-17 16:23:39,537 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.676e+01 9.240e+01 9.818e+01 3.041e+02, threshold=1.848e+02, percent-clipped=1.0
2024-09-17 16:23:42,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=256920.0, ans=0.125
2024-09-17 16:23:50,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=256920.0, ans=0.025
2024-09-17 16:23:51,272 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0
2024-09-17 16:24:14,677 INFO [train.py:1198] (1/2) Epoch 15, batch 900, loss[loss=0.2355, ctc_loss=0.1342, cr_loss=0.3762, attn_decoder_loss=0.2384, over 29601.00 frames. ], tot_loss[loss=0.2544, ctc_loss=0.1506, cr_loss=0.3928, attn_decoder_loss=0.2572, over 5740678.67 frames. ], batch size: 73, lr: 7.55e-03, grad_scale: 8.0
2024-09-17 16:24:42,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=257040.0, ans=0.125
2024-09-17 16:25:30,435 INFO [train.py:1198] (1/2) Epoch 15, batch 950, loss[loss=0.2432, ctc_loss=0.1401, cr_loss=0.3687, attn_decoder_loss=0.2464, over 29509.00 frames. ], tot_loss[loss=0.2542, ctc_loss=0.1506, cr_loss=0.3924, attn_decoder_loss=0.257, over 5742685.78 frames. ], batch size: 74, lr: 7.54e-03, grad_scale: 8.0
2024-09-17 16:25:36,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=257200.0, ans=0.0
2024-09-17 16:25:47,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=257240.0, ans=0.1
2024-09-17 16:25:59,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=257280.0, ans=0.0
2024-09-17 16:26:08,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.80 vs. limit=15.0
2024-09-17 16:26:13,742 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.071e+01 9.347e+01 1.011e+02 1.116e+02 3.125e+02, threshold=2.021e+02, percent-clipped=4.0
2024-09-17 16:26:14,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=257280.0, ans=0.2
2024-09-17 16:26:27,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=257320.0, ans=0.0
2024-09-17 16:26:27,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=257320.0, ans=0.125
2024-09-17 16:26:34,442 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=22.5
2024-09-17 16:26:50,492 INFO [train.py:1198] (1/2) Epoch 15, batch 1000, loss[loss=0.2467, ctc_loss=0.1489, cr_loss=0.3967, attn_decoder_loss=0.2488, over 29485.00 frames. ], tot_loss[loss=0.2549, ctc_loss=0.1516, cr_loss=0.3935, attn_decoder_loss=0.2577, over 5735805.51 frames. ], batch size: 77, lr: 7.54e-03, grad_scale: 8.0
2024-09-17 16:26:52,879 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.13 vs. limit=15.0
2024-09-17 16:27:09,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=257440.0, ans=0.0
2024-09-17 16:27:10,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=257440.0, ans=0.125
2024-09-17 16:27:27,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=257480.0, ans=0.0
2024-09-17 16:27:28,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=257480.0, ans=0.5
2024-09-17 16:27:59,353 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-17 16:28:06,970 INFO [train.py:1198] (1/2) Epoch 15, batch 1050, loss[loss=0.2624, ctc_loss=0.1579, cr_loss=0.3966, attn_decoder_loss=0.2652, over 29688.00 frames. ], tot_loss[loss=0.2544, ctc_loss=0.1513, cr_loss=0.3932, attn_decoder_loss=0.2571, over 5743659.68 frames. ], batch size: 85, lr: 7.54e-03, grad_scale: 8.0
2024-09-17 16:28:16,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=257600.0, ans=0.0
2024-09-17 16:28:29,138 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.51 vs. limit=22.5
2024-09-17 16:28:33,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=257640.0, ans=0.95
2024-09-17 16:28:34,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=257640.0, ans=0.1
2024-09-17 16:28:34,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=257640.0, ans=0.1
2024-09-17 16:28:48,455 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 8.880e+01 9.520e+01 1.043e+02 1.808e+02, threshold=1.904e+02, percent-clipped=0.0
2024-09-17 16:29:19,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=257760.0, ans=0.125
2024-09-17 16:29:21,415 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.37 vs. limit=22.5
2024-09-17 16:29:23,481 INFO [train.py:1198] (1/2) Epoch 15, batch 1100, loss[loss=0.26, ctc_loss=0.1568, cr_loss=0.3931, attn_decoder_loss=0.2628, over 29428.00 frames. ], tot_loss[loss=0.2542, ctc_loss=0.1511, cr_loss=0.3928, attn_decoder_loss=0.2569, over 5756164.03 frames. ], batch size: 78, lr: 7.53e-03, grad_scale: 8.0
2024-09-17 16:29:32,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=257800.0, ans=0.1
2024-09-17 16:29:40,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=257840.0, ans=0.07
2024-09-17 16:29:57,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=257880.0, ans=0.125
2024-09-17 16:30:10,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=257920.0, ans=0.1
2024-09-17 16:30:10,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=257920.0, ans=0.125
2024-09-17 16:30:15,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=257920.0, ans=0.1
2024-09-17 16:30:22,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=257920.0, ans=0.0
2024-09-17 16:30:26,034 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.09 vs. limit=10.0
2024-09-17 16:30:43,779 INFO [train.py:1198] (1/2) Epoch 15, batch 1150, loss[loss=0.2506, ctc_loss=0.1456, cr_loss=0.3905, attn_decoder_loss=0.2536, over 29445.00 frames. ], tot_loss[loss=0.2542, ctc_loss=0.1511, cr_loss=0.3922, attn_decoder_loss=0.2569, over 5755078.94 frames. ], batch size: 78, lr: 7.53e-03, grad_scale: 8.0
2024-09-17 16:30:57,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=258040.0, ans=0.2
2024-09-17 16:31:02,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=258040.0, ans=0.125
2024-09-17 16:31:08,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=258040.0, ans=0.1
2024-09-17 16:31:14,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=258080.0, ans=0.1
2024-09-17 16:31:20,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=258080.0, ans=0.1
2024-09-17 16:31:25,059 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.854e+01 8.950e+01 9.429e+01 1.052e+02 4.091e+02, threshold=1.886e+02, percent-clipped=2.0
2024-09-17 16:31:26,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=258080.0, ans=0.0
2024-09-17 16:31:38,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.23 vs. limit=10.0
2024-09-17 16:31:43,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=258160.0, ans=0.125
2024-09-17 16:31:43,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=258160.0, ans=0.1
2024-09-17 16:31:56,446 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.36 vs. limit=15.0
2024-09-17 16:32:00,032 INFO [train.py:1198] (1/2) Epoch 15, batch 1200, loss[loss=0.2648, ctc_loss=0.1483, cr_loss=0.3723, attn_decoder_loss=0.2695, over 29675.00 frames. ], tot_loss[loss=0.2549, ctc_loss=0.1517, cr_loss=0.3927, attn_decoder_loss=0.2577, over 5747601.48 frames. ], batch size: 85, lr: 7.53e-03, grad_scale: 16.0
2024-09-17 16:32:01,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=258200.0, ans=0.125
2024-09-17 16:32:15,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=258240.0, ans=0.1
2024-09-17 16:32:23,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=258240.0, ans=0.0
2024-09-17 16:32:31,690 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.87 vs. limit=12.0
2024-09-17 16:32:46,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=258320.0, ans=0.0
2024-09-17 16:32:52,717 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.66 vs. limit=15.0
2024-09-17 16:33:01,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=258360.0, ans=0.125
2024-09-17 16:33:06,476 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0
2024-09-17 16:33:11,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.06 vs. limit=10.0
2024-09-17 16:33:15,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=258400.0, ans=0.125
2024-09-17 16:33:16,673 INFO [train.py:1198] (1/2) Epoch 15, batch 1250, loss[loss=0.2663, ctc_loss=0.1531, cr_loss=0.4107, attn_decoder_loss=0.2698, over 29548.00 frames. ], tot_loss[loss=0.2554, ctc_loss=0.1518, cr_loss=0.3933, attn_decoder_loss=0.2582, over 5774906.07 frames. ], batch size: 92, lr: 7.53e-03, grad_scale: 8.0
2024-09-17 16:33:32,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=258440.0, ans=0.0
2024-09-17 16:33:39,126 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.43 vs. limit=15.0
2024-09-17 16:33:59,434 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.854e+01 8.868e+01 9.518e+01 1.036e+02 1.703e+02, threshold=1.904e+02, percent-clipped=0.0
2024-09-17 16:34:21,181 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.54 vs. limit=6.0
2024-09-17 16:34:23,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=258560.0, ans=0.1
2024-09-17 16:34:29,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=258560.0, ans=0.2
2024-09-17 16:34:37,194 INFO [train.py:1198] (1/2) Epoch 15, batch 1300, loss[loss=0.2619, ctc_loss=0.1445, cr_loss=0.3688, attn_decoder_loss=0.2668, over 28302.00 frames. ], tot_loss[loss=0.2549, ctc_loss=0.1515, cr_loss=0.3931, attn_decoder_loss=0.2577, over 5778505.06 frames. ], batch size: 111, lr: 7.52e-03, grad_scale: 8.0
2024-09-17 16:35:06,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=258680.0, ans=0.125
2024-09-17 16:35:24,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=258720.0, ans=0.04949747468305833
2024-09-17 16:35:35,512 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.96 vs. limit=22.5
2024-09-17 16:35:39,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=258760.0, ans=0.2
2024-09-17 16:35:53,272 INFO [train.py:1198] (1/2) Epoch 15, batch 1350, loss[loss=0.2584, ctc_loss=0.1548, cr_loss=0.4105, attn_decoder_loss=0.2608, over 29736.00 frames. ], tot_loss[loss=0.2544, ctc_loss=0.1508, cr_loss=0.3921, attn_decoder_loss=0.2572, over 5795907.42 frames. ], batch size: 81, lr: 7.52e-03, grad_scale: 8.0
2024-09-17 16:36:01,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=258800.0, ans=0.0
2024-09-17 16:36:07,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=258840.0, ans=0.0
2024-09-17 16:36:35,391 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.506e+01 8.925e+01 9.317e+01 1.009e+02 1.483e+02, threshold=1.863e+02, percent-clipped=0.0
2024-09-17 16:36:44,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=258920.0, ans=0.0
2024-09-17 16:36:50,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=258920.0, ans=0.2
2024-09-17 16:37:06,425 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.88 vs. limit=15.0
2024-09-17 16:37:08,670 INFO [train.py:1198] (1/2) Epoch 15, batch 1400, loss[loss=0.221, ctc_loss=0.1233, cr_loss=0.3429, attn_decoder_loss=0.2242, over 29596.00 frames. ], tot_loss[loss=0.2541, ctc_loss=0.1505, cr_loss=0.3921, attn_decoder_loss=0.2569, over 5806813.58 frames. ], batch size: 69, lr: 7.52e-03, grad_scale: 8.0
2024-09-17 16:37:09,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.39 vs. limit=15.0
2024-09-17 16:37:22,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=259040.0, ans=0.125
2024-09-17 16:37:33,888 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.18 vs. limit=15.0
2024-09-17 16:37:36,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=259040.0, ans=0.125
2024-09-17 16:37:39,415 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 16:37:48,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=259080.0, ans=0.125
2024-09-17 16:37:54,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=259120.0, ans=15.0
2024-09-17 16:37:56,539 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=20.94 vs. limit=15.0
2024-09-17 16:38:10,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=259160.0, ans=0.125
2024-09-17 16:38:19,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.14 vs. limit=15.0
2024-09-17 16:38:23,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=259160.0, ans=0.1
2024-09-17 16:38:27,009 INFO [train.py:1198] (1/2) Epoch 15, batch 1450, loss[loss=0.2686, ctc_loss=0.1594, cr_loss=0.4128, attn_decoder_loss=0.2715, over 29451.00 frames. ], tot_loss[loss=0.2547, ctc_loss=0.1508, cr_loss=0.3931, attn_decoder_loss=0.2575, over 5804189.42 frames. ], batch size: 94, lr: 7.51e-03, grad_scale: 8.0
2024-09-17 16:38:27,839 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.70 vs. limit=6.0
2024-09-17 16:38:39,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=259200.0, ans=0.0
2024-09-17 16:38:43,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=259240.0, ans=0.07
2024-09-17 16:38:46,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=259240.0, ans=0.125
2024-09-17 16:38:54,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=259240.0, ans=0.0
2024-09-17 16:39:01,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=259280.0, ans=0.125
2024-09-17 16:39:07,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=259280.0, ans=0.0
2024-09-17 16:39:11,108 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 8.827e+01 9.578e+01 1.049e+02 2.248e+02, threshold=1.916e+02, percent-clipped=2.0
2024-09-17 16:39:25,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=259320.0, ans=0.125
2024-09-17 16:39:28,902 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.51 vs. limit=22.5
2024-09-17 16:39:34,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=259360.0, ans=0.0
2024-09-17 16:39:44,413 INFO [train.py:1198] (1/2) Epoch 15, batch 1500, loss[loss=0.2711, ctc_loss=0.1645, cr_loss=0.4245, attn_decoder_loss=0.2735, over 29646.00 frames. ], tot_loss[loss=0.2552, ctc_loss=0.1513, cr_loss=0.3937, attn_decoder_loss=0.258, over 5805645.87 frames. ], batch size: 86, lr: 7.51e-03, grad_scale: 8.0
2024-09-17 16:39:47,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=259400.0, ans=0.2
2024-09-17 16:40:38,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=259520.0, ans=0.0
2024-09-17 16:40:39,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=259520.0, ans=0.5
2024-09-17 16:40:47,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=259560.0, ans=0.0
2024-09-17 16:40:53,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=259560.0, ans=0.125
2024-09-17 16:40:58,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=259560.0, ans=0.0
2024-09-17 16:40:59,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=259600.0, ans=0.125
2024-09-17 16:41:00,891 INFO [train.py:1198] (1/2) Epoch 15, batch 1550, loss[loss=0.2861, ctc_loss=0.1821, cr_loss=0.4619, attn_decoder_loss=0.2874, over 29505.00 frames. ], tot_loss[loss=0.255, ctc_loss=0.1512, cr_loss=0.3927, attn_decoder_loss=0.2578, over 5782753.85 frames. ], batch size: 90, lr: 7.51e-03, grad_scale: 8.0
2024-09-17 16:41:04,794 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.29 vs. limit=15.0
2024-09-17 16:41:14,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=259640.0, ans=0.0
2024-09-17 16:41:19,765 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.60 vs. limit=22.5
2024-09-17 16:41:20,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=259640.0, ans=0.125
2024-09-17 16:41:42,749 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 8.774e+01 9.466e+01 1.042e+02 2.668e+02, threshold=1.893e+02, percent-clipped=3.0
2024-09-17 16:42:02,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=259760.0, ans=0.125
2024-09-17 16:42:08,780 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.46 vs. limit=6.0
2024-09-17 16:42:17,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=259760.0, ans=0.125
2024-09-17 16:42:20,457 INFO [train.py:1198] (1/2) Epoch 15, batch 1600, loss[loss=0.2727, ctc_loss=0.1616, cr_loss=0.4061, attn_decoder_loss=0.276, over 29665.00 frames. ], tot_loss[loss=0.2547, ctc_loss=0.1511, cr_loss=0.3922, attn_decoder_loss=0.2575, over 5764622.31 frames. ], batch size: 85, lr: 7.51e-03, grad_scale: 16.0
2024-09-17 16:42:28,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=259800.0, ans=0.2
2024-09-17 16:42:28,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=259800.0, ans=0.2
2024-09-17 16:42:40,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=259840.0, ans=0.1
2024-09-17 16:42:45,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=259840.0, ans=0.125
2024-09-17 16:42:46,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=259840.0, ans=0.125
2024-09-17 16:42:55,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=259880.0, ans=0.09899494936611666
2024-09-17 16:42:59,439 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.00 vs. limit=15.0
2024-09-17 16:43:04,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=259920.0, ans=0.2
2024-09-17 16:43:05,329 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.33 vs. limit=22.5
2024-09-17 16:43:13,906 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-17 16:43:36,452 INFO [train.py:1198] (1/2) Epoch 15, batch 1650, loss[loss=0.2744, ctc_loss=0.1632, cr_loss=0.4142, attn_decoder_loss=0.2776, over 29713.00 frames. ], tot_loss[loss=0.2543, ctc_loss=0.1509, cr_loss=0.392, attn_decoder_loss=0.2571, over 5758140.47 frames. ], batch size: 89, lr: 7.50e-03, grad_scale: 8.0
2024-09-17 16:43:59,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=260040.0, ans=10.0
2024-09-17 16:43:59,605 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.89 vs. limit=10.0
2024-09-17 16:44:03,045 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=22.5
2024-09-17 16:44:20,238 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.810e+01 9.523e+01 1.053e+02 2.945e+02, threshold=1.905e+02, percent-clipped=2.0
2024-09-17 16:44:48,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=260160.0, ans=0.125
2024-09-17 16:44:51,638 INFO [train.py:1198] (1/2) Epoch 15, batch 1700, loss[loss=0.2277, ctc_loss=0.1302, cr_loss=0.3586, attn_decoder_loss=0.2306, over 29566.00 frames. ], tot_loss[loss=0.2541, ctc_loss=0.1505, cr_loss=0.392, attn_decoder_loss=0.2569, over 5778004.33 frames. ], batch size: 69, lr: 7.50e-03, grad_scale: 8.0
2024-09-17 16:44:57,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=260200.0, ans=0.125
2024-09-17 16:44:57,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=260200.0, ans=0.0
2024-09-17 16:44:59,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=260200.0, ans=0.125
2024-09-17 16:45:10,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=260240.0, ans=0.125
2024-09-17 16:45:15,010 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.09 vs. limit=15.0
2024-09-17 16:45:17,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=260240.0, ans=0.025
2024-09-17 16:45:19,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=260240.0, ans=0.2
2024-09-17 16:45:19,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=260240.0, ans=0.2
2024-09-17 16:45:26,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=260280.0, ans=0.025
2024-09-17 16:45:34,580 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.11 vs. limit=15.0
2024-09-17 16:46:04,268 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.96 vs. limit=15.0
2024-09-17 16:46:11,458 INFO [train.py:1198] (1/2) Epoch 15, batch 1750, loss[loss=0.2191, ctc_loss=0.1226, cr_loss=0.3584, attn_decoder_loss=0.2218, over 29368.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.1501, cr_loss=0.3916, attn_decoder_loss=0.2565, over 5787528.80 frames. ], batch size: 67, lr: 7.50e-03, grad_scale: 8.0
2024-09-17 16:46:32,125 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.02 vs. limit=15.0
2024-09-17 16:46:55,373 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.411e+01 8.854e+01 9.659e+01 1.042e+02 2.660e+02, threshold=1.932e+02, percent-clipped=4.0
2024-09-17 16:46:57,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=260520.0, ans=0.0
2024-09-17 16:47:09,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=260520.0, ans=0.125
2024-09-17 16:47:15,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=260560.0, ans=0.125
2024-09-17 16:47:26,571 INFO [train.py:1198] (1/2) Epoch 15, batch 1800, loss[loss=0.2656, ctc_loss=0.1649, cr_loss=0.4224, attn_decoder_loss=0.2674, over 29680.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.15, cr_loss=0.391, attn_decoder_loss=0.2566, over 5791253.65 frames. ], batch size: 83, lr: 7.49e-03, grad_scale: 8.0
2024-09-17 16:47:33,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=260600.0, ans=0.125
2024-09-17 16:47:53,290 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.09 vs. limit=22.5
2024-09-17 16:48:11,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=260720.0, ans=0.0
2024-09-17 16:48:11,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=260720.0, ans=0.125
2024-09-17 16:48:32,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=260760.0, ans=0.0
2024-09-17 16:48:38,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=260760.0, ans=0.125
2024-09-17 16:48:43,404 INFO [train.py:1198] (1/2) Epoch 15, batch 1850, loss[loss=0.2741, ctc_loss=0.1671, cr_loss=0.423, attn_decoder_loss=0.2766, over 29632.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.1495, cr_loss=0.3903, attn_decoder_loss=0.2562, over 5797819.22 frames. ], batch size: 86, lr: 7.49e-03, grad_scale: 8.0
2024-09-17 16:48:47,329 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.03 vs.
limit=15.0 2024-09-17 16:48:55,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=260800.0, ans=0.07 2024-09-17 16:49:27,359 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.760e+01 9.313e+01 1.001e+02 1.511e+02, threshold=1.863e+02, percent-clipped=0.0 2024-09-17 16:49:42,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=260960.0, ans=0.1 2024-09-17 16:49:50,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=260960.0, ans=0.0 2024-09-17 16:50:01,055 INFO [train.py:1198] (1/2) Epoch 15, batch 1900, loss[loss=0.2634, ctc_loss=0.1506, cr_loss=0.376, attn_decoder_loss=0.2676, over 29716.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.1495, cr_loss=0.3902, attn_decoder_loss=0.2566, over 5805797.70 frames. ], batch size: 89, lr: 7.49e-03, grad_scale: 8.0 2024-09-17 16:50:12,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=261000.0, ans=0.025 2024-09-17 16:50:16,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=261040.0, ans=0.125 2024-09-17 16:50:33,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=261080.0, ans=0.125 2024-09-17 16:50:48,464 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.51 vs. limit=22.5 2024-09-17 16:50:58,434 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.89 vs. limit=15.0 2024-09-17 16:51:12,575 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.78 vs. limit=15.0 2024-09-17 16:51:18,384 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.18 vs. limit=15.0 2024-09-17 16:51:18,999 INFO [train.py:1198] (1/2) Epoch 15, batch 1950, loss[loss=0.2502, ctc_loss=0.1324, cr_loss=0.3552, attn_decoder_loss=0.2554, over 29455.00 frames. ], tot_loss[loss=0.2547, ctc_loss=0.1498, cr_loss=0.3916, attn_decoder_loss=0.2577, over 5819790.17 frames. ], batch size: 78, lr: 7.49e-03, grad_scale: 8.0 2024-09-17 16:51:33,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=261240.0, ans=0.125 2024-09-17 16:51:38,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=261240.0, ans=0.0 2024-09-17 16:51:39,630 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.40 vs. 
limit=15.0 2024-09-17 16:51:53,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=261280.0, ans=0.025 2024-09-17 16:52:02,588 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.191e+01 8.962e+01 9.463e+01 1.031e+02 5.545e+02, threshold=1.893e+02, percent-clipped=1.0 2024-09-17 16:52:02,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=261320.0, ans=0.125 2024-09-17 16:52:34,220 INFO [train.py:1198] (1/2) Epoch 15, batch 2000, loss[loss=0.2266, ctc_loss=0.1333, cr_loss=0.365, attn_decoder_loss=0.2289, over 29334.00 frames. ], tot_loss[loss=0.2555, ctc_loss=0.1508, cr_loss=0.3927, attn_decoder_loss=0.2584, over 5798037.35 frames. ], batch size: 67, lr: 7.48e-03, grad_scale: 16.0 2024-09-17 16:52:37,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=261400.0, ans=0.0 2024-09-17 16:52:43,166 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.68 vs. limit=15.0 2024-09-17 16:52:47,561 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.91 vs. limit=15.0 2024-09-17 16:52:57,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=261440.0, ans=0.125 2024-09-17 16:53:03,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=261480.0, ans=0.0 2024-09-17 16:53:53,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=261600.0, ans=0.125 2024-09-17 16:53:54,014 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.08 vs. limit=12.0 2024-09-17 16:53:55,005 INFO [train.py:1198] (1/2) Epoch 15, batch 2050, loss[loss=0.2303, ctc_loss=0.1325, cr_loss=0.358, attn_decoder_loss=0.2332, over 29424.00 frames. ], tot_loss[loss=0.2549, ctc_loss=0.1505, cr_loss=0.3919, attn_decoder_loss=0.2578, over 5790431.09 frames. ], batch size: 70, lr: 7.48e-03, grad_scale: 8.0 2024-09-17 16:54:16,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=261640.0, ans=0.5 2024-09-17 16:54:24,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=261680.0, ans=0.125 2024-09-17 16:54:28,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=261680.0, ans=0.0 2024-09-17 16:54:30,834 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.52 vs. 
limit=15.0 2024-09-17 16:54:38,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=261680.0, ans=0.125 2024-09-17 16:54:40,625 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.670e+01 9.073e+01 9.648e+01 1.067e+02 2.180e+02, threshold=1.930e+02, percent-clipped=1.0 2024-09-17 16:54:51,855 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.06 vs. limit=15.0 2024-09-17 16:55:03,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=261760.0, ans=0.125 2024-09-17 16:55:10,751 INFO [train.py:1198] (1/2) Epoch 15, batch 2100, loss[loss=0.2545, ctc_loss=0.1491, cr_loss=0.3944, attn_decoder_loss=0.2574, over 29770.00 frames. ], tot_loss[loss=0.2543, ctc_loss=0.1501, cr_loss=0.3917, attn_decoder_loss=0.2572, over 5801205.33 frames. ], batch size: 81, lr: 7.48e-03, grad_scale: 8.0 2024-09-17 16:55:15,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=261800.0, ans=0.2 2024-09-17 16:55:44,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=261880.0, ans=0.125 2024-09-17 16:55:57,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=261920.0, ans=0.04949747468305833 2024-09-17 16:55:59,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=261920.0, ans=0.125 2024-09-17 16:56:23,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=261960.0, ans=0.0 2024-09-17 16:56:26,349 INFO [train.py:1198] (1/2) Epoch 15, batch 2150, loss[loss=0.2445, ctc_loss=0.142, cr_loss=0.393, attn_decoder_loss=0.2472, over 29444.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.1494, cr_loss=0.3908, attn_decoder_loss=0.2564, over 5815567.00 frames. ], batch size: 78, lr: 7.47e-03, grad_scale: 8.0 2024-09-17 16:56:40,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=262040.0, ans=0.0 2024-09-17 16:57:11,851 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.623e+01 9.076e+01 9.705e+01 5.465e+02, threshold=1.815e+02, percent-clipped=1.0 2024-09-17 16:57:17,027 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.17 vs. limit=15.0 2024-09-17 16:57:41,945 INFO [train.py:1198] (1/2) Epoch 15, batch 2200, loss[loss=0.2631, ctc_loss=0.1536, cr_loss=0.4136, attn_decoder_loss=0.2661, over 29607.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.1494, cr_loss=0.3909, attn_decoder_loss=0.2564, over 5811862.51 frames. ], batch size: 86, lr: 7.47e-03, grad_scale: 8.0 2024-09-17 16:57:48,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=262200.0, ans=0.2 2024-09-17 16:57:56,433 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.63 vs. 
limit=15.0 2024-09-17 16:58:13,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=262240.0, ans=0.125 2024-09-17 16:58:29,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=262280.0, ans=0.0 2024-09-17 16:58:35,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=262320.0, ans=0.035 2024-09-17 16:58:55,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=262360.0, ans=0.1 2024-09-17 16:59:02,696 INFO [train.py:1198] (1/2) Epoch 15, batch 2250, loss[loss=0.2599, ctc_loss=0.152, cr_loss=0.3981, attn_decoder_loss=0.2631, over 29689.00 frames. ], tot_loss[loss=0.2532, ctc_loss=0.1489, cr_loss=0.3897, attn_decoder_loss=0.2561, over 5811510.54 frames. ], batch size: 82, lr: 7.47e-03, grad_scale: 8.0 2024-09-17 16:59:34,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=262480.0, ans=0.2 2024-09-17 16:59:42,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=262480.0, ans=0.125 2024-09-17 16:59:47,795 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 8.637e+01 9.303e+01 1.004e+02 1.390e+02, threshold=1.861e+02, percent-clipped=0.0 2024-09-17 16:59:54,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=262520.0, ans=0.125 2024-09-17 17:00:00,798 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.22 vs. limit=6.0 2024-09-17 17:00:04,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=262560.0, ans=0.05 2024-09-17 17:00:09,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=262560.0, ans=0.0 2024-09-17 17:00:18,402 INFO [train.py:1198] (1/2) Epoch 15, batch 2300, loss[loss=0.2343, ctc_loss=0.1338, cr_loss=0.3509, attn_decoder_loss=0.2377, over 29328.00 frames. ], tot_loss[loss=0.2523, ctc_loss=0.1482, cr_loss=0.3884, attn_decoder_loss=0.2553, over 5798120.71 frames. 
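
[editor's note] The per-batch `loss` values are consistent with a fixed weighted sum of the three logged components. For the batch-2300 running totals just above: 0.1·0.1482 + 0.02·0.3884 + 0.9·0.2553 ≈ 0.2524, against the logged 0.2523 (the inputs are already rounded). A sketch of that combination, with the scales inferred from the logged numbers rather than quoted from the recipe:

```python
def combined_loss(ctc_loss: float, cr_loss: float, attn_decoder_loss: float,
                  ctc_scale: float = 0.1, cr_scale: float = 0.02,
                  attn_scale: float = 0.9) -> float:
    """Weighted total that reproduces the logged `loss` values.

    The scales are inferred by fitting the logged totals, e.g. the
    batch-2300 running averages above.
    """
    return ctc_scale * ctc_loss + cr_scale * cr_loss + attn_scale * attn_decoder_loss

print(round(combined_loss(0.1482, 0.3884, 0.2553), 4))  # 0.2524 vs. logged 0.2523
```

The same weights reproduce the other `tot_loss` rows in this section to within rounding.
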
], batch size: 71, lr: 7.47e-03, grad_scale: 8.0 2024-09-17 17:00:23,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=262600.0, ans=0.2 2024-09-17 17:00:29,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=262600.0, ans=0.2 2024-09-17 17:00:34,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=262640.0, ans=0.125 2024-09-17 17:00:36,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=262640.0, ans=0.1 2024-09-17 17:00:50,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=262680.0, ans=0.0 2024-09-17 17:00:56,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=262680.0, ans=0.0 2024-09-17 17:01:07,819 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.94 vs. limit=15.0 2024-09-17 17:01:19,789 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.17 vs. limit=15.0 2024-09-17 17:01:20,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=262760.0, ans=0.1 2024-09-17 17:01:22,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=262760.0, ans=0.1 2024-09-17 17:01:28,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.62 vs. limit=15.0 2024-09-17 17:01:32,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=262800.0, ans=0.125 2024-09-17 17:01:34,192 INFO [train.py:1198] (1/2) Epoch 15, batch 2350, loss[loss=0.2699, ctc_loss=0.1646, cr_loss=0.4066, attn_decoder_loss=0.2726, over 29697.00 frames. ], tot_loss[loss=0.2526, ctc_loss=0.1485, cr_loss=0.3886, attn_decoder_loss=0.2555, over 5803871.74 frames. ], batch size: 83, lr: 7.46e-03, grad_scale: 8.0 2024-09-17 17:02:12,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=262880.0, ans=0.2 2024-09-17 17:02:14,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=262880.0, ans=0.125 2024-09-17 17:02:17,379 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0 2024-09-17 17:02:24,199 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 8.797e+01 9.531e+01 1.053e+02 3.289e+02, threshold=1.906e+02, percent-clipped=2.0 2024-09-17 17:02:49,872 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.07 vs. limit=15.0 2024-09-17 17:02:54,792 INFO [train.py:1198] (1/2) Epoch 15, batch 2400, loss[loss=0.2366, ctc_loss=0.1354, cr_loss=0.3664, attn_decoder_loss=0.2398, over 29536.00 frames. 
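
[editor's note] In every `Clipping_scale=2.0` warning in this section, the reported threshold is twice the median grad norm, i.e. the middle of the five quartile values: in the warning just above, 2.0 × 9.531e+01 ≈ 1.906e+02. A sketch of that scheme, assuming the optimizer keeps a window of recent gradient norms and clips the global norm to clipping_scale × median; how that history is maintained, and any per-parameter handling, are omitted here (this is not icefall's ScaledAdam internals):

```python
import torch

def clip_by_median(params, recent_norms, clipping_scale=2.0):
    """Clip the global grad norm to clipping_scale * median of recent norms.

    Returns (quartiles, threshold, clipped) so the caller can log a line
    like the WARNINGs above. recent_norms: 1-D tensor of past grad norms.
    """
    grads = [p.grad for p in params if p.grad is not None]
    norm = torch.linalg.vector_norm(
        torch.stack([torch.linalg.vector_norm(g) for g in grads]))
    quartiles = torch.quantile(recent_norms,
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]  # 2.0 * median, as in the log
    clipped = norm > threshold
    if clipped:
        for g in grads:
            g.mul_(threshold / norm)
    return quartiles, float(threshold), bool(clipped)

p = torch.nn.Parameter(torch.randn(10))
p.grad = torch.randn(10)
hist = torch.abs(torch.randn(128)) * 90.0   # fake recent grad-norm history
q, thr, was_clipped = clip_by_median([p], hist)
print([f"{v:.3e}" for v in q.tolist()], f"threshold={thr:.3e}", was_clipped)
```
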
], tot_loss[loss=0.2534, ctc_loss=0.1493, cr_loss=0.3904, attn_decoder_loss=0.2563, over 5808899.22 frames. ], batch size: 76, lr: 7.46e-03, grad_scale: 16.0 2024-09-17 17:03:05,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=263000.0, ans=0.025 2024-09-17 17:03:08,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=263040.0, ans=0.125 2024-09-17 17:03:14,659 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:03:46,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=263120.0, ans=0.1 2024-09-17 17:04:05,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=263160.0, ans=0.0 2024-09-17 17:04:11,101 INFO [train.py:1198] (1/2) Epoch 15, batch 2450, loss[loss=0.2625, ctc_loss=0.1553, cr_loss=0.3977, attn_decoder_loss=0.2656, over 29711.00 frames. ], tot_loss[loss=0.2545, ctc_loss=0.15, cr_loss=0.391, attn_decoder_loss=0.2574, over 5785786.05 frames. ], batch size: 82, lr: 7.46e-03, grad_scale: 8.0 2024-09-17 17:04:17,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=263200.0, ans=0.125 2024-09-17 17:04:21,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=263200.0, ans=0.125 2024-09-17 17:04:35,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=263240.0, ans=0.125 2024-09-17 17:04:46,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=263280.0, ans=0.125 2024-09-17 17:04:46,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=263280.0, ans=0.125 2024-09-17 17:04:49,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=263280.0, ans=0.125 2024-09-17 17:04:57,814 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.779e+01 9.083e+01 9.931e+01 1.099e+02 3.144e+02, threshold=1.986e+02, percent-clipped=3.0 2024-09-17 17:04:59,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=263320.0, ans=0.125 2024-09-17 17:05:26,722 INFO [train.py:1198] (1/2) Epoch 15, batch 2500, loss[loss=0.2606, ctc_loss=0.1504, cr_loss=0.3914, attn_decoder_loss=0.2642, over 29640.00 frames. ], tot_loss[loss=0.2545, ctc_loss=0.1501, cr_loss=0.3909, attn_decoder_loss=0.2574, over 5795177.13 frames. ], batch size: 86, lr: 7.46e-03, grad_scale: 8.0 2024-09-17 17:05:39,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=263400.0, ans=0.0 2024-09-17 17:05:58,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=263440.0, ans=0.2 2024-09-17 17:06:18,061 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.34 vs. 
limit=22.5 2024-09-17 17:06:20,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=263520.0, ans=0.125 2024-09-17 17:06:31,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.max_abs, batch_count=263560.0, ans=10.0 2024-09-17 17:06:32,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=263560.0, ans=0.0 2024-09-17 17:06:44,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=263560.0, ans=0.125 2024-09-17 17:06:47,416 INFO [train.py:1198] (1/2) Epoch 15, batch 2550, loss[loss=0.2283, ctc_loss=0.1296, cr_loss=0.3717, attn_decoder_loss=0.231, over 29362.00 frames. ], tot_loss[loss=0.2542, ctc_loss=0.1499, cr_loss=0.3911, attn_decoder_loss=0.2571, over 5797897.26 frames. ], batch size: 67, lr: 7.45e-03, grad_scale: 8.0 2024-09-17 17:06:53,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=263600.0, ans=0.025 2024-09-17 17:07:11,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=263640.0, ans=0.125 2024-09-17 17:07:22,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=263680.0, ans=0.125 2024-09-17 17:07:34,222 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.892e+01 9.357e+01 1.015e+02 2.489e+02, threshold=1.871e+02, percent-clipped=2.0 2024-09-17 17:07:34,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=263720.0, ans=0.125 2024-09-17 17:07:42,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=263720.0, ans=0.125 2024-09-17 17:07:43,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=263720.0, ans=0.125 2024-09-17 17:08:02,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=263800.0, ans=0.125 2024-09-17 17:08:03,390 INFO [train.py:1198] (1/2) Epoch 15, batch 2600, loss[loss=0.2528, ctc_loss=0.1533, cr_loss=0.3976, attn_decoder_loss=0.255, over 29431.00 frames. ], tot_loss[loss=0.2545, ctc_loss=0.1501, cr_loss=0.3914, attn_decoder_loss=0.2574, over 5794458.39 frames. ], batch size: 78, lr: 7.45e-03, grad_scale: 8.0 2024-09-17 17:08:06,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=263800.0, ans=0.0 2024-09-17 17:08:06,836 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:08:12,005 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.47 vs. limit=10.0 2024-09-17 17:08:20,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=263840.0, ans=0.1 2024-09-17 17:08:22,520 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.30 vs. 
limit=10.0 2024-09-17 17:08:28,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=263840.0, ans=0.0 2024-09-17 17:08:50,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=263920.0, ans=0.0 2024-09-17 17:08:50,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=263920.0, ans=0.2 2024-09-17 17:08:57,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=263920.0, ans=0.125 2024-09-17 17:09:15,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=263960.0, ans=0.0 2024-09-17 17:09:18,683 INFO [train.py:1198] (1/2) Epoch 15, batch 2650, loss[loss=0.2677, ctc_loss=0.1613, cr_loss=0.4111, attn_decoder_loss=0.2704, over 29258.00 frames. ], tot_loss[loss=0.255, ctc_loss=0.1506, cr_loss=0.3924, attn_decoder_loss=0.2578, over 5801027.78 frames. ], batch size: 100, lr: 7.45e-03, grad_scale: 8.0 2024-09-17 17:09:19,143 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:09:33,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=264000.0, ans=0.125 2024-09-17 17:09:51,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=264080.0, ans=0.125 2024-09-17 17:10:02,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=264080.0, ans=0.0 2024-09-17 17:10:06,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=264120.0, ans=0.2 2024-09-17 17:10:09,779 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.262e+01 9.042e+01 9.386e+01 1.019e+02 2.005e+02, threshold=1.877e+02, percent-clipped=1.0 2024-09-17 17:10:22,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=264160.0, ans=0.125 2024-09-17 17:10:32,025 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.56 vs. limit=15.0 2024-09-17 17:10:36,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=264160.0, ans=0.1 2024-09-17 17:10:38,717 INFO [train.py:1198] (1/2) Epoch 15, batch 2700, loss[loss=0.2459, ctc_loss=0.132, cr_loss=0.3645, attn_decoder_loss=0.2505, over 29515.00 frames. ], tot_loss[loss=0.2551, ctc_loss=0.1508, cr_loss=0.392, attn_decoder_loss=0.258, over 5797103.84 frames. 
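
[editor's note] The recurring `WithLoss: name=...self_attn_weights, loss-sum=0.000e+00` entries report an auxiliary penalty attached to the attention-weight modules, which is identically zero throughout this section. A minimal sketch of the wrapper pattern only, assuming the penalty is computed from the module's output and cached for logging; icefall's actual WithLoss mechanics differ in detail:

```python
import torch
import torch.nn as nn

class WithAuxLoss(nn.Module):
    """Wrap a module and record an auxiliary penalty on its output.

    Here the penalty is a zero placeholder (matching loss-sum=0.000e+00);
    a real implementation would penalize some property of y.
    """

    def __init__(self, module: nn.Module, name: str):
        super().__init__()
        self.module = module
        self.name = name
        self.last_loss_sum = 0.0

    def forward(self, *args, **kwargs):
        y = self.module(*args, **kwargs)
        aux = y.new_zeros(())           # placeholder penalty: always 0.0
        self.last_loss_sum = float(aux.detach().sum())
        return y + 0.0 * aux            # keep aux in the graph without changing y

wrapped = WithAuxLoss(nn.Linear(8, 8), "encoder.layers.0.self_attn_weights")
_ = wrapped(torch.randn(2, 8))
print(f"WithLoss: name={wrapped.name}, loss-sum={wrapped.last_loss_sum:.3e}")
```
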
], batch size: 87, lr: 7.44e-03, grad_scale: 8.0 2024-09-17 17:10:39,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=264200.0, ans=0.1 2024-09-17 17:10:49,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=264200.0, ans=0.0 2024-09-17 17:10:52,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=264240.0, ans=0.0 2024-09-17 17:11:06,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=264240.0, ans=0.0 2024-09-17 17:11:15,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=264280.0, ans=0.125 2024-09-17 17:11:20,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=264280.0, ans=0.125 2024-09-17 17:11:23,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=264320.0, ans=0.025 2024-09-17 17:11:30,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=264320.0, ans=0.2 2024-09-17 17:11:39,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=264360.0, ans=0.1 2024-09-17 17:11:55,010 INFO [train.py:1198] (1/2) Epoch 15, batch 2750, loss[loss=0.2483, ctc_loss=0.144, cr_loss=0.368, attn_decoder_loss=0.2517, over 29524.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.1496, cr_loss=0.3897, attn_decoder_loss=0.2566, over 5794608.65 frames. ], batch size: 75, lr: 7.44e-03, grad_scale: 8.0 2024-09-17 17:12:07,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=264400.0, ans=0.125 2024-09-17 17:12:21,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=264440.0, ans=0.035 2024-09-17 17:12:24,129 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:12:33,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=264480.0, ans=0.0 2024-09-17 17:12:41,847 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.170e+01 8.996e+01 9.846e+01 1.075e+02 1.941e+02, threshold=1.969e+02, percent-clipped=1.0 2024-09-17 17:13:13,288 INFO [train.py:1198] (1/2) Epoch 15, batch 2800, loss[loss=0.2827, ctc_loss=0.2015, cr_loss=0.411, attn_decoder_loss=0.2826, over 20405.00 frames. ], tot_loss[loss=0.2543, ctc_loss=0.1504, cr_loss=0.3908, attn_decoder_loss=0.2572, over 5776481.99 frames. ], batch size: 210, lr: 7.44e-03, grad_scale: 16.0 2024-09-17 17:13:42,029 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.73 vs. limit=22.5 2024-09-17 17:13:45,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=264680.0, ans=0.05 2024-09-17 17:14:31,447 INFO [train.py:1198] (1/2) Epoch 15, batch 2850, loss[loss=0.2379, ctc_loss=0.1409, cr_loss=0.401, attn_decoder_loss=0.2398, over 29519.00 frames. 
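
[editor's note] `grad_scale` alternates between 8.0 and 16.0 across this epoch (16.0 at batch 2800 above, back to 8.0 by batch 2850), which is the signature of dynamic loss scaling under float16 AMP: the scaler doubles the scale after a run of overflow-free steps and halves it when an inf/nan gradient appears. A sketch using the standard `torch.cuda.amp` API (the API calls are real; the model, optimizer, and `growth_interval=400` below are placeholders guessed from the log cadence, and a CUDA device is required):

```python
import torch

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=7.5e-3)
scaler = torch.cuda.amp.GradScaler(init_scale=8.0, growth_interval=400)

def train_step(x, target):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = torch.nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped internally if grads overflowed
    scaler.update()          # doubles scale after growth_interval good steps,
                             # halves it on overflow -> the 8.0 <-> 16.0 pattern
    return loss.detach(), scaler.get_scale()
```
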
], tot_loss[loss=0.2549, ctc_loss=0.1512, cr_loss=0.3918, attn_decoder_loss=0.2577, over 5762308.63 frames. ], batch size: 77, lr: 7.44e-03, grad_scale: 8.0 2024-09-17 17:14:51,483 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:15:06,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=264880.0, ans=0.0 2024-09-17 17:15:09,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=264880.0, ans=0.1 2024-09-17 17:15:20,057 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.176e+01 9.061e+01 9.943e+01 1.094e+02 2.532e+02, threshold=1.989e+02, percent-clipped=2.0 2024-09-17 17:15:23,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=264920.0, ans=0.1 2024-09-17 17:15:24,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=264920.0, ans=0.125 2024-09-17 17:15:40,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=264960.0, ans=0.125 2024-09-17 17:15:47,577 INFO [train.py:1198] (1/2) Epoch 15, batch 2900, loss[loss=0.2564, ctc_loss=0.1417, cr_loss=0.3821, attn_decoder_loss=0.2607, over 29447.00 frames. ], tot_loss[loss=0.2562, ctc_loss=0.152, cr_loss=0.3942, attn_decoder_loss=0.259, over 5787670.87 frames. ], batch size: 79, lr: 7.43e-03, grad_scale: 8.0 2024-09-17 17:15:53,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=265000.0, ans=0.025 2024-09-17 17:16:25,343 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.36 vs. limit=15.0 2024-09-17 17:16:35,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=265120.0, ans=0.2 2024-09-17 17:17:05,973 INFO [train.py:1198] (1/2) Epoch 15, batch 2950, loss[loss=0.2378, ctc_loss=0.1398, cr_loss=0.3854, attn_decoder_loss=0.2401, over 29538.00 frames. ], tot_loss[loss=0.255, ctc_loss=0.1512, cr_loss=0.3924, attn_decoder_loss=0.2578, over 5783079.06 frames. ], batch size: 75, lr: 7.43e-03, grad_scale: 8.0 2024-09-17 17:17:21,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.40 vs. limit=15.0 2024-09-17 17:17:25,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=265240.0, ans=0.1 2024-09-17 17:17:35,960 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.68 vs. 
limit=10.0 2024-09-17 17:17:41,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=265280.0, ans=0.0 2024-09-17 17:17:47,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=265280.0, ans=0.2 2024-09-17 17:17:56,690 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.763e+01 8.981e+01 9.731e+01 1.093e+02 3.344e+02, threshold=1.946e+02, percent-clipped=1.0 2024-09-17 17:18:12,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=265360.0, ans=0.125 2024-09-17 17:18:24,069 INFO [train.py:1198] (1/2) Epoch 15, batch 3000, loss[loss=0.2522, ctc_loss=0.1478, cr_loss=0.4039, attn_decoder_loss=0.2548, over 29796.00 frames. ], tot_loss[loss=0.2546, ctc_loss=0.1507, cr_loss=0.3916, attn_decoder_loss=0.2574, over 5784782.27 frames. ], batch size: 81, lr: 7.43e-03, grad_scale: 8.0 2024-09-17 17:18:24,069 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 17:18:31,763 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.4.self_attn_weights, attn_weights_entropy = tensor([3.2878, 2.8578, 2.1803, 2.8235, 2.6970, 1.6750, 2.1686, 2.6388], device='cuda:1') 2024-09-17 17:18:41,439 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.6429, 4.6855, 4.1347, 2.4540], device='cuda:1') 2024-09-17 17:18:42,410 INFO [train.py:1230] (1/2) Epoch 15, validation: loss=0.2111, ctc_loss=0.04175, cr_loss=4.872e-15, attn_decoder_loss=0.23, over 944034.00 frames. 2024-09-17 17:18:42,410 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-17 17:18:44,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=265400.0, ans=0.0 2024-09-17 17:19:10,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=265440.0, ans=0.2 2024-09-17 17:19:22,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=265480.0, ans=0.1 2024-09-17 17:19:48,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=265560.0, ans=0.125 2024-09-17 17:19:58,991 INFO [train.py:1198] (1/2) Epoch 15, batch 3050, loss[loss=0.2466, ctc_loss=0.1421, cr_loss=0.3894, attn_decoder_loss=0.2496, over 29536.00 frames. ], tot_loss[loss=0.2555, ctc_loss=0.1514, cr_loss=0.393, attn_decoder_loss=0.2583, over 5778017.30 frames. 
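
[editor's note] The validation block above (loss=0.2111, ctc_loss=0.04175, cr_loss≈4.9e-15) also dumps per-module `attn_weights_entropy` tensors; the consistency loss collapsing to ~0 at validation is expected if that term is only computed between two augmented training views (see the note on cr_loss further below). A sketch of the entropy diagnostic, assuming `attn_weights` are post-softmax probabilities over the key axis; the shape convention here is an assumption:

```python
import torch

def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    """Mean entropy (nats) of attention distributions, per head.

    attn_weights: (num_heads, batch, query_len, key_len), rows sum to 1.
    Returns a (num_heads,) tensor like the ones printed in the log.
    """
    eps = 1.0e-20
    ent = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
    return ent.mean(dim=(1, 2))  # average over batch and query positions

w = torch.softmax(torch.randn(4, 2, 10, 10), dim=-1)
print(attn_weights_entropy(w))  # 4 values, comparable in scale to the logged tensors
```
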
], batch size: 76, lr: 7.42e-03, grad_scale: 8.0 2024-09-17 17:20:17,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=265640.0, ans=0.125 2024-09-17 17:20:49,578 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.921e+01 9.345e+01 1.016e+02 1.110e+02 2.723e+02, threshold=2.032e+02, percent-clipped=2.0 2024-09-17 17:20:49,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=265720.0, ans=0.1 2024-09-17 17:21:09,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=265760.0, ans=0.0 2024-09-17 17:21:16,562 INFO [train.py:1198] (1/2) Epoch 15, batch 3100, loss[loss=0.2633, ctc_loss=0.1529, cr_loss=0.3978, attn_decoder_loss=0.2668, over 29280.00 frames. ], tot_loss[loss=0.2551, ctc_loss=0.1507, cr_loss=0.3919, attn_decoder_loss=0.258, over 5778092.26 frames. ], batch size: 100, lr: 7.42e-03, grad_scale: 8.0 2024-09-17 17:21:16,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=265800.0, ans=0.125 2024-09-17 17:21:35,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=265840.0, ans=0.0 2024-09-17 17:21:54,011 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.32 vs. limit=15.0 2024-09-17 17:22:02,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=265920.0, ans=0.125 2024-09-17 17:22:13,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=265920.0, ans=0.0 2024-09-17 17:22:18,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=265960.0, ans=0.05 2024-09-17 17:22:19,724 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:22:25,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=265960.0, ans=0.125 2024-09-17 17:22:30,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=265960.0, ans=0.125 2024-09-17 17:22:32,385 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0 2024-09-17 17:22:34,711 INFO [train.py:1198] (1/2) Epoch 15, batch 3150, loss[loss=0.2711, ctc_loss=0.1664, cr_loss=0.4112, attn_decoder_loss=0.2735, over 28882.00 frames. ], tot_loss[loss=0.2549, ctc_loss=0.1507, cr_loss=0.3922, attn_decoder_loss=0.2578, over 5785386.82 frames. ], batch size: 104, lr: 7.42e-03, grad_scale: 8.0 2024-09-17 17:22:38,762 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.90 vs. 
limit=22.5 2024-09-17 17:22:53,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=266040.0, ans=0.05 2024-09-17 17:23:11,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=266080.0, ans=0.125 2024-09-17 17:23:16,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=266080.0, ans=0.95 2024-09-17 17:23:23,256 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.617e+01 8.876e+01 9.219e+01 9.735e+01 3.011e+02, threshold=1.844e+02, percent-clipped=1.0 2024-09-17 17:23:50,661 INFO [train.py:1198] (1/2) Epoch 15, batch 3200, loss[loss=0.2519, ctc_loss=0.14, cr_loss=0.3765, attn_decoder_loss=0.2559, over 29401.00 frames. ], tot_loss[loss=0.2542, ctc_loss=0.1501, cr_loss=0.3915, attn_decoder_loss=0.257, over 5796023.00 frames. ], batch size: 79, lr: 7.42e-03, grad_scale: 16.0 2024-09-17 17:23:57,442 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.42 vs. limit=15.0 2024-09-17 17:24:14,406 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.08 vs. limit=22.5 2024-09-17 17:24:15,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=266240.0, ans=0.125 2024-09-17 17:24:26,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=266280.0, ans=0.2 2024-09-17 17:24:27,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=266280.0, ans=0.125 2024-09-17 17:24:31,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.60 vs. limit=12.0 2024-09-17 17:24:43,979 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.81 vs. limit=15.0 2024-09-17 17:24:46,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=266320.0, ans=0.1 2024-09-17 17:24:48,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=266320.0, ans=0.125 2024-09-17 17:25:07,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=266400.0, ans=0.025 2024-09-17 17:25:09,282 INFO [train.py:1198] (1/2) Epoch 15, batch 3250, loss[loss=0.2576, ctc_loss=0.1463, cr_loss=0.3967, attn_decoder_loss=0.2612, over 29706.00 frames. ], tot_loss[loss=0.2541, ctc_loss=0.1497, cr_loss=0.3907, attn_decoder_loss=0.257, over 5803327.60 frames. 
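
[editor's note] Each `Whitening:` line compares a measured metric against a (scheduled) limit, e.g. `metric=20.08 vs. limit=22.5` above; a corrective gradient is applied only when the metric exceeds the limit. A reasonable reading, stated as an assumption rather than icefall's exact code: the metric measures how far the per-group feature covariance is from a multiple of the identity, equal to 1.0 for perfectly "white" features and growing with anisotropy:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    """Anisotropy of the feature covariance; 1.0 == perfectly white.

    x: (num_frames, num_channels). Channels are split into num_groups
    groups; within each group we compare mean(eig^2) to mean(eig)^2 of
    the covariance. Assumed reconstruction of the logged metric.
    """
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)  # (g, n, c/g)
    cov = x.transpose(1, 2) @ x / n                                # (g, c/g, c/g)
    eigs = torch.linalg.eigvalsh(cov)                              # symmetric -> real
    return float(eigs.pow(2).mean() / eigs.mean().pow(2))

x = torch.randn(1000, 192)
# Close to 1 for white Gaussian features, up to O(channels/frames) sampling noise.
print(whitening_metric(x, num_groups=1))
```
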
], batch size: 84, lr: 7.41e-03, grad_scale: 8.0 2024-09-17 17:25:15,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=266400.0, ans=0.125 2024-09-17 17:25:21,697 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:25:24,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=266440.0, ans=0.025 2024-09-17 17:25:29,598 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-09-17 17:25:32,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=266440.0, ans=0.0 2024-09-17 17:26:01,023 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.078e+01 8.753e+01 9.181e+01 1.001e+02 1.564e+02, threshold=1.836e+02, percent-clipped=0.0 2024-09-17 17:26:04,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=266520.0, ans=0.05 2024-09-17 17:26:04,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=266520.0, ans=0.0 2024-09-17 17:26:26,790 INFO [train.py:1198] (1/2) Epoch 15, batch 3300, loss[loss=0.2723, ctc_loss=0.1675, cr_loss=0.4181, attn_decoder_loss=0.2746, over 28357.00 frames. ], tot_loss[loss=0.2531, ctc_loss=0.1489, cr_loss=0.3899, attn_decoder_loss=0.256, over 5799894.85 frames. ], batch size: 111, lr: 7.41e-03, grad_scale: 8.0 2024-09-17 17:26:36,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=266600.0, ans=0.1 2024-09-17 17:27:17,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=266720.0, ans=0.2 2024-09-17 17:27:18,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=266720.0, ans=0.125 2024-09-17 17:27:26,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=266760.0, ans=0.07 2024-09-17 17:27:35,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=266760.0, ans=0.1 2024-09-17 17:27:42,411 INFO [train.py:1198] (1/2) Epoch 15, batch 3350, loss[loss=0.2631, ctc_loss=0.1529, cr_loss=0.388, attn_decoder_loss=0.2667, over 28826.00 frames. ], tot_loss[loss=0.2539, ctc_loss=0.1498, cr_loss=0.3912, attn_decoder_loss=0.2567, over 5774638.66 frames. ], batch size: 104, lr: 7.41e-03, grad_scale: 8.0 2024-09-17 17:27:44,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=266800.0, ans=0.0 2024-09-17 17:27:57,027 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.50 vs. 
limit=22.5 2024-09-17 17:28:14,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=266880.0, ans=0.125 2024-09-17 17:28:25,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=266880.0, ans=0.125 2024-09-17 17:28:27,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=266880.0, ans=0.2 2024-09-17 17:28:28,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=266920.0, ans=0.125 2024-09-17 17:28:31,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.78 vs. limit=15.0 2024-09-17 17:28:34,699 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.002e+01 8.982e+01 9.740e+01 1.080e+02 2.374e+02, threshold=1.948e+02, percent-clipped=1.0 2024-09-17 17:28:41,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=266920.0, ans=0.125 2024-09-17 17:28:59,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=267000.0, ans=0.0 2024-09-17 17:29:00,483 INFO [train.py:1198] (1/2) Epoch 15, batch 3400, loss[loss=0.2293, ctc_loss=0.1345, cr_loss=0.3499, attn_decoder_loss=0.232, over 29345.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1502, cr_loss=0.3919, attn_decoder_loss=0.2568, over 5768003.61 frames. ], batch size: 67, lr: 7.41e-03, grad_scale: 8.0 2024-09-17 17:29:03,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=267000.0, ans=0.125 2024-09-17 17:29:05,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=267000.0, ans=0.0 2024-09-17 17:29:05,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=267000.0, ans=0.0 2024-09-17 17:29:08,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=267000.0, ans=0.125 2024-09-17 17:29:19,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=267040.0, ans=0.025 2024-09-17 17:29:27,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=267040.0, ans=0.125 2024-09-17 17:29:27,837 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.54 vs. limit=15.0 2024-09-17 17:29:33,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=267080.0, ans=0.125 2024-09-17 17:29:33,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.76 vs. 
limit=15.0 2024-09-17 17:29:45,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=267080.0, ans=0.125 2024-09-17 17:30:18,824 INFO [train.py:1198] (1/2) Epoch 15, batch 3450, loss[loss=0.2651, ctc_loss=0.1472, cr_loss=0.3852, attn_decoder_loss=0.2697, over 28208.00 frames. ], tot_loss[loss=0.2544, ctc_loss=0.1502, cr_loss=0.3923, attn_decoder_loss=0.2572, over 5775577.85 frames. ], batch size: 111, lr: 7.40e-03, grad_scale: 8.0 2024-09-17 17:30:25,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=267200.0, ans=0.2 2024-09-17 17:30:37,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=267240.0, ans=0.125 2024-09-17 17:30:40,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=267240.0, ans=0.04949747468305833 2024-09-17 17:30:58,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=267280.0, ans=0.125 2024-09-17 17:31:00,076 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0 2024-09-17 17:31:00,594 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.60 vs. limit=22.5 2024-09-17 17:31:08,471 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.765e+01 9.280e+01 9.883e+01 2.461e+02, threshold=1.856e+02, percent-clipped=1.0 2024-09-17 17:31:16,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=267320.0, ans=0.04949747468305833 2024-09-17 17:31:34,402 INFO [train.py:1198] (1/2) Epoch 15, batch 3500, loss[loss=0.2301, ctc_loss=0.1268, cr_loss=0.3495, attn_decoder_loss=0.2338, over 29319.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.1495, cr_loss=0.3916, attn_decoder_loss=0.2566, over 5777873.78 frames. ], batch size: 71, lr: 7.40e-03, grad_scale: 8.0 2024-09-17 17:32:28,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=267520.0, ans=0.0 2024-09-17 17:32:42,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=267560.0, ans=0.025 2024-09-17 17:32:51,207 INFO [train.py:1198] (1/2) Epoch 15, batch 3550, loss[loss=0.2649, ctc_loss=0.1521, cr_loss=0.3994, attn_decoder_loss=0.2686, over 29698.00 frames. ], tot_loss[loss=0.2536, ctc_loss=0.1491, cr_loss=0.3911, attn_decoder_loss=0.2565, over 5784130.05 frames. 
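
[editor's note] The `cr_loss` tracked in every training batch is the consistency-regularization term of CR-CTC: the CTC posteriors from two differently augmented (time-masked) views of each utterance are pulled together with a symmetric KL divergence. The exact form below is an assumption about the recipe's mechanics, not its code; it also explains why `cr_loss` is numerically zero at validation, where only a single un-augmented view exists:

```python
import torch
import torch.nn.functional as F

def cr_loss(log_probs_a: torch.Tensor, log_probs_b: torch.Tensor) -> torch.Tensor:
    """Symmetric KL between CTC posteriors of two augmented views.

    log_probs_*: (T, N, vocab) log-softmax outputs of the same encoder run
    on two differently masked copies of the input, with detached targets
    in each direction (assumed form of the consistency term).
    """
    kl_ab = F.kl_div(log_probs_a, log_probs_b.detach(),
                     log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(log_probs_b, log_probs_a.detach(),
                     log_target=True, reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

a = F.log_softmax(torch.randn(50, 4, 500), dim=-1)
b = F.log_softmax(torch.randn(50, 4, 500), dim=-1)
print(cr_loss(a, b))
```
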
], batch size: 89, lr: 7.40e-03, grad_scale: 8.0 2024-09-17 17:32:51,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=267600.0, ans=0.125 2024-09-17 17:32:55,833 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:33:17,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=267640.0, ans=0.0 2024-09-17 17:33:22,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=267680.0, ans=0.125 2024-09-17 17:33:25,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=267680.0, ans=15.0 2024-09-17 17:33:39,992 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.842e+01 9.279e+01 9.951e+01 4.838e+02, threshold=1.856e+02, percent-clipped=2.0 2024-09-17 17:33:46,735 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.74 vs. limit=15.0 2024-09-17 17:34:00,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=267760.0, ans=0.0 2024-09-17 17:34:05,140 INFO [train.py:1198] (1/2) Epoch 15, batch 3600, loss[loss=0.249, ctc_loss=0.1496, cr_loss=0.4014, attn_decoder_loss=0.2512, over 29512.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.1493, cr_loss=0.3913, attn_decoder_loss=0.2566, over 5792870.17 frames. ], batch size: 77, lr: 7.39e-03, grad_scale: 16.0 2024-09-17 17:34:05,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=267800.0, ans=0.125 2024-09-17 17:34:05,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=267800.0, ans=0.125 2024-09-17 17:34:19,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=267800.0, ans=0.125 2024-09-17 17:34:27,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=267840.0, ans=0.0 2024-09-17 17:34:31,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=267840.0, ans=0.0 2024-09-17 17:34:33,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=267840.0, ans=0.125 2024-09-17 17:34:51,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=267920.0, ans=0.125 2024-09-17 17:34:52,713 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:34:55,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=267920.0, ans=0.1 2024-09-17 17:34:56,381 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.77 vs. 
limit=15.0 2024-09-17 17:34:59,005 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.83 vs. limit=15.0 2024-09-17 17:34:59,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=267920.0, ans=0.125 2024-09-17 17:35:04,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=267920.0, ans=0.025 2024-09-17 17:35:06,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=267960.0, ans=0.1 2024-09-17 17:35:19,955 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.32 vs. limit=15.0 2024-09-17 17:35:22,564 INFO [train.py:1198] (1/2) Epoch 15, batch 3650, loss[loss=0.2691, ctc_loss=0.1665, cr_loss=0.4098, attn_decoder_loss=0.2714, over 29537.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1483, cr_loss=0.3897, attn_decoder_loss=0.2558, over 5793449.42 frames. ], batch size: 90, lr: 7.39e-03, grad_scale: 8.0 2024-09-17 17:35:41,461 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.38 vs. limit=22.5 2024-09-17 17:35:44,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=268040.0, ans=6.0 2024-09-17 17:35:45,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=268040.0, ans=0.0 2024-09-17 17:35:47,166 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.87 vs. limit=15.0 2024-09-17 17:35:49,811 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:35:51,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=268080.0, ans=0.0 2024-09-17 17:35:51,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=268080.0, ans=0.1 2024-09-17 17:36:13,524 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.437e+01 8.685e+01 9.367e+01 9.867e+01 1.459e+02, threshold=1.873e+02, percent-clipped=0.0 2024-09-17 17:36:37,447 INFO [train.py:1198] (1/2) Epoch 15, batch 3700, loss[loss=0.2612, ctc_loss=0.1534, cr_loss=0.4045, attn_decoder_loss=0.2642, over 29709.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1482, cr_loss=0.3895, attn_decoder_loss=0.2558, over 5802924.34 frames. 
], batch size: 84, lr: 7.39e-03, grad_scale: 8.0 2024-09-17 17:36:45,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=268200.0, ans=0.125 2024-09-17 17:36:46,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=268200.0, ans=0.1 2024-09-17 17:36:48,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=268200.0, ans=0.2 2024-09-17 17:36:50,458 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.38 vs. limit=15.0 2024-09-17 17:36:56,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=268240.0, ans=0.125 2024-09-17 17:37:51,817 INFO [train.py:1198] (1/2) Epoch 15, batch 3750, loss[loss=0.2221, ctc_loss=0.1286, cr_loss=0.3575, attn_decoder_loss=0.2245, over 29345.00 frames. ], tot_loss[loss=0.2527, ctc_loss=0.148, cr_loss=0.3891, attn_decoder_loss=0.2557, over 5807485.15 frames. ], batch size: 67, lr: 7.39e-03, grad_scale: 8.0 2024-09-17 17:37:57,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.82 vs. limit=15.0 2024-09-17 17:38:00,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=268400.0, ans=0.0 2024-09-17 17:38:20,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=268480.0, ans=0.0 2024-09-17 17:38:21,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=268480.0, ans=0.125 2024-09-17 17:38:24,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=268480.0, ans=0.1 2024-09-17 17:38:42,237 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.745e+01 8.742e+01 9.217e+01 9.952e+01 4.415e+02, threshold=1.843e+02, percent-clipped=1.0 2024-09-17 17:38:48,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=268520.0, ans=0.025 2024-09-17 17:39:05,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=268560.0, ans=0.125 2024-09-17 17:39:07,745 INFO [train.py:1198] (1/2) Epoch 15, batch 3800, loss[loss=0.2717, ctc_loss=0.1601, cr_loss=0.4358, attn_decoder_loss=0.2745, over 29638.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.1475, cr_loss=0.3883, attn_decoder_loss=0.2553, over 5797180.02 frames. 
], batch size: 86, lr: 7.38e-03, grad_scale: 8.0 2024-09-17 17:39:10,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=268600.0, ans=0.025 2024-09-17 17:39:24,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=268640.0, ans=0.125 2024-09-17 17:39:32,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=268640.0, ans=0.0 2024-09-17 17:39:32,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.29 vs. limit=15.0 2024-09-17 17:39:40,298 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.18 vs. limit=22.5 2024-09-17 17:39:50,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=268680.0, ans=0.0 2024-09-17 17:40:24,549 INFO [train.py:1198] (1/2) Epoch 15, batch 3850, loss[loss=0.2713, ctc_loss=0.1514, cr_loss=0.3796, attn_decoder_loss=0.2762, over 29258.00 frames. ], tot_loss[loss=0.2523, ctc_loss=0.1475, cr_loss=0.3883, attn_decoder_loss=0.2553, over 5812226.03 frames. ], batch size: 100, lr: 7.38e-03, grad_scale: 8.0 2024-09-17 17:40:28,090 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.04 vs. limit=22.5 2024-09-17 17:40:30,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=268800.0, ans=0.125 2024-09-17 17:40:51,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=268840.0, ans=0.1 2024-09-17 17:40:59,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=268880.0, ans=0.07 2024-09-17 17:40:59,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=268880.0, ans=0.025 2024-09-17 17:41:16,760 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 9.057e+01 9.751e+01 1.063e+02 2.027e+02, threshold=1.950e+02, percent-clipped=1.0 2024-09-17 17:41:20,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=268920.0, ans=0.0 2024-09-17 17:41:21,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=268920.0, ans=0.0 2024-09-17 17:41:22,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=268960.0, ans=0.125 2024-09-17 17:41:36,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=268960.0, ans=0.025 2024-09-17 17:41:38,870 INFO [train.py:1198] (1/2) Epoch 15, batch 3900, loss[loss=0.2643, ctc_loss=0.1572, cr_loss=0.4002, attn_decoder_loss=0.2673, over 29618.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.1482, cr_loss=0.3899, attn_decoder_loss=0.2563, over 5816934.54 frames. 
], batch size: 86, lr: 7.38e-03, grad_scale: 8.0 2024-09-17 17:41:52,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=269040.0, ans=0.1 2024-09-17 17:41:53,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=269040.0, ans=0.125 2024-09-17 17:42:11,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=269080.0, ans=0.0 2024-09-17 17:42:16,385 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.88 vs. limit=15.0 2024-09-17 17:42:52,609 INFO [train.py:1198] (1/2) Epoch 15, batch 3950, loss[loss=0.2699, ctc_loss=0.1636, cr_loss=0.4106, attn_decoder_loss=0.2726, over 29505.00 frames. ], tot_loss[loss=0.2531, ctc_loss=0.1481, cr_loss=0.3897, attn_decoder_loss=0.2561, over 5836333.29 frames. ], batch size: 97, lr: 7.38e-03, grad_scale: 8.0 2024-09-17 17:42:54,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=269200.0, ans=0.0 2024-09-17 17:43:05,442 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.75 vs. limit=15.0 2024-09-17 17:43:09,763 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.04 vs. limit=10.0 2024-09-17 17:43:32,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=269280.0, ans=0.125 2024-09-17 17:43:34,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=269280.0, ans=0.1 2024-09-17 17:43:44,311 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.673e+01 9.088e+01 9.576e+01 1.054e+02 2.878e+02, threshold=1.915e+02, percent-clipped=1.0 2024-09-17 17:43:45,022 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.81 vs. limit=12.0 2024-09-17 17:43:47,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=269320.0, ans=0.0 2024-09-17 17:43:53,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=269360.0, ans=0.5 2024-09-17 17:44:07,777 INFO [train.py:1198] (1/2) Epoch 15, batch 4000, loss[loss=0.2317, ctc_loss=0.1313, cr_loss=0.3675, attn_decoder_loss=0.2347, over 29521.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1482, cr_loss=0.389, attn_decoder_loss=0.2558, over 5813111.55 frames. ], batch size: 74, lr: 7.37e-03, grad_scale: 16.0 2024-09-17 17:44:21,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=269440.0, ans=0.125 2024-09-17 17:44:23,016 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.52 vs. 
limit=15.0 2024-09-17 17:45:13,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=269560.0, ans=0.125 2024-09-17 17:45:22,408 INFO [train.py:1198] (1/2) Epoch 15, batch 4050, loss[loss=0.3008, ctc_loss=0.2226, cr_loss=0.4265, attn_decoder_loss=0.3, over 19355.00 frames. ], tot_loss[loss=0.253, ctc_loss=0.1485, cr_loss=0.3889, attn_decoder_loss=0.256, over 5796234.00 frames. ], batch size: 209, lr: 7.37e-03, grad_scale: 8.0 2024-09-17 17:45:59,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=269680.0, ans=0.125 2024-09-17 17:46:16,371 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 8.913e+01 9.474e+01 1.030e+02 4.406e+02, threshold=1.895e+02, percent-clipped=2.0 2024-09-17 17:46:25,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=269760.0, ans=0.1 2024-09-17 17:46:35,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=269800.0, ans=0.2 2024-09-17 17:46:37,086 INFO [train.py:1198] (1/2) Epoch 15, batch 4100, loss[loss=0.2712, ctc_loss=0.1641, cr_loss=0.4312, attn_decoder_loss=0.2735, over 29495.00 frames. ], tot_loss[loss=0.2534, ctc_loss=0.1491, cr_loss=0.3896, attn_decoder_loss=0.2563, over 5791378.60 frames. ], batch size: 90, lr: 7.37e-03, grad_scale: 8.0 2024-09-17 17:46:40,683 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.59 vs. limit=22.5 2024-09-17 17:47:02,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=269840.0, ans=0.125 2024-09-17 17:47:08,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=269880.0, ans=0.125 2024-09-17 17:47:14,998 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.84 vs. limit=15.0 2024-09-17 17:47:15,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=269880.0, ans=0.125 2024-09-17 17:47:15,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=269880.0, ans=0.125 2024-09-17 17:47:23,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=269920.0, ans=0.125 2024-09-17 17:47:45,723 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.50 vs. limit=15.0 2024-09-17 17:47:50,741 INFO [train.py:1198] (1/2) Epoch 15, batch 4150, loss[loss=0.2573, ctc_loss=0.1541, cr_loss=0.4002, attn_decoder_loss=0.2599, over 29477.00 frames. ], tot_loss[loss=0.2531, ctc_loss=0.1488, cr_loss=0.3893, attn_decoder_loss=0.2561, over 5796669.16 frames. 
], batch size: 77, lr: 7.36e-03, grad_scale: 8.0 2024-09-17 17:48:05,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=270040.0, ans=0.0 2024-09-17 17:48:21,151 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=22.5 2024-09-17 17:48:35,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=270120.0, ans=0.0 2024-09-17 17:48:42,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=270120.0, ans=0.1 2024-09-17 17:48:45,085 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.781e+01 9.267e+01 9.931e+01 2.534e+02, threshold=1.853e+02, percent-clipped=2.0 2024-09-17 17:48:46,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=270120.0, ans=0.2 2024-09-17 17:48:58,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=270160.0, ans=0.125 2024-09-17 17:49:06,149 INFO [train.py:1198] (1/2) Epoch 15, batch 4200, loss[loss=0.27, ctc_loss=0.1646, cr_loss=0.4262, attn_decoder_loss=0.2722, over 29486.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.1488, cr_loss=0.3894, attn_decoder_loss=0.2562, over 5797682.93 frames. ], batch size: 90, lr: 7.36e-03, grad_scale: 8.0 2024-09-17 17:49:15,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=270200.0, ans=0.2 2024-09-17 17:49:15,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=270200.0, ans=0.0 2024-09-17 17:49:22,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=270240.0, ans=0.125 2024-09-17 17:49:38,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=270280.0, ans=0.2 2024-09-17 17:49:55,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=270320.0, ans=0.025 2024-09-17 17:50:06,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=270360.0, ans=0.025 2024-09-17 17:50:21,302 INFO [train.py:1198] (1/2) Epoch 15, batch 4250, loss[loss=0.2344, ctc_loss=0.1311, cr_loss=0.3534, attn_decoder_loss=0.238, over 29514.00 frames. ], tot_loss[loss=0.2534, ctc_loss=0.1484, cr_loss=0.3889, attn_decoder_loss=0.2564, over 5803571.93 frames. 
], batch size: 74, lr: 7.36e-03, grad_scale: 8.0 2024-09-17 17:50:24,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=270400.0, ans=0.125 2024-09-17 17:50:31,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=270400.0, ans=0.2 2024-09-17 17:50:46,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=270440.0, ans=0.04949747468305833 2024-09-17 17:50:55,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=270480.0, ans=0.125 2024-09-17 17:50:59,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=270480.0, ans=0.125 2024-09-17 17:51:11,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=270520.0, ans=0.125 2024-09-17 17:51:14,266 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.903e+01 9.469e+01 1.020e+02 2.237e+02, threshold=1.894e+02, percent-clipped=2.0 2024-09-17 17:51:19,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=270560.0, ans=0.0 2024-09-17 17:51:35,161 INFO [train.py:1198] (1/2) Epoch 15, batch 4300, loss[loss=0.2573, ctc_loss=0.1443, cr_loss=0.3775, attn_decoder_loss=0.2615, over 29549.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.1486, cr_loss=0.3888, attn_decoder_loss=0.2566, over 5793885.01 frames. ], batch size: 87, lr: 7.36e-03, grad_scale: 8.0 2024-09-17 17:51:55,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=270640.0, ans=0.125 2024-09-17 17:52:03,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=270640.0, ans=0.125 2024-09-17 17:52:27,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=270720.0, ans=0.125 2024-09-17 17:52:32,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=270720.0, ans=0.1 2024-09-17 17:52:40,793 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.42 vs. limit=22.5 2024-09-17 17:52:50,156 INFO [train.py:1198] (1/2) Epoch 15, batch 4350, loss[loss=0.2618, ctc_loss=0.1615, cr_loss=0.3952, attn_decoder_loss=0.2642, over 29471.00 frames. ], tot_loss[loss=0.257, ctc_loss=0.1516, cr_loss=0.3938, attn_decoder_loss=0.2599, over 5796545.41 frames. 
], batch size: 97, lr: 7.35e-03, grad_scale: 8.0 2024-09-17 17:53:01,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=270800.0, ans=0.04949747468305833 2024-09-17 17:53:06,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=270840.0, ans=0.025 2024-09-17 17:53:10,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=270840.0, ans=0.125 2024-09-17 17:53:16,695 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.05 vs. limit=15.0 2024-09-17 17:53:18,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=270880.0, ans=0.0 2024-09-17 17:53:26,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=270880.0, ans=0.125 2024-09-17 17:53:27,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=270880.0, ans=0.125 2024-09-17 17:53:43,655 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.462e+01 9.074e+01 9.575e+01 1.004e+02 1.676e+02, threshold=1.915e+02, percent-clipped=0.0 2024-09-17 17:54:03,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=271000.0, ans=0.025 2024-09-17 17:54:04,374 INFO [train.py:1198] (1/2) Epoch 15, batch 4400, loss[loss=0.2705, ctc_loss=0.1694, cr_loss=0.4379, attn_decoder_loss=0.272, over 27182.00 frames. ], tot_loss[loss=0.2593, ctc_loss=0.1535, cr_loss=0.3967, attn_decoder_loss=0.2623, over 5767137.01 frames. ], batch size: 124, lr: 7.35e-03, grad_scale: 16.0 2024-09-17 17:55:18,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=271200.0, ans=0.125 2024-09-17 17:55:19,873 INFO [train.py:1198] (1/2) Epoch 15, batch 4450, loss[loss=0.2816, ctc_loss=0.1984, cr_loss=0.4188, attn_decoder_loss=0.2816, over 19305.00 frames. ], tot_loss[loss=0.2622, ctc_loss=0.1585, cr_loss=0.4017, attn_decoder_loss=0.2648, over 5570144.65 frames. ], batch size: 209, lr: 7.35e-03, grad_scale: 8.0 2024-09-17 17:55:20,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=271200.0, ans=0.2 2024-09-17 17:55:25,671 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.73 vs. limit=22.5 2024-09-17 17:55:30,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=271200.0, ans=0.125 2024-09-17 17:55:45,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=271240.0, ans=0.125 2024-09-17 17:55:51,955 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.47 vs. 
limit=15.0 2024-09-17 17:55:53,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=271280.0, ans=0.2 2024-09-17 17:56:09,285 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=15.0 2024-09-17 17:56:17,153 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.262e+01 9.506e+01 1.069e+02 1.178e+02 1.981e+02, threshold=2.138e+02, percent-clipped=1.0 2024-09-17 17:56:19,650 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=9.40 vs. limit=10.0 2024-09-17 17:56:30,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=271360.0, ans=0.125 2024-09-17 17:56:35,250 INFO [train.py:1198] (1/2) Epoch 15, batch 4500, loss[loss=0.2832, ctc_loss=0.1905, cr_loss=0.4282, attn_decoder_loss=0.284, over 20377.00 frames. ], tot_loss[loss=0.2654, ctc_loss=0.1643, cr_loss=0.4043, attn_decoder_loss=0.2677, over 5229556.26 frames. ], batch size: 209, lr: 7.35e-03, grad_scale: 8.0 2024-09-17 17:56:38,645 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:56:42,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=271400.0, ans=0.125 2024-09-17 17:56:43,618 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.90 vs. limit=15.0 2024-09-17 17:56:46,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=271400.0, ans=0.2 2024-09-17 17:58:05,196 INFO [train.py:1198] (1/2) Epoch 16, batch 0, loss[loss=0.2358, ctc_loss=0.136, cr_loss=0.3633, attn_decoder_loss=0.2388, over 29605.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.136, cr_loss=0.3633, attn_decoder_loss=0.2388, over 29605.00 frames. ], batch size: 73, lr: 7.11e-03, grad_scale: 16.0 2024-09-17 17:58:05,196 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 17:58:11,303 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.4.self_attn_weights, attn_weights_entropy = tensor([4.6454, 4.1344, 3.7429, 4.4286, 3.5211, 3.6254, 3.7390, 3.7935], device='cuda:1') 2024-09-17 17:58:23,630 INFO [train.py:1230] (1/2) Epoch 16, validation: loss=0.2124, ctc_loss=0.04089, cr_loss=4.638e-15, attn_decoder_loss=0.2315, over 944034.00 frames. 2024-09-17 17:58:23,630 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-17 17:58:29,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=271500.0, ans=0.125 2024-09-17 17:58:29,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=271500.0, ans=0.125 2024-09-17 17:58:34,998 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=7.48 vs. 
limit=12.0 2024-09-17 17:59:08,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=271620.0, ans=0.125 2024-09-17 17:59:13,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=271620.0, ans=0.025 2024-09-17 17:59:16,653 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:59:19,711 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.32 vs. limit=15.0 2024-09-17 17:59:19,745 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.22 vs. limit=22.5 2024-09-17 17:59:40,416 INFO [train.py:1198] (1/2) Epoch 16, batch 50, loss[loss=0.2203, ctc_loss=0.1201, cr_loss=0.3373, attn_decoder_loss=0.2239, over 29449.00 frames. ], tot_loss[loss=0.2552, ctc_loss=0.1521, cr_loss=0.3965, attn_decoder_loss=0.2578, over 1267704.93 frames. ], batch size: 70, lr: 7.11e-03, grad_scale: 8.0 2024-09-17 17:59:54,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=271740.0, ans=0.125 2024-09-17 18:00:01,933 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.262e+01 1.005e+02 1.104e+02 1.206e+02 4.510e+02, threshold=2.208e+02, percent-clipped=2.0 2024-09-17 18:00:19,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=271780.0, ans=15.0 2024-09-17 18:00:50,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=271860.0, ans=0.125 2024-09-17 18:00:56,280 INFO [train.py:1198] (1/2) Epoch 16, batch 100, loss[loss=0.2452, ctc_loss=0.1436, cr_loss=0.3803, attn_decoder_loss=0.2481, over 29542.00 frames. ], tot_loss[loss=0.2574, ctc_loss=0.1527, cr_loss=0.3964, attn_decoder_loss=0.2603, over 2252277.18 frames. ], batch size: 76, lr: 7.10e-03, grad_scale: 8.0 2024-09-17 18:01:13,934 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.81 vs. limit=22.5 2024-09-17 18:01:14,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=271940.0, ans=0.0 2024-09-17 18:01:26,498 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.26 vs. 
limit=15.0 2024-09-17 18:01:33,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=271980.0, ans=0.125 2024-09-17 18:01:57,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=272020.0, ans=0.0 2024-09-17 18:02:02,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=272020.0, ans=0.1 2024-09-17 18:02:18,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=272060.0, ans=0.0 2024-09-17 18:02:21,066 INFO [train.py:1198] (1/2) Epoch 16, batch 150, loss[loss=0.2316, ctc_loss=0.1353, cr_loss=0.3637, attn_decoder_loss=0.2343, over 29432.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1494, cr_loss=0.3926, attn_decoder_loss=0.2569, over 3048296.68 frames. ], batch size: 70, lr: 7.10e-03, grad_scale: 8.0 2024-09-17 18:02:23,220 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2024-09-17 18:02:35,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=272140.0, ans=0.125 2024-09-17 18:02:40,115 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.23 vs. limit=15.0 2024-09-17 18:02:42,307 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 8.615e+01 9.462e+01 1.007e+02 3.571e+02, threshold=1.892e+02, percent-clipped=1.0 2024-09-17 18:02:43,315 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.84 vs. limit=15.0 2024-09-17 18:02:52,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=272180.0, ans=0.125 2024-09-17 18:02:56,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=272180.0, ans=0.125 2024-09-17 18:03:17,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=272220.0, ans=0.125 2024-09-17 18:03:32,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=272260.0, ans=0.025 2024-09-17 18:03:38,603 INFO [train.py:1198] (1/2) Epoch 16, batch 200, loss[loss=0.271, ctc_loss=0.1639, cr_loss=0.4228, attn_decoder_loss=0.2735, over 27236.00 frames. ], tot_loss[loss=0.2527, ctc_loss=0.1485, cr_loss=0.3906, attn_decoder_loss=0.2556, over 3658982.98 frames. ], batch size: 124, lr: 7.10e-03, grad_scale: 8.0 2024-09-17 18:04:54,434 INFO [train.py:1198] (1/2) Epoch 16, batch 250, loss[loss=0.2749, ctc_loss=0.164, cr_loss=0.4137, attn_decoder_loss=0.278, over 29206.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.1485, cr_loss=0.3912, attn_decoder_loss=0.2562, over 4141418.96 frames. 
], batch size: 100, lr: 7.10e-03, grad_scale: 8.0 2024-09-17 18:05:15,431 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.756e+01 8.691e+01 9.311e+01 9.688e+01 2.016e+02, threshold=1.862e+02, percent-clipped=1.0 2024-09-17 18:05:15,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=272540.0, ans=0.125 2024-09-17 18:05:18,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=272540.0, ans=0.2 2024-09-17 18:05:27,487 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.58 vs. limit=15.0 2024-09-17 18:05:31,890 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.03 vs. limit=15.0 2024-09-17 18:05:43,018 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.79 vs. limit=15.0 2024-09-17 18:05:54,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=272620.0, ans=0.125 2024-09-17 18:05:59,163 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.35 vs. limit=15.0 2024-09-17 18:06:12,071 INFO [train.py:1198] (1/2) Epoch 16, batch 300, loss[loss=0.2719, ctc_loss=0.1555, cr_loss=0.4101, attn_decoder_loss=0.2757, over 29539.00 frames. ], tot_loss[loss=0.253, ctc_loss=0.1481, cr_loss=0.3908, attn_decoder_loss=0.256, over 4510126.14 frames. ], batch size: 92, lr: 7.09e-03, grad_scale: 8.0 2024-09-17 18:06:12,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=272700.0, ans=0.025 2024-09-17 18:06:35,636 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.61 vs. limit=6.0 2024-09-17 18:06:52,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=272780.0, ans=0.2 2024-09-17 18:07:12,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=272820.0, ans=0.125 2024-09-17 18:07:30,517 INFO [train.py:1198] (1/2) Epoch 16, batch 350, loss[loss=0.2283, ctc_loss=0.1268, cr_loss=0.3353, attn_decoder_loss=0.2321, over 29310.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.1482, cr_loss=0.3908, attn_decoder_loss=0.2563, over 4796195.94 frames. 
], batch size: 71, lr: 7.09e-03, grad_scale: 8.0 2024-09-17 18:07:33,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=272900.0, ans=0.2 2024-09-17 18:07:51,716 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.443e+01 8.832e+01 9.583e+01 1.052e+02 2.461e+02, threshold=1.917e+02, percent-clipped=3.0 2024-09-17 18:07:59,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=272980.0, ans=0.125 2024-09-17 18:08:31,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=273060.0, ans=0.2 2024-09-17 18:08:31,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=273060.0, ans=0.0 2024-09-17 18:08:32,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=273060.0, ans=0.125 2024-09-17 18:08:45,894 INFO [train.py:1198] (1/2) Epoch 16, batch 400, loss[loss=0.2667, ctc_loss=0.1578, cr_loss=0.407, attn_decoder_loss=0.2697, over 29721.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1478, cr_loss=0.39, attn_decoder_loss=0.2559, over 5025881.51 frames. ], batch size: 82, lr: 7.09e-03, grad_scale: 16.0 2024-09-17 18:08:47,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=273100.0, ans=0.125 2024-09-17 18:08:53,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=273100.0, ans=0.2 2024-09-17 18:08:56,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=273100.0, ans=0.0 2024-09-17 18:09:00,249 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.32 vs. limit=22.5 2024-09-17 18:09:21,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=273180.0, ans=0.125 2024-09-17 18:09:25,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=273180.0, ans=0.125 2024-09-17 18:09:35,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=273220.0, ans=0.125 2024-09-17 18:09:38,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=273220.0, ans=0.125 2024-09-17 18:09:41,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=273220.0, ans=0.125 2024-09-17 18:09:57,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=273260.0, ans=0.0 2024-09-17 18:10:00,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=273260.0, ans=0.125 2024-09-17 18:10:04,243 INFO [train.py:1198] (1/2) Epoch 16, batch 450, loss[loss=0.261, ctc_loss=0.1526, cr_loss=0.4124, attn_decoder_loss=0.2639, over 29707.00 frames. ], tot_loss[loss=0.253, ctc_loss=0.148, cr_loss=0.3905, attn_decoder_loss=0.256, over 5187344.06 frames. 
], batch size: 83, lr: 7.09e-03, grad_scale: 8.0 2024-09-17 18:10:16,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=273300.0, ans=0.125 2024-09-17 18:10:25,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=273340.0, ans=0.0 2024-09-17 18:10:26,872 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.756e+01 9.412e+01 1.001e+02 2.554e+02, threshold=1.882e+02, percent-clipped=1.0 2024-09-17 18:11:00,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=273420.0, ans=0.2 2024-09-17 18:11:16,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=273460.0, ans=0.025 2024-09-17 18:11:22,627 INFO [train.py:1198] (1/2) Epoch 16, batch 500, loss[loss=0.2733, ctc_loss=0.165, cr_loss=0.4468, attn_decoder_loss=0.2754, over 29362.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.1472, cr_loss=0.3894, attn_decoder_loss=0.255, over 5329993.98 frames. ], batch size: 94, lr: 7.08e-03, grad_scale: 8.0 2024-09-17 18:11:23,784 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.46 vs. limit=15.0 2024-09-17 18:11:24,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=273500.0, ans=0.125 2024-09-17 18:11:26,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=273500.0, ans=0.1 2024-09-17 18:11:35,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=273500.0, ans=0.125 2024-09-17 18:11:38,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=273540.0, ans=0.125 2024-09-17 18:11:41,765 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-09-17 18:12:30,841 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.14 vs. limit=15.0 2024-09-17 18:12:39,140 INFO [train.py:1198] (1/2) Epoch 16, batch 550, loss[loss=0.2708, ctc_loss=0.1583, cr_loss=0.405, attn_decoder_loss=0.2743, over 28834.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.1471, cr_loss=0.3895, attn_decoder_loss=0.255, over 5421959.40 frames. ], batch size: 104, lr: 7.08e-03, grad_scale: 8.0 2024-09-17 18:12:44,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=273700.0, ans=0.2 2024-09-17 18:12:46,256 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.24 vs. 
limit=6.0 2024-09-17 18:13:01,896 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.696e+01 8.801e+01 9.400e+01 1.011e+02 1.613e+02, threshold=1.880e+02, percent-clipped=0.0 2024-09-17 18:13:27,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=273820.0, ans=0.125 2024-09-17 18:13:28,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=273820.0, ans=0.125 2024-09-17 18:13:32,379 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.81 vs. limit=15.0 2024-09-17 18:13:33,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=273820.0, ans=0.0 2024-09-17 18:13:57,168 INFO [train.py:1198] (1/2) Epoch 16, batch 600, loss[loss=0.2582, ctc_loss=0.1462, cr_loss=0.3898, attn_decoder_loss=0.262, over 29286.00 frames. ], tot_loss[loss=0.2518, ctc_loss=0.1468, cr_loss=0.3888, attn_decoder_loss=0.2548, over 5507978.34 frames. ], batch size: 100, lr: 7.08e-03, grad_scale: 8.0 2024-09-17 18:14:32,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=273980.0, ans=0.125 2024-09-17 18:14:54,718 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.50 vs. limit=12.0 2024-09-17 18:15:02,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.17 vs. limit=15.0 2024-09-17 18:15:04,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=274060.0, ans=0.1 2024-09-17 18:15:15,059 INFO [train.py:1198] (1/2) Epoch 16, batch 650, loss[loss=0.2572, ctc_loss=0.1436, cr_loss=0.3726, attn_decoder_loss=0.2616, over 29738.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1463, cr_loss=0.3876, attn_decoder_loss=0.2543, over 5585326.45 frames. ], batch size: 81, lr: 7.08e-03, grad_scale: 8.0 2024-09-17 18:15:33,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=274140.0, ans=0.0 2024-09-17 18:15:37,737 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.963e+01 8.981e+01 9.378e+01 1.004e+02 1.703e+02, threshold=1.876e+02, percent-clipped=0.0 2024-09-17 18:16:00,361 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.75 vs. limit=15.0 2024-09-17 18:16:06,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=274220.0, ans=0.0 2024-09-17 18:16:13,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=274220.0, ans=0.0 2024-09-17 18:16:22,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=274260.0, ans=0.1 2024-09-17 18:16:25,599 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.00 vs. 
limit=22.5 2024-09-17 18:16:30,940 INFO [train.py:1198] (1/2) Epoch 16, batch 700, loss[loss=0.234, ctc_loss=0.1327, cr_loss=0.3697, attn_decoder_loss=0.237, over 29516.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.147, cr_loss=0.3885, attn_decoder_loss=0.2551, over 5635656.08 frames. ], batch size: 76, lr: 7.07e-03, grad_scale: 8.0 2024-09-17 18:16:38,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=274300.0, ans=0.0 2024-09-17 18:16:46,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=274340.0, ans=0.1 2024-09-17 18:16:46,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=274340.0, ans=0.125 2024-09-17 18:16:46,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=274340.0, ans=0.0 2024-09-17 18:16:47,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=274340.0, ans=0.125 2024-09-17 18:16:51,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=274340.0, ans=0.025 2024-09-17 18:16:56,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=274340.0, ans=0.2 2024-09-17 18:16:59,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=274380.0, ans=0.0 2024-09-17 18:17:03,437 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.24 vs. limit=10.0 2024-09-17 18:17:18,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=274420.0, ans=0.125 2024-09-17 18:17:23,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=274420.0, ans=0.0 2024-09-17 18:17:35,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=274460.0, ans=0.125 2024-09-17 18:17:49,225 INFO [train.py:1198] (1/2) Epoch 16, batch 750, loss[loss=0.2643, ctc_loss=0.1552, cr_loss=0.4181, attn_decoder_loss=0.2671, over 29700.00 frames. ], tot_loss[loss=0.2518, ctc_loss=0.1471, cr_loss=0.3882, attn_decoder_loss=0.2549, over 5674925.14 frames. 
], batch size: 82, lr: 7.07e-03, grad_scale: 8.0 2024-09-17 18:17:50,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=274500.0, ans=0.2 2024-09-17 18:18:11,564 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.869e+01 8.564e+01 9.225e+01 9.974e+01 3.199e+02, threshold=1.845e+02, percent-clipped=1.0 2024-09-17 18:18:17,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=274580.0, ans=0.1 2024-09-17 18:18:30,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=274580.0, ans=0.1 2024-09-17 18:18:40,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=274620.0, ans=0.05 2024-09-17 18:18:50,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=274660.0, ans=0.0 2024-09-17 18:19:06,929 INFO [train.py:1198] (1/2) Epoch 16, batch 800, loss[loss=0.2239, ctc_loss=0.1218, cr_loss=0.357, attn_decoder_loss=0.2273, over 29596.00 frames. ], tot_loss[loss=0.2519, ctc_loss=0.147, cr_loss=0.3884, attn_decoder_loss=0.2549, over 5705979.64 frames. ], batch size: 73, lr: 7.07e-03, grad_scale: 16.0 2024-09-17 18:19:07,756 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.31 vs. limit=15.0 2024-09-17 18:19:08,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=274700.0, ans=0.125 2024-09-17 18:19:13,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=274700.0, ans=0.0 2024-09-17 18:19:14,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=274700.0, ans=0.0 2024-09-17 18:19:16,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=274700.0, ans=0.125 2024-09-17 18:19:19,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=274700.0, ans=0.025 2024-09-17 18:19:25,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=274740.0, ans=0.125 2024-09-17 18:19:34,734 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.24 vs. limit=15.0 2024-09-17 18:20:03,291 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.88 vs. limit=15.0 2024-09-17 18:20:04,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=274820.0, ans=0.0 2024-09-17 18:20:16,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=274860.0, ans=0.125 2024-09-17 18:20:22,023 INFO [train.py:1198] (1/2) Epoch 16, batch 850, loss[loss=0.2617, ctc_loss=0.1451, cr_loss=0.3881, attn_decoder_loss=0.2661, over 29702.00 frames. ], tot_loss[loss=0.2516, ctc_loss=0.1466, cr_loss=0.3878, attn_decoder_loss=0.2547, over 5735757.01 frames. 
], batch size: 89, lr: 7.07e-03, grad_scale: 8.0 2024-09-17 18:20:45,874 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 8.953e+01 9.515e+01 1.010e+02 2.580e+02, threshold=1.903e+02, percent-clipped=2.0 2024-09-17 18:20:47,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=274940.0, ans=0.1 2024-09-17 18:20:50,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=274980.0, ans=0.0 2024-09-17 18:20:53,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=274980.0, ans=0.2 2024-09-17 18:20:58,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=274980.0, ans=0.1 2024-09-17 18:21:09,769 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=7.90 vs. limit=15.0 2024-09-17 18:21:28,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=275060.0, ans=0.125 2024-09-17 18:21:36,102 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.26 vs. limit=22.5 2024-09-17 18:21:38,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=275100.0, ans=0.0 2024-09-17 18:21:39,955 INFO [train.py:1198] (1/2) Epoch 16, batch 900, loss[loss=0.2281, ctc_loss=0.1263, cr_loss=0.3492, attn_decoder_loss=0.2316, over 29620.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.1469, cr_loss=0.388, attn_decoder_loss=0.2551, over 5741851.29 frames. ], batch size: 73, lr: 7.06e-03, grad_scale: 8.0 2024-09-17 18:21:43,656 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.61 vs. limit=15.0 2024-09-17 18:21:47,777 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:21:53,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=275140.0, ans=0.125 2024-09-17 18:21:55,719 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.86 vs. limit=10.0 2024-09-17 18:22:01,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=275140.0, ans=0.025 2024-09-17 18:22:18,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=275180.0, ans=0.125 2024-09-17 18:22:41,082 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.62 vs. limit=15.0 2024-09-17 18:22:57,939 INFO [train.py:1198] (1/2) Epoch 16, batch 950, loss[loss=0.2246, ctc_loss=0.1183, cr_loss=0.3251, attn_decoder_loss=0.2292, over 29496.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1472, cr_loss=0.3884, attn_decoder_loss=0.2554, over 5743120.89 frames. 
], batch size: 74, lr: 7.06e-03, grad_scale: 8.0 2024-09-17 18:23:13,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=275340.0, ans=0.0 2024-09-17 18:23:19,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=275340.0, ans=0.0 2024-09-17 18:23:21,952 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.870e+01 9.132e+01 9.762e+01 1.082e+02 2.725e+02, threshold=1.952e+02, percent-clipped=3.0 2024-09-17 18:23:28,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=275380.0, ans=0.125 2024-09-17 18:23:34,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=275380.0, ans=0.0 2024-09-17 18:24:01,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=275460.0, ans=0.125 2024-09-17 18:24:04,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=275460.0, ans=0.0 2024-09-17 18:24:13,030 INFO [train.py:1198] (1/2) Epoch 16, batch 1000, loss[loss=0.2503, ctc_loss=0.1399, cr_loss=0.3793, attn_decoder_loss=0.2541, over 29490.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.1483, cr_loss=0.39, attn_decoder_loss=0.2563, over 5736693.36 frames. ], batch size: 77, lr: 7.06e-03, grad_scale: 8.0 2024-09-17 18:24:36,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=275540.0, ans=0.125 2024-09-17 18:25:01,105 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.27 vs. limit=15.0 2024-09-17 18:25:07,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=275620.0, ans=0.2 2024-09-17 18:25:18,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=275660.0, ans=0.0 2024-09-17 18:25:30,805 INFO [train.py:1198] (1/2) Epoch 16, batch 1050, loss[loss=0.2636, ctc_loss=0.1517, cr_loss=0.4101, attn_decoder_loss=0.2669, over 29678.00 frames. ], tot_loss[loss=0.2527, ctc_loss=0.1477, cr_loss=0.3893, attn_decoder_loss=0.2557, over 5743916.50 frames. ], batch size: 85, lr: 7.06e-03, grad_scale: 8.0 2024-09-17 18:25:55,380 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.680e+01 9.093e+01 1.030e+02 1.882e+02, threshold=1.819e+02, percent-clipped=0.0 2024-09-17 18:26:01,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=275780.0, ans=0.125 2024-09-17 18:26:30,857 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.30 vs. limit=22.5 2024-09-17 18:26:49,382 INFO [train.py:1198] (1/2) Epoch 16, batch 1100, loss[loss=0.2403, ctc_loss=0.1449, cr_loss=0.3855, attn_decoder_loss=0.2423, over 29471.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1473, cr_loss=0.389, attn_decoder_loss=0.2554, over 5755359.28 frames. 
], batch size: 78, lr: 7.05e-03, grad_scale: 8.0 2024-09-17 18:27:13,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=275940.0, ans=0.2 2024-09-17 18:27:35,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=276020.0, ans=0.1 2024-09-17 18:27:38,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=276020.0, ans=0.0 2024-09-17 18:27:41,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=276020.0, ans=0.05 2024-09-17 18:28:05,488 INFO [train.py:1198] (1/2) Epoch 16, batch 1150, loss[loss=0.2458, ctc_loss=0.1395, cr_loss=0.3777, attn_decoder_loss=0.2492, over 29472.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1476, cr_loss=0.3892, attn_decoder_loss=0.2554, over 5754070.15 frames. ], batch size: 78, lr: 7.05e-03, grad_scale: 8.0 2024-09-17 18:28:26,219 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.97 vs. limit=15.0 2024-09-17 18:28:29,844 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.744e+01 9.236e+01 1.006e+02 2.528e+02, threshold=1.847e+02, percent-clipped=1.0 2024-09-17 18:28:33,434 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:28:44,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=276180.0, ans=0.125 2024-09-17 18:28:54,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=276220.0, ans=0.125 2024-09-17 18:28:57,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=276220.0, ans=0.015 2024-09-17 18:29:14,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=276260.0, ans=0.125 2024-09-17 18:29:20,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=276260.0, ans=0.2 2024-09-17 18:29:23,622 INFO [train.py:1198] (1/2) Epoch 16, batch 1200, loss[loss=0.2682, ctc_loss=0.1665, cr_loss=0.3976, attn_decoder_loss=0.2706, over 29681.00 frames. ], tot_loss[loss=0.2531, ctc_loss=0.1483, cr_loss=0.39, attn_decoder_loss=0.2561, over 5745707.65 frames. ], batch size: 85, lr: 7.05e-03, grad_scale: 16.0 2024-09-17 18:29:24,574 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.59 vs. limit=12.0 2024-09-17 18:29:49,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=276340.0, ans=0.0 2024-09-17 18:30:00,201 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.06 vs. 
limit=15.0 2024-09-17 18:30:19,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=276420.0, ans=0.125 2024-09-17 18:30:19,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=276420.0, ans=0.1 2024-09-17 18:30:23,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=276420.0, ans=0.125 2024-09-17 18:30:41,712 INFO [train.py:1198] (1/2) Epoch 16, batch 1250, loss[loss=0.264, ctc_loss=0.155, cr_loss=0.4191, attn_decoder_loss=0.2668, over 29523.00 frames. ], tot_loss[loss=0.2534, ctc_loss=0.1481, cr_loss=0.39, attn_decoder_loss=0.2565, over 5773844.06 frames. ], batch size: 92, lr: 7.05e-03, grad_scale: 8.0 2024-09-17 18:30:52,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=276500.0, ans=0.125 2024-09-17 18:31:07,629 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.409e+01 8.801e+01 9.250e+01 9.945e+01 2.307e+02, threshold=1.850e+02, percent-clipped=1.0 2024-09-17 18:31:12,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=276580.0, ans=0.125 2024-09-17 18:31:24,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=276580.0, ans=0.125 2024-09-17 18:31:27,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=276620.0, ans=0.2 2024-09-17 18:31:52,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=276660.0, ans=0.0 2024-09-17 18:31:57,699 INFO [train.py:1198] (1/2) Epoch 16, batch 1300, loss[loss=0.263, ctc_loss=0.1512, cr_loss=0.3926, attn_decoder_loss=0.2667, over 28178.00 frames. ], tot_loss[loss=0.253, ctc_loss=0.1479, cr_loss=0.3897, attn_decoder_loss=0.2561, over 5779237.34 frames. ], batch size: 111, lr: 7.04e-03, grad_scale: 8.0 2024-09-17 18:31:58,044 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:32:01,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=276700.0, ans=0.0 2024-09-17 18:32:31,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=276780.0, ans=0.0 2024-09-17 18:32:33,512 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.67 vs. limit=22.5 2024-09-17 18:32:50,129 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:32:50,702 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.84 vs. limit=15.0 2024-09-17 18:33:05,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=276860.0, ans=0.125 2024-09-17 18:33:14,155 INFO [train.py:1198] (1/2) Epoch 16, batch 1350, loss[loss=0.2529, ctc_loss=0.1514, cr_loss=0.3895, attn_decoder_loss=0.2555, over 29767.00 frames. 
], tot_loss[loss=0.2523, ctc_loss=0.1469, cr_loss=0.3887, attn_decoder_loss=0.2554, over 5796497.26 frames. ], batch size: 81, lr: 7.04e-03, grad_scale: 8.0 2024-09-17 18:33:15,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=276900.0, ans=0.0 2024-09-17 18:33:41,945 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.411e+01 8.515e+01 9.086e+01 9.689e+01 1.239e+02, threshold=1.817e+02, percent-clipped=0.0 2024-09-17 18:33:45,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=276980.0, ans=0.125 2024-09-17 18:33:48,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=276980.0, ans=0.125 2024-09-17 18:33:53,350 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.07 vs. limit=15.0 2024-09-17 18:34:08,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=277020.0, ans=0.125 2024-09-17 18:34:22,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=277060.0, ans=0.1 2024-09-17 18:34:23,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=277060.0, ans=0.125 2024-09-17 18:34:24,543 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.97 vs. limit=15.0 2024-09-17 18:34:34,352 INFO [train.py:1198] (1/2) Epoch 16, batch 1400, loss[loss=0.2185, ctc_loss=0.1193, cr_loss=0.3512, attn_decoder_loss=0.2217, over 29622.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.1466, cr_loss=0.3886, attn_decoder_loss=0.2552, over 5807424.80 frames. ], batch size: 69, lr: 7.04e-03, grad_scale: 8.0 2024-09-17 18:34:34,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=277100.0, ans=0.1 2024-09-17 18:34:44,125 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-09-17 18:35:18,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=277220.0, ans=0.0 2024-09-17 18:35:31,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=277220.0, ans=0.125 2024-09-17 18:35:36,921 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.74 vs. limit=12.0 2024-09-17 18:35:49,911 INFO [train.py:1198] (1/2) Epoch 16, batch 1450, loss[loss=0.2671, ctc_loss=0.16, cr_loss=0.4134, attn_decoder_loss=0.2698, over 29439.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1471, cr_loss=0.3894, attn_decoder_loss=0.2559, over 5804962.48 frames. 
], batch size: 94, lr: 7.04e-03, grad_scale: 8.0 2024-09-17 18:35:51,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=277300.0, ans=0.125 2024-09-17 18:35:54,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=277300.0, ans=0.1 2024-09-17 18:35:58,440 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.18 vs. limit=12.0 2024-09-17 18:36:01,550 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.88 vs. limit=10.0 2024-09-17 18:36:13,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=277340.0, ans=0.1 2024-09-17 18:36:15,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.64 vs. limit=15.0 2024-09-17 18:36:15,714 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.712e+01 8.888e+01 9.569e+01 1.025e+02 2.533e+02, threshold=1.914e+02, percent-clipped=1.0 2024-09-17 18:36:21,459 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.79 vs. limit=15.0 2024-09-17 18:36:28,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=277380.0, ans=0.125 2024-09-17 18:36:29,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=277380.0, ans=22.5 2024-09-17 18:36:30,558 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.98 vs. limit=22.5 2024-09-17 18:36:34,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=277420.0, ans=0.0 2024-09-17 18:37:05,607 INFO [train.py:1198] (1/2) Epoch 16, batch 1500, loss[loss=0.2679, ctc_loss=0.1665, cr_loss=0.405, attn_decoder_loss=0.2702, over 29620.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1471, cr_loss=0.39, attn_decoder_loss=0.256, over 5806325.86 frames. 
], batch size: 86, lr: 7.03e-03, grad_scale: 8.0 2024-09-17 18:37:18,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=277500.0, ans=0.2 2024-09-17 18:37:23,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=277540.0, ans=0.0 2024-09-17 18:37:50,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=277580.0, ans=0.04949747468305833 2024-09-17 18:37:56,658 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:38:13,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=277660.0, ans=0.0 2024-09-17 18:38:13,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=277660.0, ans=0.1 2024-09-17 18:38:18,623 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.08 vs. limit=22.5 2024-09-17 18:38:26,892 INFO [train.py:1198] (1/2) Epoch 16, batch 1550, loss[loss=0.2769, ctc_loss=0.1685, cr_loss=0.4355, attn_decoder_loss=0.2793, over 29518.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.148, cr_loss=0.3906, attn_decoder_loss=0.2564, over 5781554.05 frames. ], batch size: 90, lr: 7.03e-03, grad_scale: 8.0 2024-09-17 18:38:31,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=277700.0, ans=0.125 2024-09-17 18:38:34,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=277700.0, ans=0.125 2024-09-17 18:38:48,624 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.89 vs. limit=22.5 2024-09-17 18:38:49,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=277740.0, ans=0.1 2024-09-17 18:38:52,448 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.047e+01 8.973e+01 9.587e+01 1.017e+02 1.956e+02, threshold=1.917e+02, percent-clipped=1.0 2024-09-17 18:39:03,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=277780.0, ans=0.125 2024-09-17 18:39:15,405 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:39:21,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=277820.0, ans=0.0 2024-09-17 18:39:23,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=277820.0, ans=0.1 2024-09-17 18:39:29,005 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:39:42,132 INFO [train.py:1198] (1/2) Epoch 16, batch 1600, loss[loss=0.2601, ctc_loss=0.1492, cr_loss=0.3731, attn_decoder_loss=0.2641, over 29675.00 frames. ], tot_loss[loss=0.253, ctc_loss=0.1479, cr_loss=0.3894, attn_decoder_loss=0.256, over 5765326.46 frames. 
], batch size: 85, lr: 7.03e-03, grad_scale: 16.0 2024-09-17 18:39:49,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=277900.0, ans=0.125 2024-09-17 18:40:05,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=277940.0, ans=0.2 2024-09-17 18:40:11,248 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:40:16,502 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.26 vs. limit=15.0 2024-09-17 18:40:26,952 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.61 vs. limit=6.0 2024-09-17 18:40:30,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=278020.0, ans=0.0 2024-09-17 18:40:36,880 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:40:40,630 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.90 vs. limit=15.0 2024-09-17 18:40:57,568 INFO [train.py:1198] (1/2) Epoch 16, batch 1650, loss[loss=0.2627, ctc_loss=0.1479, cr_loss=0.3908, attn_decoder_loss=0.2667, over 29737.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1478, cr_loss=0.3889, attn_decoder_loss=0.2559, over 5759843.85 frames. ], batch size: 89, lr: 7.02e-03, grad_scale: 8.0 2024-09-17 18:41:06,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=278100.0, ans=0.125 2024-09-17 18:41:14,206 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0 2024-09-17 18:41:19,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=278140.0, ans=0.0 2024-09-17 18:41:26,853 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.422e+01 8.637e+01 9.438e+01 1.013e+02 1.642e+02, threshold=1.888e+02, percent-clipped=0.0 2024-09-17 18:41:46,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=278220.0, ans=0.125 2024-09-17 18:41:50,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=278220.0, ans=0.2 2024-09-17 18:42:01,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=278260.0, ans=0.1 2024-09-17 18:42:13,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=278260.0, ans=0.0 2024-09-17 18:42:14,029 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.26 vs. limit=15.0 2024-09-17 18:42:17,546 INFO [train.py:1198] (1/2) Epoch 16, batch 1700, loss[loss=0.2176, ctc_loss=0.1197, cr_loss=0.3311, attn_decoder_loss=0.2212, over 29556.00 frames. ], tot_loss[loss=0.2526, ctc_loss=0.1473, cr_loss=0.3882, attn_decoder_loss=0.2556, over 5781333.77 frames. 
], batch size: 69, lr: 7.02e-03, grad_scale: 8.0 2024-09-17 18:42:26,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=278300.0, ans=0.125 2024-09-17 18:43:10,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=278420.0, ans=0.125 2024-09-17 18:43:24,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=278460.0, ans=0.025 2024-09-17 18:43:29,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=278460.0, ans=0.0 2024-09-17 18:43:32,996 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.25 vs. limit=15.0 2024-09-17 18:43:33,568 INFO [train.py:1198] (1/2) Epoch 16, batch 1750, loss[loss=0.2258, ctc_loss=0.1288, cr_loss=0.3537, attn_decoder_loss=0.2287, over 29292.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.1466, cr_loss=0.3874, attn_decoder_loss=0.2551, over 5787408.81 frames. ], batch size: 67, lr: 7.02e-03, grad_scale: 8.0 2024-09-17 18:44:00,899 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.179e+01 8.367e+01 8.955e+01 9.624e+01 1.381e+02, threshold=1.791e+02, percent-clipped=0.0 2024-09-17 18:44:29,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=278620.0, ans=0.0 2024-09-17 18:44:49,085 INFO [train.py:1198] (1/2) Epoch 16, batch 1800, loss[loss=0.265, ctc_loss=0.1535, cr_loss=0.4126, attn_decoder_loss=0.2682, over 29674.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1472, cr_loss=0.3882, attn_decoder_loss=0.2555, over 5789023.18 frames. ], batch size: 83, lr: 7.02e-03, grad_scale: 8.0 2024-09-17 18:44:53,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=278700.0, ans=0.125 2024-09-17 18:45:30,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=278780.0, ans=0.125 2024-09-17 18:45:44,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=278820.0, ans=0.125 2024-09-17 18:46:07,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=278860.0, ans=0.0 2024-09-17 18:46:09,877 INFO [train.py:1198] (1/2) Epoch 16, batch 1850, loss[loss=0.2629, ctc_loss=0.1487, cr_loss=0.4038, attn_decoder_loss=0.2666, over 29647.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.1471, cr_loss=0.3883, attn_decoder_loss=0.2553, over 5795882.61 frames. ], batch size: 86, lr: 7.02e-03, grad_scale: 8.0 2024-09-17 18:46:12,657 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.83 vs. limit=5.0 2024-09-17 18:46:20,858 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. 
limit=15.0 2024-09-17 18:46:36,825 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.695e+01 8.719e+01 9.438e+01 1.018e+02 2.897e+02, threshold=1.888e+02, percent-clipped=1.0 2024-09-17 18:46:40,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=278980.0, ans=0.125 2024-09-17 18:46:41,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=278980.0, ans=0.125 2024-09-17 18:47:08,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=279060.0, ans=0.125 2024-09-17 18:47:09,310 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.16 vs. limit=22.5 2024-09-17 18:47:11,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=279060.0, ans=0.0 2024-09-17 18:47:17,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=279060.0, ans=0.125 2024-09-17 18:47:24,895 INFO [train.py:1198] (1/2) Epoch 16, batch 1900, loss[loss=0.2596, ctc_loss=0.1512, cr_loss=0.396, attn_decoder_loss=0.2628, over 29712.00 frames. ], tot_loss[loss=0.2527, ctc_loss=0.1473, cr_loss=0.3887, attn_decoder_loss=0.2557, over 5803711.86 frames. ], batch size: 89, lr: 7.01e-03, grad_scale: 8.0 2024-09-17 18:47:38,964 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:47:44,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=279140.0, ans=0.025 2024-09-17 18:47:51,695 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.00 vs. limit=22.5 2024-09-17 18:47:52,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=279140.0, ans=0.125 2024-09-17 18:47:54,435 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:48:01,168 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.35 vs. limit=15.0 2024-09-17 18:48:08,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=279180.0, ans=0.125 2024-09-17 18:48:10,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=279220.0, ans=0.125 2024-09-17 18:48:27,243 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.17 vs. limit=15.0 2024-09-17 18:48:41,359 INFO [train.py:1198] (1/2) Epoch 16, batch 1950, loss[loss=0.2397, ctc_loss=0.1322, cr_loss=0.3604, attn_decoder_loss=0.2436, over 29418.00 frames. ], tot_loss[loss=0.2536, ctc_loss=0.1475, cr_loss=0.3895, attn_decoder_loss=0.2567, over 5818614.95 frames. 
], batch size: 78, lr: 7.01e-03, grad_scale: 8.0 2024-09-17 18:48:55,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=279340.0, ans=0.2 2024-09-17 18:49:08,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=279340.0, ans=0.125 2024-09-17 18:49:11,013 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.737e+01 8.858e+01 9.399e+01 1.005e+02 1.788e+02, threshold=1.880e+02, percent-clipped=1.0 2024-09-17 18:49:27,878 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0 2024-09-17 18:49:30,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=279420.0, ans=0.125 2024-09-17 18:49:32,376 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.32 vs. limit=15.0 2024-09-17 18:49:37,821 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:49:51,946 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.70 vs. limit=22.5 2024-09-17 18:49:54,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=279460.0, ans=0.05 2024-09-17 18:50:00,612 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:50:01,812 INFO [train.py:1198] (1/2) Epoch 16, batch 2000, loss[loss=0.2349, ctc_loss=0.1407, cr_loss=0.3738, attn_decoder_loss=0.237, over 29336.00 frames. ], tot_loss[loss=0.2543, ctc_loss=0.1485, cr_loss=0.3906, attn_decoder_loss=0.2573, over 5795109.36 frames. ], batch size: 67, lr: 7.01e-03, grad_scale: 16.0 2024-09-17 18:50:20,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=279540.0, ans=0.125 2024-09-17 18:50:36,400 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.40 vs. limit=15.0 2024-09-17 18:50:37,894 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.75 vs. limit=15.0 2024-09-17 18:50:52,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=279620.0, ans=0.125 2024-09-17 18:51:17,833 INFO [train.py:1198] (1/2) Epoch 16, batch 2050, loss[loss=0.2193, ctc_loss=0.1221, cr_loss=0.3362, attn_decoder_loss=0.2227, over 29413.00 frames. ], tot_loss[loss=0.2532, ctc_loss=0.1477, cr_loss=0.3892, attn_decoder_loss=0.2562, over 5787449.11 frames. 
], batch size: 70, lr: 7.01e-03, grad_scale: 8.0 2024-09-17 18:51:19,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=279700.0, ans=0.1 2024-09-17 18:51:21,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=279700.0, ans=0.125 2024-09-17 18:51:27,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=279700.0, ans=0.125 2024-09-17 18:51:46,760 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.240e+01 9.024e+01 1.001e+02 1.116e+02 1.891e+02, threshold=2.001e+02, percent-clipped=1.0 2024-09-17 18:51:50,299 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:52:03,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=279820.0, ans=0.125 2024-09-17 18:52:20,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.90 vs. limit=22.5 2024-09-17 18:52:20,882 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.22 vs. limit=15.0 2024-09-17 18:52:33,743 INFO [train.py:1198] (1/2) Epoch 16, batch 2100, loss[loss=0.2451, ctc_loss=0.132, cr_loss=0.3543, attn_decoder_loss=0.2498, over 29723.00 frames. ], tot_loss[loss=0.253, ctc_loss=0.1476, cr_loss=0.3893, attn_decoder_loss=0.256, over 5799744.69 frames. ], batch size: 81, lr: 7.00e-03, grad_scale: 8.0 2024-09-17 18:52:37,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=279900.0, ans=0.125 2024-09-17 18:52:42,317 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.44 vs. limit=15.0 2024-09-17 18:52:46,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=279900.0, ans=0.125 2024-09-17 18:53:47,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=280060.0, ans=0.0 2024-09-17 18:53:48,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=280060.0, ans=0.125 2024-09-17 18:53:53,808 INFO [train.py:1198] (1/2) Epoch 16, batch 2150, loss[loss=0.2547, ctc_loss=0.1531, cr_loss=0.4027, attn_decoder_loss=0.257, over 29456.00 frames. ], tot_loss[loss=0.2523, ctc_loss=0.1468, cr_loss=0.3883, attn_decoder_loss=0.2553, over 5814794.54 frames. ], batch size: 78, lr: 7.00e-03, grad_scale: 8.0 2024-09-17 18:53:55,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=280100.0, ans=0.125 2024-09-17 18:54:08,760 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. 
limit=6.0 2024-09-17 18:54:12,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=280140.0, ans=0.125 2024-09-17 18:54:22,950 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.825e+01 8.753e+01 9.321e+01 9.810e+01 1.786e+02, threshold=1.864e+02, percent-clipped=0.0 2024-09-17 18:54:23,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=280180.0, ans=0.0 2024-09-17 18:54:23,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=280180.0, ans=0.125 2024-09-17 18:54:30,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=15.0 2024-09-17 18:54:44,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=280220.0, ans=0.125 2024-09-17 18:54:52,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=280220.0, ans=0.1 2024-09-17 18:54:53,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=280260.0, ans=0.035 2024-09-17 18:55:00,436 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.54 vs. limit=22.5 2024-09-17 18:55:10,169 INFO [train.py:1198] (1/2) Epoch 16, batch 2200, loss[loss=0.2692, ctc_loss=0.1551, cr_loss=0.412, attn_decoder_loss=0.2728, over 29659.00 frames. ], tot_loss[loss=0.2523, ctc_loss=0.1469, cr_loss=0.389, attn_decoder_loss=0.2553, over 5811188.11 frames. ], batch size: 86, lr: 7.00e-03, grad_scale: 8.0 2024-09-17 18:55:47,434 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.07 vs. limit=6.0 2024-09-17 18:55:49,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=280380.0, ans=0.125 2024-09-17 18:56:11,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=280460.0, ans=0.125 2024-09-17 18:56:16,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=280460.0, ans=0.0 2024-09-17 18:56:24,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=280500.0, ans=0.125 2024-09-17 18:56:25,748 INFO [train.py:1198] (1/2) Epoch 16, batch 2250, loss[loss=0.2626, ctc_loss=0.1524, cr_loss=0.3786, attn_decoder_loss=0.2665, over 29693.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1471, cr_loss=0.3892, attn_decoder_loss=0.2555, over 5812007.59 frames. 
], batch size: 82, lr: 7.00e-03, grad_scale: 8.0 2024-09-17 18:56:29,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=280500.0, ans=0.1 2024-09-17 18:56:29,151 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:56:38,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=280500.0, ans=0.0 2024-09-17 18:56:54,264 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.834e+01 9.325e+01 1.002e+02 2.125e+02, threshold=1.865e+02, percent-clipped=1.0 2024-09-17 18:57:41,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=280660.0, ans=0.125 2024-09-17 18:57:45,349 INFO [train.py:1198] (1/2) Epoch 16, batch 2300, loss[loss=0.2209, ctc_loss=0.1238, cr_loss=0.3512, attn_decoder_loss=0.2239, over 29349.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1462, cr_loss=0.3873, attn_decoder_loss=0.2542, over 5800407.76 frames. ], batch size: 71, lr: 6.99e-03, grad_scale: 8.0 2024-09-17 18:57:51,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=280700.0, ans=0.025 2024-09-17 18:58:03,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=280740.0, ans=0.125 2024-09-17 18:58:06,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=280740.0, ans=0.1 2024-09-17 18:58:32,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=280820.0, ans=0.125 2024-09-17 18:58:48,691 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.76 vs. limit=15.0 2024-09-17 18:58:52,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=280860.0, ans=0.0 2024-09-17 18:58:58,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=280860.0, ans=0.0 2024-09-17 18:59:01,560 INFO [train.py:1198] (1/2) Epoch 16, batch 2350, loss[loss=0.2717, ctc_loss=0.1646, cr_loss=0.3891, attn_decoder_loss=0.275, over 29697.00 frames. ], tot_loss[loss=0.2516, ctc_loss=0.1466, cr_loss=0.3879, attn_decoder_loss=0.2546, over 5805837.84 frames. ], batch size: 83, lr: 6.99e-03, grad_scale: 8.0 2024-09-17 18:59:30,268 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.817e+01 8.837e+01 9.442e+01 1.004e+02 6.270e+02, threshold=1.888e+02, percent-clipped=1.0 2024-09-17 19:00:17,721 INFO [train.py:1198] (1/2) Epoch 16, batch 2400, loss[loss=0.2427, ctc_loss=0.1376, cr_loss=0.3628, attn_decoder_loss=0.2464, over 29521.00 frames. ], tot_loss[loss=0.2517, ctc_loss=0.1463, cr_loss=0.388, attn_decoder_loss=0.2548, over 5808351.14 frames. ], batch size: 76, lr: 6.99e-03, grad_scale: 16.0 2024-09-17 19:00:44,478 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=10.88 vs. 
limit=15.0 2024-09-17 19:01:16,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=281220.0, ans=0.035 2024-09-17 19:01:28,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=281260.0, ans=0.025 2024-09-17 19:01:30,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=281260.0, ans=0.0 2024-09-17 19:01:33,813 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.26 vs. limit=15.0 2024-09-17 19:01:36,118 INFO [train.py:1198] (1/2) Epoch 16, batch 2450, loss[loss=0.2755, ctc_loss=0.1693, cr_loss=0.4428, attn_decoder_loss=0.2774, over 29743.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1472, cr_loss=0.3891, attn_decoder_loss=0.2559, over 5786707.30 frames. ], batch size: 82, lr: 6.99e-03, grad_scale: 8.0 2024-09-17 19:01:37,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=281300.0, ans=0.1 2024-09-17 19:01:45,533 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:01:52,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=281340.0, ans=0.125 2024-09-17 19:02:02,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=281340.0, ans=0.0 2024-09-17 19:02:06,408 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.503e+01 9.367e+01 1.015e+02 1.200e+02 3.423e+02, threshold=2.029e+02, percent-clipped=2.0 2024-09-17 19:02:15,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=281380.0, ans=0.0 2024-09-17 19:02:28,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=281420.0, ans=0.2 2024-09-17 19:02:35,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=281460.0, ans=0.0 2024-09-17 19:02:50,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=281500.0, ans=0.1 2024-09-17 19:02:51,479 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.60 vs. limit=22.5 2024-09-17 19:02:51,879 INFO [train.py:1198] (1/2) Epoch 16, batch 2500, loss[loss=0.2538, ctc_loss=0.1397, cr_loss=0.3781, attn_decoder_loss=0.258, over 29626.00 frames. ], tot_loss[loss=0.2527, ctc_loss=0.1469, cr_loss=0.3887, attn_decoder_loss=0.2559, over 5796636.57 frames. 
], batch size: 86, lr: 6.98e-03, grad_scale: 8.0 2024-09-17 19:02:58,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=281500.0, ans=0.1 2024-09-17 19:03:13,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=281540.0, ans=0.0 2024-09-17 19:03:24,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=281580.0, ans=0.1 2024-09-17 19:03:24,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=281580.0, ans=0.0 2024-09-17 19:03:33,978 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.89 vs. limit=12.0 2024-09-17 19:03:45,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=281620.0, ans=0.125 2024-09-17 19:03:54,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=281660.0, ans=0.125 2024-09-17 19:03:56,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=281660.0, ans=0.125 2024-09-17 19:04:08,345 INFO [train.py:1198] (1/2) Epoch 16, batch 2550, loss[loss=0.2141, ctc_loss=0.109, cr_loss=0.3275, attn_decoder_loss=0.2185, over 29316.00 frames. ], tot_loss[loss=0.2525, ctc_loss=0.1467, cr_loss=0.3882, attn_decoder_loss=0.2556, over 5798965.11 frames. ], batch size: 67, lr: 6.98e-03, grad_scale: 8.0 2024-09-17 19:04:37,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=281780.0, ans=0.0 2024-09-17 19:04:38,194 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.18 vs. limit=5.0 2024-09-17 19:04:38,401 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.606e+01 8.661e+01 9.348e+01 1.013e+02 3.774e+02, threshold=1.870e+02, percent-clipped=2.0 2024-09-17 19:05:02,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=281820.0, ans=0.0 2024-09-17 19:05:07,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=281820.0, ans=0.125 2024-09-17 19:05:13,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=281860.0, ans=0.0 2024-09-17 19:05:28,338 INFO [train.py:1198] (1/2) Epoch 16, batch 2600, loss[loss=0.2565, ctc_loss=0.149, cr_loss=0.4052, attn_decoder_loss=0.2594, over 29437.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.147, cr_loss=0.3888, attn_decoder_loss=0.256, over 5795439.51 frames. 
], batch size: 78, lr: 6.98e-03, grad_scale: 8.0 2024-09-17 19:05:38,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=281900.0, ans=0.0 2024-09-17 19:05:46,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=281940.0, ans=0.0 2024-09-17 19:05:55,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=281940.0, ans=0.1 2024-09-17 19:06:00,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=281980.0, ans=0.125 2024-09-17 19:06:22,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=282020.0, ans=0.0 2024-09-17 19:06:25,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=282020.0, ans=0.0 2024-09-17 19:06:43,657 INFO [train.py:1198] (1/2) Epoch 16, batch 2650, loss[loss=0.282, ctc_loss=0.175, cr_loss=0.4406, attn_decoder_loss=0.2841, over 29319.00 frames. ], tot_loss[loss=0.2534, ctc_loss=0.1475, cr_loss=0.3898, attn_decoder_loss=0.2566, over 5802060.40 frames. ], batch size: 100, lr: 6.98e-03, grad_scale: 8.0 2024-09-17 19:06:43,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=282100.0, ans=0.1 2024-09-17 19:07:06,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=282140.0, ans=0.0 2024-09-17 19:07:13,814 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.875e+01 8.834e+01 9.287e+01 9.746e+01 1.582e+02, threshold=1.857e+02, percent-clipped=0.0 2024-09-17 19:07:18,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=282180.0, ans=0.5 2024-09-17 19:07:21,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=282180.0, ans=0.1 2024-09-17 19:07:59,126 INFO [train.py:1198] (1/2) Epoch 16, batch 2700, loss[loss=0.2669, ctc_loss=0.1485, cr_loss=0.4093, attn_decoder_loss=0.2709, over 29534.00 frames. ], tot_loss[loss=0.2536, ctc_loss=0.1477, cr_loss=0.3903, attn_decoder_loss=0.2567, over 5798618.99 frames. ], batch size: 87, lr: 6.97e-03, grad_scale: 8.0 2024-09-17 19:08:02,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=282300.0, ans=0.1 2024-09-17 19:08:07,131 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:08:10,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=282300.0, ans=0.025 2024-09-17 19:08:17,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=282340.0, ans=0.125 2024-09-17 19:08:50,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=282420.0, ans=0.125 2024-09-17 19:09:19,244 INFO [train.py:1198] (1/2) Epoch 16, batch 2750, loss[loss=0.2401, ctc_loss=0.1438, cr_loss=0.3812, attn_decoder_loss=0.2423, over 29513.00 frames. 
], tot_loss[loss=0.2527, ctc_loss=0.1469, cr_loss=0.3888, attn_decoder_loss=0.2558, over 5796951.03 frames. ], batch size: 75, lr: 6.97e-03, grad_scale: 8.0 2024-09-17 19:09:44,819 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.94 vs. limit=15.0 2024-09-17 19:09:49,726 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.861e+01 8.782e+01 9.545e+01 1.036e+02 3.066e+02, threshold=1.909e+02, percent-clipped=3.0 2024-09-17 19:09:51,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=282580.0, ans=0.0 2024-09-17 19:09:53,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=282580.0, ans=0.2 2024-09-17 19:09:57,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=282580.0, ans=0.2 2024-09-17 19:10:03,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=282620.0, ans=0.125 2024-09-17 19:10:12,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=282620.0, ans=0.07 2024-09-17 19:10:27,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=282660.0, ans=0.0 2024-09-17 19:10:34,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=282700.0, ans=0.025 2024-09-17 19:10:34,653 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.95 vs. limit=22.5 2024-09-17 19:10:35,383 INFO [train.py:1198] (1/2) Epoch 16, batch 2800, loss[loss=0.2839, ctc_loss=0.1881, cr_loss=0.4131, attn_decoder_loss=0.2854, over 20176.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1472, cr_loss=0.3893, attn_decoder_loss=0.2559, over 5777417.17 frames. ], batch size: 209, lr: 6.97e-03, grad_scale: 16.0 2024-09-17 19:11:02,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.32 vs. limit=22.5 2024-09-17 19:11:04,581 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.74 vs. limit=15.0 2024-09-17 19:11:08,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=282780.0, ans=0.0 2024-09-17 19:11:16,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=282780.0, ans=0.0 2024-09-17 19:11:34,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=282860.0, ans=0.125 2024-09-17 19:11:49,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=282900.0, ans=0.2 2024-09-17 19:11:50,921 INFO [train.py:1198] (1/2) Epoch 16, batch 2850, loss[loss=0.2373, ctc_loss=0.1273, cr_loss=0.3596, attn_decoder_loss=0.2415, over 29534.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.1479, cr_loss=0.3903, attn_decoder_loss=0.2563, over 5760900.57 frames. 
], batch size: 77, lr: 6.97e-03, grad_scale: 8.0 2024-09-17 19:11:53,023 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=15.0 2024-09-17 19:12:03,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=282900.0, ans=0.1 2024-09-17 19:12:14,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=282940.0, ans=0.125 2024-09-17 19:12:24,810 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.778e+01 8.830e+01 9.442e+01 1.037e+02 2.855e+02, threshold=1.888e+02, percent-clipped=3.0 2024-09-17 19:12:44,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=283020.0, ans=0.09899494936611666 2024-09-17 19:12:44,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=283020.0, ans=0.125 2024-09-17 19:13:10,861 INFO [train.py:1198] (1/2) Epoch 16, batch 2900, loss[loss=0.2542, ctc_loss=0.1536, cr_loss=0.396, attn_decoder_loss=0.2565, over 29412.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1481, cr_loss=0.3916, attn_decoder_loss=0.257, over 5787422.05 frames. ], batch size: 79, lr: 6.96e-03, grad_scale: 8.0 2024-09-17 19:13:33,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=283140.0, ans=0.125 2024-09-17 19:13:45,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=283180.0, ans=0.0 2024-09-17 19:13:54,324 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.81 vs. limit=10.0 2024-09-17 19:14:15,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=283260.0, ans=0.125 2024-09-17 19:14:18,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=283260.0, ans=0.125 2024-09-17 19:14:27,127 INFO [train.py:1198] (1/2) Epoch 16, batch 2950, loss[loss=0.2392, ctc_loss=0.1353, cr_loss=0.3813, attn_decoder_loss=0.2422, over 29510.00 frames. ], tot_loss[loss=0.2525, ctc_loss=0.147, cr_loss=0.3894, attn_decoder_loss=0.2556, over 5781908.60 frames. 
], batch size: 75, lr: 6.96e-03, grad_scale: 8.0 2024-09-17 19:14:27,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=283300.0, ans=0.125 2024-09-17 19:14:44,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=283340.0, ans=0.025 2024-09-17 19:14:56,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=283380.0, ans=0.125 2024-09-17 19:14:58,919 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.668e+01 9.077e+01 9.673e+01 1.448e+02, threshold=1.815e+02, percent-clipped=0.0 2024-09-17 19:15:00,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=283380.0, ans=0.125 2024-09-17 19:15:11,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=283420.0, ans=0.125 2024-09-17 19:15:15,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=283420.0, ans=0.1 2024-09-17 19:15:20,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=283420.0, ans=0.0 2024-09-17 19:15:23,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=283420.0, ans=0.1 2024-09-17 19:15:35,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=283460.0, ans=0.125 2024-09-17 19:15:42,948 INFO [train.py:1198] (1/2) Epoch 16, batch 3000, loss[loss=0.2457, ctc_loss=0.1393, cr_loss=0.3775, attn_decoder_loss=0.2491, over 29729.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.1467, cr_loss=0.3889, attn_decoder_loss=0.2552, over 5782273.08 frames. ], batch size: 81, lr: 6.96e-03, grad_scale: 8.0 2024-09-17 19:15:42,949 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 19:15:54,548 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3261, 4.9228, 4.6825, 4.6293], device='cuda:1') 2024-09-17 19:16:01,439 INFO [train.py:1230] (1/2) Epoch 16, validation: loss=0.2115, ctc_loss=0.04131, cr_loss=4.919e-15, attn_decoder_loss=0.2304, over 944034.00 frames. 2024-09-17 19:16:01,439 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-17 19:16:24,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=283540.0, ans=0.1 2024-09-17 19:16:53,911 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.50 vs. limit=12.0 2024-09-17 19:17:10,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=283660.0, ans=0.125 2024-09-17 19:17:22,015 INFO [train.py:1198] (1/2) Epoch 16, batch 3050, loss[loss=0.2464, ctc_loss=0.1441, cr_loss=0.3928, attn_decoder_loss=0.249, over 29550.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.1477, cr_loss=0.3901, attn_decoder_loss=0.2564, over 5776812.89 frames. 
], batch size: 76, lr: 6.96e-03, grad_scale: 8.0 2024-09-17 19:17:26,187 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.46 vs. limit=15.0 2024-09-17 19:17:33,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=283700.0, ans=0.125 2024-09-17 19:17:46,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=283740.0, ans=0.2 2024-09-17 19:17:50,171 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.99 vs. limit=22.5 2024-09-17 19:17:53,964 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.977e+01 8.926e+01 9.487e+01 1.019e+02 3.855e+02, threshold=1.897e+02, percent-clipped=1.0 2024-09-17 19:18:09,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=283820.0, ans=0.125 2024-09-17 19:18:27,832 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.34 vs. limit=15.0 2024-09-17 19:18:34,249 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.52 vs. limit=15.0 2024-09-17 19:18:36,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=283900.0, ans=0.0 2024-09-17 19:18:37,845 INFO [train.py:1198] (1/2) Epoch 16, batch 3100, loss[loss=0.2675, ctc_loss=0.154, cr_loss=0.4224, attn_decoder_loss=0.2708, over 29275.00 frames. ], tot_loss[loss=0.2525, ctc_loss=0.1469, cr_loss=0.3885, attn_decoder_loss=0.2556, over 5777310.91 frames. ], batch size: 100, lr: 6.95e-03, grad_scale: 8.0 2024-09-17 19:18:41,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=283900.0, ans=0.2 2024-09-17 19:19:24,176 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:19:54,452 INFO [train.py:1198] (1/2) Epoch 16, batch 3150, loss[loss=0.2626, ctc_loss=0.1486, cr_loss=0.3934, attn_decoder_loss=0.2666, over 28977.00 frames. ], tot_loss[loss=0.2525, ctc_loss=0.147, cr_loss=0.3891, attn_decoder_loss=0.2556, over 5783122.29 frames. ], batch size: 104, lr: 6.95e-03, grad_scale: 8.0 2024-09-17 19:20:22,738 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.78 vs. 
limit=15.0 2024-09-17 19:20:28,022 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.755e+01 8.635e+01 9.420e+01 9.793e+01 2.697e+02, threshold=1.884e+02, percent-clipped=2.0 2024-09-17 19:20:45,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=284220.0, ans=0.0 2024-09-17 19:21:00,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=284260.0, ans=0.125 2024-09-17 19:21:12,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=284300.0, ans=0.125 2024-09-17 19:21:13,977 INFO [train.py:1198] (1/2) Epoch 16, batch 3200, loss[loss=0.2433, ctc_loss=0.1451, cr_loss=0.3811, attn_decoder_loss=0.2457, over 29406.00 frames. ], tot_loss[loss=0.2518, ctc_loss=0.1461, cr_loss=0.3875, attn_decoder_loss=0.2549, over 5794093.13 frames. ], batch size: 79, lr: 6.95e-03, grad_scale: 16.0 2024-09-17 19:21:23,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=284300.0, ans=0.125 2024-09-17 19:21:37,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=284340.0, ans=0.125 2024-09-17 19:21:47,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=284380.0, ans=0.1 2024-09-17 19:21:59,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=284420.0, ans=0.025 2024-09-17 19:22:09,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.10 vs. limit=22.5 2024-09-17 19:22:12,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=284420.0, ans=0.2 2024-09-17 19:22:13,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=284460.0, ans=0.125 2024-09-17 19:22:29,952 INFO [train.py:1198] (1/2) Epoch 16, batch 3250, loss[loss=0.2552, ctc_loss=0.142, cr_loss=0.4068, attn_decoder_loss=0.2587, over 29699.00 frames. ], tot_loss[loss=0.2526, ctc_loss=0.1465, cr_loss=0.3886, attn_decoder_loss=0.2558, over 5800873.24 frames. ], batch size: 84, lr: 6.95e-03, grad_scale: 8.0 2024-09-17 19:22:39,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=284500.0, ans=0.0 2024-09-17 19:22:41,726 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=15.0 2024-09-17 19:22:45,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=284540.0, ans=0.125 2024-09-17 19:22:53,707 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.44 vs. 
limit=15.0 2024-09-17 19:23:03,115 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.613e+01 8.619e+01 9.155e+01 9.687e+01 2.235e+02, threshold=1.831e+02, percent-clipped=1.0 2024-09-17 19:23:06,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=284580.0, ans=0.2 2024-09-17 19:23:07,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=284580.0, ans=0.125 2024-09-17 19:23:08,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=284580.0, ans=0.2 2024-09-17 19:23:26,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=284620.0, ans=0.1 2024-09-17 19:23:35,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=284660.0, ans=0.1 2024-09-17 19:23:45,582 INFO [train.py:1198] (1/2) Epoch 16, batch 3300, loss[loss=0.2763, ctc_loss=0.1675, cr_loss=0.4091, attn_decoder_loss=0.2793, over 28402.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1455, cr_loss=0.3865, attn_decoder_loss=0.2544, over 5797010.09 frames. ], batch size: 111, lr: 6.94e-03, grad_scale: 8.0 2024-09-17 19:24:18,997 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0 2024-09-17 19:24:23,806 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.10 vs. limit=15.0 2024-09-17 19:24:57,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.min_positive, batch_count=284860.0, ans=0.025 2024-09-17 19:24:58,928 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:25:02,584 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.52 vs. limit=15.0 2024-09-17 19:25:06,085 INFO [train.py:1198] (1/2) Epoch 16, batch 3350, loss[loss=0.2636, ctc_loss=0.1527, cr_loss=0.386, attn_decoder_loss=0.2673, over 28772.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.1464, cr_loss=0.3877, attn_decoder_loss=0.2552, over 5772794.66 frames. ], batch size: 104, lr: 6.94e-03, grad_scale: 8.0 2024-09-17 19:25:39,361 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.447e+01 8.948e+01 9.628e+01 1.043e+02 1.952e+02, threshold=1.926e+02, percent-clipped=2.0 2024-09-17 19:25:59,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=285020.0, ans=0.125 2024-09-17 19:26:21,750 INFO [train.py:1198] (1/2) Epoch 16, batch 3400, loss[loss=0.216, ctc_loss=0.1119, cr_loss=0.329, attn_decoder_loss=0.2202, over 29353.00 frames. ], tot_loss[loss=0.2523, ctc_loss=0.1469, cr_loss=0.3889, attn_decoder_loss=0.2554, over 5765814.42 frames. ], batch size: 67, lr: 6.94e-03, grad_scale: 8.0 2024-09-17 19:26:22,604 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.31 vs. 
limit=12.0 2024-09-17 19:26:26,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=285100.0, ans=0.0 2024-09-17 19:26:58,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=285180.0, ans=0.0 2024-09-17 19:27:10,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=285220.0, ans=0.035 2024-09-17 19:27:28,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=285260.0, ans=0.125 2024-09-17 19:27:31,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=285260.0, ans=0.0 2024-09-17 19:27:37,292 INFO [train.py:1198] (1/2) Epoch 16, batch 3450, loss[loss=0.2523, ctc_loss=0.1347, cr_loss=0.3529, attn_decoder_loss=0.2576, over 28507.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1467, cr_loss=0.3888, attn_decoder_loss=0.2555, over 5774602.19 frames. ], batch size: 111, lr: 6.94e-03, grad_scale: 8.0 2024-09-17 19:27:41,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=285300.0, ans=0.2 2024-09-17 19:27:45,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=285300.0, ans=0.2 2024-09-17 19:27:47,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=285300.0, ans=0.2 2024-09-17 19:27:56,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=285340.0, ans=0.2 2024-09-17 19:28:07,581 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.05 vs. limit=10.0 2024-09-17 19:28:12,413 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.860e+01 9.055e+01 9.633e+01 1.034e+02 1.561e+02, threshold=1.927e+02, percent-clipped=0.0 2024-09-17 19:28:23,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=285380.0, ans=0.0 2024-09-17 19:28:57,133 INFO [train.py:1198] (1/2) Epoch 16, batch 3500, loss[loss=0.2229, ctc_loss=0.1235, cr_loss=0.3267, attn_decoder_loss=0.2266, over 29350.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.1465, cr_loss=0.3883, attn_decoder_loss=0.2551, over 5775752.19 frames. ], batch size: 71, lr: 6.93e-03, grad_scale: 8.0 2024-09-17 19:29:06,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=285500.0, ans=0.0 2024-09-17 19:29:23,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.06 vs. limit=15.0 2024-09-17 19:29:53,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=285620.0, ans=0.1 2024-09-17 19:29:59,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=285660.0, ans=0.07 2024-09-17 19:30:12,380 INFO [train.py:1198] (1/2) Epoch 16, batch 3550, loss[loss=0.2649, ctc_loss=0.1544, cr_loss=0.4097, attn_decoder_loss=0.268, over 29693.00 frames. 
], tot_loss[loss=0.252, ctc_loss=0.1463, cr_loss=0.3881, attn_decoder_loss=0.2551, over 5783020.73 frames. ], batch size: 89, lr: 6.93e-03, grad_scale: 8.0 2024-09-17 19:30:30,471 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:30:41,517 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.95 vs. limit=15.0 2024-09-17 19:30:42,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=285780.0, ans=0.1 2024-09-17 19:30:45,351 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.292e+01 8.552e+01 9.135e+01 9.623e+01 1.565e+02, threshold=1.827e+02, percent-clipped=0.0 2024-09-17 19:30:50,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=285780.0, ans=0.025 2024-09-17 19:31:06,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=285820.0, ans=0.0 2024-09-17 19:31:22,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=285860.0, ans=0.0 2024-09-17 19:31:26,862 INFO [train.py:1198] (1/2) Epoch 16, batch 3600, loss[loss=0.2312, ctc_loss=0.1265, cr_loss=0.3481, attn_decoder_loss=0.2351, over 29491.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.1461, cr_loss=0.3878, attn_decoder_loss=0.2551, over 5792029.02 frames. ], batch size: 77, lr: 6.93e-03, grad_scale: 16.0 2024-09-17 19:32:02,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=285980.0, ans=0.04949747468305833 2024-09-17 19:32:25,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=286060.0, ans=0.1 2024-09-17 19:32:32,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=286060.0, ans=0.125 2024-09-17 19:32:41,300 INFO [train.py:1198] (1/2) Epoch 16, batch 3650, loss[loss=0.2705, ctc_loss=0.1618, cr_loss=0.4199, attn_decoder_loss=0.2733, over 29500.00 frames. ], tot_loss[loss=0.2515, ctc_loss=0.1459, cr_loss=0.387, attn_decoder_loss=0.2547, over 5794237.48 frames. ], batch size: 90, lr: 6.93e-03, grad_scale: 8.0 2024-09-17 19:32:50,870 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.04 vs. 
limit=15.0 2024-09-17 19:32:57,801 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:32:59,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=286140.0, ans=0.1 2024-09-17 19:33:17,579 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.371e+01 8.668e+01 9.269e+01 9.880e+01 1.402e+02, threshold=1.854e+02, percent-clipped=0.0 2024-09-17 19:33:19,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=286180.0, ans=0.125 2024-09-17 19:33:32,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=286220.0, ans=0.07 2024-09-17 19:33:34,766 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.73 vs. limit=15.0 2024-09-17 19:33:42,474 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0 2024-09-17 19:33:44,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=286260.0, ans=0.0 2024-09-17 19:33:57,802 INFO [train.py:1198] (1/2) Epoch 16, batch 3700, loss[loss=0.2562, ctc_loss=0.1378, cr_loss=0.3678, attn_decoder_loss=0.2611, over 29703.00 frames. ], tot_loss[loss=0.2517, ctc_loss=0.1459, cr_loss=0.3874, attn_decoder_loss=0.2548, over 5802877.04 frames. ], batch size: 84, lr: 6.92e-03, grad_scale: 8.0 2024-09-17 19:34:04,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=286300.0, ans=0.125 2024-09-17 19:34:07,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=286300.0, ans=0.1 2024-09-17 19:34:59,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=286460.0, ans=0.125 2024-09-17 19:35:14,227 INFO [train.py:1198] (1/2) Epoch 16, batch 3750, loss[loss=0.2285, ctc_loss=0.1331, cr_loss=0.3592, attn_decoder_loss=0.2311, over 29339.00 frames. ], tot_loss[loss=0.2514, ctc_loss=0.1456, cr_loss=0.3872, attn_decoder_loss=0.2545, over 5808048.77 frames. ], batch size: 67, lr: 6.92e-03, grad_scale: 8.0 2024-09-17 19:35:30,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=286540.0, ans=0.125 2024-09-17 19:35:40,434 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0 2024-09-17 19:35:48,294 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.804e+01 8.849e+01 9.329e+01 1.007e+02 6.454e+02, threshold=1.866e+02, percent-clipped=5.0 2024-09-17 19:36:11,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=286620.0, ans=0.1 2024-09-17 19:36:28,744 INFO [train.py:1198] (1/2) Epoch 16, batch 3800, loss[loss=0.2777, ctc_loss=0.1671, cr_loss=0.4423, attn_decoder_loss=0.2802, over 29636.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1454, cr_loss=0.3871, attn_decoder_loss=0.2544, over 5798247.87 frames. 
], batch size: 86, lr: 6.92e-03, grad_scale: 8.0 2024-09-17 19:36:41,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.29 vs. limit=12.0 2024-09-17 19:36:55,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=286740.0, ans=0.07 2024-09-17 19:37:03,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=286780.0, ans=0.0 2024-09-17 19:37:11,394 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.95 vs. limit=15.0 2024-09-17 19:37:35,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=286860.0, ans=0.1 2024-09-17 19:37:42,781 INFO [train.py:1198] (1/2) Epoch 16, batch 3850, loss[loss=0.272, ctc_loss=0.1632, cr_loss=0.4281, attn_decoder_loss=0.2746, over 29224.00 frames. ], tot_loss[loss=0.251, ctc_loss=0.1449, cr_loss=0.386, attn_decoder_loss=0.2542, over 5812678.44 frames. ], batch size: 100, lr: 6.92e-03, grad_scale: 8.0 2024-09-17 19:37:44,018 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.18 vs. limit=10.0 2024-09-17 19:37:44,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=286900.0, ans=0.2 2024-09-17 19:38:11,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=286980.0, ans=0.1 2024-09-17 19:38:16,698 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 9.163e+01 9.754e+01 1.076e+02 2.177e+02, threshold=1.951e+02, percent-clipped=1.0 2024-09-17 19:38:18,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=286980.0, ans=0.125 2024-09-17 19:38:18,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=286980.0, ans=0.1 2024-09-17 19:38:18,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=286980.0, ans=0.1 2024-09-17 19:38:24,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=286980.0, ans=0.0 2024-09-17 19:38:24,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=286980.0, ans=0.0 2024-09-17 19:38:39,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=287020.0, ans=0.1 2024-09-17 19:38:58,596 INFO [train.py:1198] (1/2) Epoch 16, batch 3900, loss[loss=0.267, ctc_loss=0.1548, cr_loss=0.4228, attn_decoder_loss=0.27, over 29617.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.145, cr_loss=0.3863, attn_decoder_loss=0.2545, over 5817036.57 frames. 
], batch size: 86, lr: 6.92e-03, grad_scale: 8.0 2024-09-17 19:38:58,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=287100.0, ans=0.125 2024-09-17 19:39:04,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=287100.0, ans=0.1 2024-09-17 19:39:06,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=287100.0, ans=0.125 2024-09-17 19:39:08,704 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.39 vs. limit=10.0 2024-09-17 19:39:09,868 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. limit=6.0 2024-09-17 19:39:12,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=287140.0, ans=0.0 2024-09-17 19:39:19,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=287140.0, ans=0.05 2024-09-17 19:39:32,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=287180.0, ans=0.1 2024-09-17 19:39:37,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=287180.0, ans=0.1 2024-09-17 19:39:43,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=287220.0, ans=0.125 2024-09-17 19:40:14,822 INFO [train.py:1198] (1/2) Epoch 16, batch 3950, loss[loss=0.2784, ctc_loss=0.1735, cr_loss=0.4445, attn_decoder_loss=0.2802, over 29486.00 frames. ], tot_loss[loss=0.2514, ctc_loss=0.1451, cr_loss=0.3868, attn_decoder_loss=0.2546, over 5836259.24 frames. ], batch size: 97, lr: 6.91e-03, grad_scale: 8.0 2024-09-17 19:40:16,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=287300.0, ans=0.125 2024-09-17 19:40:40,528 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.81 vs. limit=12.0 2024-09-17 19:40:48,604 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.783e+01 9.413e+01 1.005e+02 2.800e+02, threshold=1.883e+02, percent-clipped=1.0 2024-09-17 19:41:15,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=287460.0, ans=0.025 2024-09-17 19:41:28,446 INFO [train.py:1198] (1/2) Epoch 16, batch 4000, loss[loss=0.2329, ctc_loss=0.1212, cr_loss=0.3256, attn_decoder_loss=0.238, over 29479.00 frames. ], tot_loss[loss=0.2516, ctc_loss=0.1455, cr_loss=0.3872, attn_decoder_loss=0.2548, over 5813629.47 frames. ], batch size: 74, lr: 6.91e-03, grad_scale: 16.0 2024-09-17 19:41:33,813 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.48 vs. 
limit=15.0 2024-09-17 19:42:09,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=287580.0, ans=0.125 2024-09-17 19:42:14,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=287620.0, ans=0.125 2024-09-17 19:42:20,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=287620.0, ans=0.05 2024-09-17 19:42:30,195 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.60 vs. limit=15.0 2024-09-17 19:42:31,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=287660.0, ans=0.1 2024-09-17 19:42:42,630 INFO [train.py:1198] (1/2) Epoch 16, batch 4050, loss[loss=0.2799, ctc_loss=0.1817, cr_loss=0.4126, attn_decoder_loss=0.2816, over 20272.00 frames. ], tot_loss[loss=0.2515, ctc_loss=0.1458, cr_loss=0.3873, attn_decoder_loss=0.2547, over 5797151.93 frames. ], batch size: 210, lr: 6.91e-03, grad_scale: 8.0 2024-09-17 19:42:48,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=287700.0, ans=0.2 2024-09-17 19:42:55,212 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.20 vs. limit=22.5 2024-09-17 19:43:01,198 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.86 vs. limit=15.0 2024-09-17 19:43:07,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=287740.0, ans=0.025 2024-09-17 19:43:17,956 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.851e+01 8.954e+01 9.709e+01 1.044e+02 2.247e+02, threshold=1.942e+02, percent-clipped=1.0 2024-09-17 19:43:57,685 INFO [train.py:1198] (1/2) Epoch 16, batch 4100, loss[loss=0.2717, ctc_loss=0.1623, cr_loss=0.4202, attn_decoder_loss=0.2745, over 29509.00 frames. ], tot_loss[loss=0.2514, ctc_loss=0.1457, cr_loss=0.3869, attn_decoder_loss=0.2546, over 5791353.09 frames. ], batch size: 90, lr: 6.91e-03, grad_scale: 8.0 2024-09-17 19:44:02,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=287900.0, ans=0.2 2024-09-17 19:44:03,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=287900.0, ans=0.0 2024-09-17 19:45:16,517 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.02 vs. limit=6.0 2024-09-17 19:45:20,176 INFO [train.py:1198] (1/2) Epoch 16, batch 4150, loss[loss=0.2413, ctc_loss=0.1338, cr_loss=0.3681, attn_decoder_loss=0.2451, over 29495.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.1456, cr_loss=0.3866, attn_decoder_loss=0.2544, over 5797443.28 frames. 
], batch size: 77, lr: 6.90e-03, grad_scale: 8.0 2024-09-17 19:45:21,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=288100.0, ans=0.125 2024-09-17 19:45:43,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=288140.0, ans=0.0 2024-09-17 19:45:45,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=288140.0, ans=10.0 2024-09-17 19:45:46,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=288140.0, ans=0.0 2024-09-17 19:45:52,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=288180.0, ans=0.0 2024-09-17 19:45:54,965 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.503e+01 8.454e+01 9.164e+01 9.745e+01 4.465e+02, threshold=1.833e+02, percent-clipped=1.0 2024-09-17 19:45:55,897 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.05 vs. limit=12.0 2024-09-17 19:46:01,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=288180.0, ans=0.0 2024-09-17 19:46:01,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.70 vs. limit=12.0 2024-09-17 19:46:04,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=288220.0, ans=0.1 2024-09-17 19:46:05,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=288220.0, ans=0.015 2024-09-17 19:46:07,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=288220.0, ans=0.0 2024-09-17 19:46:16,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=288220.0, ans=0.125 2024-09-17 19:46:17,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=288260.0, ans=0.0 2024-09-17 19:46:20,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=288260.0, ans=0.025 2024-09-17 19:46:33,310 INFO [train.py:1198] (1/2) Epoch 16, batch 4200, loss[loss=0.2727, ctc_loss=0.1703, cr_loss=0.4171, attn_decoder_loss=0.2749, over 29524.00 frames. ], tot_loss[loss=0.2517, ctc_loss=0.146, cr_loss=0.387, attn_decoder_loss=0.2549, over 5799422.13 frames. 
], batch size: 90, lr: 6.90e-03, grad_scale: 8.0 2024-09-17 19:46:37,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=288300.0, ans=0.0 2024-09-17 19:46:55,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=288340.0, ans=0.125 2024-09-17 19:47:11,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=288380.0, ans=0.05 2024-09-17 19:47:14,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=288380.0, ans=0.125 2024-09-17 19:47:29,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=288420.0, ans=0.0 2024-09-17 19:47:40,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=288460.0, ans=0.035 2024-09-17 19:47:45,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=288460.0, ans=0.125 2024-09-17 19:47:47,733 INFO [train.py:1198] (1/2) Epoch 16, batch 4250, loss[loss=0.2267, ctc_loss=0.1263, cr_loss=0.3493, attn_decoder_loss=0.2301, over 29521.00 frames. ], tot_loss[loss=0.2518, ctc_loss=0.1458, cr_loss=0.3866, attn_decoder_loss=0.255, over 5805195.58 frames. ], batch size: 74, lr: 6.90e-03, grad_scale: 4.0 2024-09-17 19:47:47,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=288500.0, ans=10.0 2024-09-17 19:47:50,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=288500.0, ans=0.125 2024-09-17 19:47:53,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=288500.0, ans=0.125 2024-09-17 19:48:01,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=288540.0, ans=0.07 2024-09-17 19:48:06,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=288540.0, ans=0.125 2024-09-17 19:48:09,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=288540.0, ans=0.125 2024-09-17 19:48:10,186 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.83 vs. 
limit=6.0 2024-09-17 19:48:10,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=288540.0, ans=0.1 2024-09-17 19:48:17,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=288580.0, ans=0.125 2024-09-17 19:48:18,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=288580.0, ans=0.125 2024-09-17 19:48:24,158 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.118e+01 8.844e+01 9.399e+01 1.005e+02 1.682e+02, threshold=1.880e+02, percent-clipped=0.0 2024-09-17 19:48:25,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=288580.0, ans=0.125 2024-09-17 19:48:37,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=288620.0, ans=0.125 2024-09-17 19:48:37,883 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.95 vs. limit=15.0 2024-09-17 19:48:41,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=288620.0, ans=0.025 2024-09-17 19:48:44,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=288620.0, ans=0.0 2024-09-17 19:48:50,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=288660.0, ans=0.1 2024-09-17 19:49:01,838 INFO [train.py:1198] (1/2) Epoch 16, batch 4300, loss[loss=0.2671, ctc_loss=0.1608, cr_loss=0.4118, attn_decoder_loss=0.2698, over 29536.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.146, cr_loss=0.3872, attn_decoder_loss=0.2551, over 5794928.20 frames. ], batch size: 87, lr: 6.90e-03, grad_scale: 8.0 2024-09-17 19:49:11,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=288700.0, ans=0.125 2024-09-17 19:49:18,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=288740.0, ans=0.125 2024-09-17 19:49:23,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=288740.0, ans=10.0 2024-09-17 19:49:24,616 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:49:26,798 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.60 vs. 
limit=12.0 2024-09-17 19:49:30,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=288780.0, ans=0.1 2024-09-17 19:49:30,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=288780.0, ans=0.1 2024-09-17 19:49:36,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=288780.0, ans=0.0 2024-09-17 19:49:37,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=288780.0, ans=0.1 2024-09-17 19:50:03,801 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.29 vs. limit=6.0 2024-09-17 19:50:13,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=288860.0, ans=0.0 2024-09-17 19:50:16,551 INFO [train.py:1198] (1/2) Epoch 16, batch 4350, loss[loss=0.2661, ctc_loss=0.1625, cr_loss=0.4031, attn_decoder_loss=0.2686, over 29560.00 frames. ], tot_loss[loss=0.2553, ctc_loss=0.1489, cr_loss=0.3927, attn_decoder_loss=0.2584, over 5796644.58 frames. ], batch size: 97, lr: 6.89e-03, grad_scale: 8.0 2024-09-17 19:50:33,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=288940.0, ans=0.125 2024-09-17 19:50:41,217 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:50:42,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=288940.0, ans=0.0 2024-09-17 19:50:53,800 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.345e+01 8.913e+01 9.427e+01 9.937e+01 2.646e+02, threshold=1.885e+02, percent-clipped=2.0 2024-09-17 19:50:54,743 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.13 vs. limit=15.0 2024-09-17 19:51:24,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=289060.0, ans=0.0 2024-09-17 19:51:31,269 INFO [train.py:1198] (1/2) Epoch 16, batch 4400, loss[loss=0.2564, ctc_loss=0.1521, cr_loss=0.4102, attn_decoder_loss=0.2588, over 27262.00 frames. ], tot_loss[loss=0.2574, ctc_loss=0.1504, cr_loss=0.3947, attn_decoder_loss=0.2605, over 5765276.24 frames. ], batch size: 124, lr: 6.89e-03, grad_scale: 16.0 2024-09-17 19:51:56,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=289140.0, ans=0.0 2024-09-17 19:52:02,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=289180.0, ans=0.125 2024-09-17 19:52:36,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=289260.0, ans=0.0 2024-09-17 19:52:44,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=289300.0, ans=0.0 2024-09-17 19:52:44,962 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=23.31 vs. 
limit=22.5 2024-09-17 19:52:45,502 INFO [train.py:1198] (1/2) Epoch 16, batch 4450, loss[loss=0.2793, ctc_loss=0.1918, cr_loss=0.4516, attn_decoder_loss=0.279, over 20795.00 frames. ], tot_loss[loss=0.2605, ctc_loss=0.1552, cr_loss=0.3999, attn_decoder_loss=0.2633, over 5571692.71 frames. ], batch size: 210, lr: 6.89e-03, grad_scale: 4.0 2024-09-17 19:53:13,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=289340.0, ans=0.05 2024-09-17 19:53:15,624 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.67 vs. limit=15.0 2024-09-17 19:53:19,531 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:53:26,477 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.172e+01 9.461e+01 1.058e+02 1.169e+02 3.185e+02, threshold=2.116e+02, percent-clipped=2.0 2024-09-17 19:53:49,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=289460.0, ans=0.0 2024-09-17 19:53:54,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.53 vs. limit=6.0 2024-09-17 19:53:57,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=289460.0, ans=0.125 2024-09-17 19:54:01,245 INFO [train.py:1198] (1/2) Epoch 16, batch 4500, loss[loss=0.2576, ctc_loss=0.1634, cr_loss=0.37, attn_decoder_loss=0.2598, over 19871.00 frames. ], tot_loss[loss=0.2632, ctc_loss=0.1601, cr_loss=0.4019, attn_decoder_loss=0.2657, over 5227858.82 frames. ], batch size: 210, lr: 6.89e-03, grad_scale: 8.0 2024-09-17 19:54:35,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=289580.0, ans=0.1 2024-09-17 19:55:24,183 INFO [train.py:1198] (1/2) Epoch 17, batch 0, loss[loss=0.2269, ctc_loss=0.1176, cr_loss=0.3275, attn_decoder_loss=0.2318, over 29590.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1176, cr_loss=0.3275, attn_decoder_loss=0.2318, over 29590.00 frames. ], batch size: 73, lr: 6.68e-03, grad_scale: 16.0 2024-09-17 19:55:24,183 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 19:55:42,766 INFO [train.py:1230] (1/2) Epoch 17, validation: loss=0.2133, ctc_loss=0.04137, cr_loss=4.881e-15, attn_decoder_loss=0.2324, over 944034.00 frames. 
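[Editor's note on the validation entries above: both validation lines (Epoch 16, batch 3000 and Epoch 17, batch 0) report cr_loss on the order of 1e-15, i.e. effectively zero; the consistency-regularization term presumably vanishes at validation time because the augmentation that produces the two differing views of each utterance is disabled there. The tot_loss and per-sample loss values throughout this log are consistent with a fixed weighted sum of the logged components. The sketch below is a hypothetical reconstruction of that combination, not a quote of the training script: the helper name combined_loss and the scale values 0.1, 0.9, and 0.02 are assumptions, chosen only because they reproduce the logged numbers.]

def combined_loss(ctc_loss: float, attn_decoder_loss: float, cr_loss: float,
                  ctc_scale: float = 0.1,           # assumed weight
                  attn_decoder_scale: float = 0.9,  # assumed weight
                  cr_scale: float = 0.02) -> float: # assumed weight
    # Weighted sum of the per-batch loss terms as they appear in the log.
    return (ctc_scale * ctc_loss
            + attn_decoder_scale * attn_decoder_loss
            + cr_scale * cr_loss)

# Check against the "Epoch 17, batch 0" training sample above:
# 0.1 * 0.1176 + 0.9 * 0.2318 + 0.02 * 0.3275 = 0.2269 (to 4 decimals)
assert abs(combined_loss(0.1176, 0.2318, 0.3275) - 0.2269) < 5e-4

# The validation entry above also reproduces, with cr_loss ~ 0:
# 0.1 * 0.04137 + 0.9 * 0.2324 + 0.02 * 4.881e-15 = 0.2133 (to 4 decimals)
assert abs(combined_loss(0.04137, 0.2324, 4.881e-15) - 0.2133) < 5e-4

[With the same assumed weights, the Epoch 16, batch 3000 sample earlier in this section also checks out: 0.1 * 0.1393 + 0.9 * 0.2491 + 0.02 * 0.3775 = 0.2457, matching its logged loss=0.2457.]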
2024-09-17 19:55:42,766 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-17 19:55:50,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=289600.0, ans=0.125 2024-09-17 19:55:53,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=289600.0, ans=0.05 2024-09-17 19:55:59,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=289640.0, ans=0.0 2024-09-17 19:56:01,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=289640.0, ans=0.0 2024-09-17 19:56:02,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=289640.0, ans=0.125 2024-09-17 19:56:03,042 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.18 vs. limit=15.0 2024-09-17 19:56:31,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=289720.0, ans=0.2 2024-09-17 19:56:48,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=289760.0, ans=0.0 2024-09-17 19:56:54,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=289760.0, ans=0.125 2024-09-17 19:56:59,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=289800.0, ans=0.125 2024-09-17 19:57:00,372 INFO [train.py:1198] (1/2) Epoch 17, batch 50, loss[loss=0.2324, ctc_loss=0.1302, cr_loss=0.3618, attn_decoder_loss=0.2357, over 29411.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1498, cr_loss=0.392, attn_decoder_loss=0.2569, over 1269219.05 frames. ], batch size: 70, lr: 6.68e-03, grad_scale: 8.0 2024-09-17 19:57:05,049 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.027e+01 9.620e+01 1.078e+02 1.162e+02 4.794e+02, threshold=2.156e+02, percent-clipped=2.0 2024-09-17 19:57:07,383 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.46 vs. limit=10.0 2024-09-17 19:57:08,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=289800.0, ans=0.1 2024-09-17 19:57:14,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=289840.0, ans=0.0 2024-09-17 19:57:15,023 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.70 vs. limit=15.0 2024-09-17 19:57:15,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=289840.0, ans=0.07 2024-09-17 19:57:32,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=289880.0, ans=0.09899494936611666 2024-09-17 19:57:44,075 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.14 vs. 
limit=15.0 2024-09-17 19:57:55,465 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:58:18,295 INFO [train.py:1198] (1/2) Epoch 17, batch 100, loss[loss=0.2419, ctc_loss=0.1431, cr_loss=0.3873, attn_decoder_loss=0.2443, over 29520.00 frames. ], tot_loss[loss=0.2551, ctc_loss=0.1495, cr_loss=0.3928, attn_decoder_loss=0.2581, over 2253151.27 frames. ], batch size: 76, lr: 6.67e-03, grad_scale: 8.0 2024-09-17 19:58:20,429 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.32 vs. limit=22.5 2024-09-17 19:58:53,544 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.21 vs. limit=22.5 2024-09-17 19:58:59,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=290080.0, ans=0.0 2024-09-17 19:59:04,321 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0 2024-09-17 19:59:04,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=290120.0, ans=0.125 2024-09-17 19:59:08,445 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.49 vs. limit=15.0 2024-09-17 19:59:15,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=290120.0, ans=0.125 2024-09-17 19:59:19,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=290160.0, ans=0.5 2024-09-17 19:59:32,015 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.59 vs. limit=15.0 2024-09-17 19:59:32,862 INFO [train.py:1198] (1/2) Epoch 17, batch 150, loss[loss=0.2333, ctc_loss=0.1338, cr_loss=0.3776, attn_decoder_loss=0.2359, over 29449.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1468, cr_loss=0.3892, attn_decoder_loss=0.2559, over 3048412.45 frames. ], batch size: 70, lr: 6.67e-03, grad_scale: 8.0 2024-09-17 19:59:37,320 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.030e+01 8.871e+01 9.281e+01 1.009e+02 2.332e+02, threshold=1.856e+02, percent-clipped=1.0 2024-09-17 19:59:53,474 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.23 vs. limit=22.5 2024-09-17 20:00:49,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=290400.0, ans=0.1 2024-09-17 20:00:50,783 INFO [train.py:1198] (1/2) Epoch 17, batch 200, loss[loss=0.2669, ctc_loss=0.1562, cr_loss=0.4, attn_decoder_loss=0.2703, over 27365.00 frames. ], tot_loss[loss=0.2514, ctc_loss=0.1455, cr_loss=0.3876, attn_decoder_loss=0.2545, over 3660367.05 frames. 
], batch size: 124, lr: 6.67e-03, grad_scale: 8.0 2024-09-17 20:01:01,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=290400.0, ans=0.025 2024-09-17 20:01:08,682 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-09-17 20:01:24,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=290480.0, ans=0.2 2024-09-17 20:01:28,601 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. limit=6.0 2024-09-17 20:01:44,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=290520.0, ans=0.1 2024-09-17 20:02:00,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=290560.0, ans=0.1 2024-09-17 20:02:09,186 INFO [train.py:1198] (1/2) Epoch 17, batch 250, loss[loss=0.276, ctc_loss=0.163, cr_loss=0.4119, attn_decoder_loss=0.2794, over 29241.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.1453, cr_loss=0.3871, attn_decoder_loss=0.2544, over 4142206.22 frames. ], batch size: 100, lr: 6.67e-03, grad_scale: 8.0 2024-09-17 20:02:13,827 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.681e+01 8.517e+01 9.040e+01 9.817e+01 1.381e+02, threshold=1.808e+02, percent-clipped=0.0 2024-09-17 20:02:29,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=290640.0, ans=0.95 2024-09-17 20:02:29,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=290640.0, ans=0.0 2024-09-17 20:02:29,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=290640.0, ans=0.2 2024-09-17 20:02:41,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=290680.0, ans=0.125 2024-09-17 20:02:46,199 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0 2024-09-17 20:03:13,329 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.63 vs. limit=15.0 2024-09-17 20:03:24,668 INFO [train.py:1198] (1/2) Epoch 17, batch 300, loss[loss=0.269, ctc_loss=0.1643, cr_loss=0.448, attn_decoder_loss=0.2707, over 29504.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1448, cr_loss=0.3869, attn_decoder_loss=0.254, over 4511384.75 frames. ], batch size: 92, lr: 6.66e-03, grad_scale: 8.0 2024-09-17 20:04:15,285 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.34 vs. limit=15.0 2024-09-17 20:04:37,290 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.83 vs. limit=15.0 2024-09-17 20:04:42,401 INFO [train.py:1198] (1/2) Epoch 17, batch 350, loss[loss=0.2375, ctc_loss=0.1372, cr_loss=0.3688, attn_decoder_loss=0.2404, over 29314.00 frames. 
], tot_loss[loss=0.2515, ctc_loss=0.1449, cr_loss=0.3864, attn_decoder_loss=0.2547, over 4796703.87 frames. ], batch size: 71, lr: 6.66e-03, grad_scale: 8.0 2024-09-17 20:04:46,787 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.070e+01 8.690e+01 9.264e+01 9.789e+01 1.817e+02, threshold=1.853e+02, percent-clipped=1.0 2024-09-17 20:05:00,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=291040.0, ans=0.0 2024-09-17 20:05:18,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=291080.0, ans=0.0 2024-09-17 20:05:21,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=291080.0, ans=0.0 2024-09-17 20:05:36,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=291120.0, ans=0.125 2024-09-17 20:06:00,101 INFO [train.py:1198] (1/2) Epoch 17, batch 400, loss[loss=0.2532, ctc_loss=0.1331, cr_loss=0.3822, attn_decoder_loss=0.2581, over 29677.00 frames. ], tot_loss[loss=0.2511, ctc_loss=0.1444, cr_loss=0.3858, attn_decoder_loss=0.2544, over 5024890.44 frames. ], batch size: 82, lr: 6.66e-03, grad_scale: 16.0 2024-09-17 20:06:03,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=291200.0, ans=0.125 2024-09-17 20:06:38,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=291280.0, ans=0.125 2024-09-17 20:06:41,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=291280.0, ans=0.125 2024-09-17 20:06:50,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=291320.0, ans=0.1 2024-09-17 20:06:53,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=291320.0, ans=0.125 2024-09-17 20:07:15,629 INFO [train.py:1198] (1/2) Epoch 17, batch 450, loss[loss=0.2569, ctc_loss=0.1469, cr_loss=0.3891, attn_decoder_loss=0.2605, over 29703.00 frames. ], tot_loss[loss=0.251, ctc_loss=0.1439, cr_loss=0.386, attn_decoder_loss=0.2544, over 5188030.81 frames. ], batch size: 83, lr: 6.66e-03, grad_scale: 8.0 2024-09-17 20:07:21,595 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.042e+01 8.659e+01 9.188e+01 9.784e+01 2.602e+02, threshold=1.838e+02, percent-clipped=1.0 2024-09-17 20:07:23,892 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.77 vs. limit=15.0 2024-09-17 20:07:25,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=291400.0, ans=0.05 2024-09-17 20:07:49,760 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.60 vs. 
limit=22.5 2024-09-17 20:08:23,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=291560.0, ans=15.0 2024-09-17 20:08:33,706 INFO [train.py:1198] (1/2) Epoch 17, batch 500, loss[loss=0.2697, ctc_loss=0.1599, cr_loss=0.4195, attn_decoder_loss=0.2725, over 29395.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.1431, cr_loss=0.385, attn_decoder_loss=0.2534, over 5330803.42 frames. ], batch size: 94, lr: 6.65e-03, grad_scale: 8.0 2024-09-17 20:08:47,012 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=22.5 2024-09-17 20:09:19,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=291720.0, ans=0.0 2024-09-17 20:09:20,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=291720.0, ans=0.0 2024-09-17 20:09:45,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=291760.0, ans=0.0 2024-09-17 20:09:49,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=291760.0, ans=0.125 2024-09-17 20:09:49,502 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.93 vs. limit=15.0 2024-09-17 20:09:51,607 INFO [train.py:1198] (1/2) Epoch 17, batch 550, loss[loss=0.2587, ctc_loss=0.1406, cr_loss=0.3771, attn_decoder_loss=0.2635, over 28813.00 frames. ], tot_loss[loss=0.2506, ctc_loss=0.1438, cr_loss=0.386, attn_decoder_loss=0.2539, over 5423508.99 frames. ], batch size: 104, lr: 6.65e-03, grad_scale: 8.0 2024-09-17 20:09:53,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=291800.0, ans=0.0 2024-09-17 20:09:57,716 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.623e+01 9.075e+01 9.597e+01 1.052e+02 1.735e+02, threshold=1.919e+02, percent-clipped=0.0 2024-09-17 20:10:01,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=291800.0, ans=0.125 2024-09-17 20:10:04,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=291800.0, ans=0.1 2024-09-17 20:10:45,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=291920.0, ans=0.125 2024-09-17 20:10:51,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=291960.0, ans=0.025 2024-09-17 20:10:54,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=291960.0, ans=0.125 2024-09-17 20:11:03,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=291960.0, ans=0.125 2024-09-17 20:11:08,249 INFO [train.py:1198] (1/2) Epoch 17, batch 600, loss[loss=0.2746, ctc_loss=0.1674, cr_loss=0.4239, attn_decoder_loss=0.2771, over 29226.00 frames. ], tot_loss[loss=0.2509, ctc_loss=0.144, cr_loss=0.3854, attn_decoder_loss=0.2542, over 5508427.99 frames. 
], batch size: 100, lr: 6.65e-03, grad_scale: 8.0 2024-09-17 20:11:19,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=292000.0, ans=0.2 2024-09-17 20:11:28,854 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.06 vs. limit=12.0 2024-09-17 20:11:35,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=292040.0, ans=0.125 2024-09-17 20:11:37,844 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.36 vs. limit=15.0 2024-09-17 20:11:43,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=292080.0, ans=0.2 2024-09-17 20:11:45,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=292080.0, ans=0.125 2024-09-17 20:12:23,219 INFO [train.py:1198] (1/2) Epoch 17, batch 650, loss[loss=0.2505, ctc_loss=0.1444, cr_loss=0.3804, attn_decoder_loss=0.2539, over 29728.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.143, cr_loss=0.3842, attn_decoder_loss=0.2533, over 5585510.98 frames. ], batch size: 81, lr: 6.65e-03, grad_scale: 8.0 2024-09-17 20:12:29,212 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.544e+01 8.569e+01 9.101e+01 9.967e+01 2.303e+02, threshold=1.820e+02, percent-clipped=2.0 2024-09-17 20:12:29,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=292200.0, ans=0.0 2024-09-17 20:12:40,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=292240.0, ans=0.1 2024-09-17 20:12:40,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=292240.0, ans=0.5 2024-09-17 20:12:56,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=292280.0, ans=0.125 2024-09-17 20:13:00,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=292280.0, ans=0.125 2024-09-17 20:13:13,583 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.46 vs. limit=12.0 2024-09-17 20:13:23,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=292320.0, ans=0.0 2024-09-17 20:13:34,574 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.18 vs. limit=15.0 2024-09-17 20:13:43,855 INFO [train.py:1198] (1/2) Epoch 17, batch 700, loss[loss=0.241, ctc_loss=0.1383, cr_loss=0.3986, attn_decoder_loss=0.2436, over 29512.00 frames. ], tot_loss[loss=0.2507, ctc_loss=0.1435, cr_loss=0.3857, attn_decoder_loss=0.254, over 5635172.20 frames. 
], batch size: 76, lr: 6.65e-03, grad_scale: 8.0 2024-09-17 20:14:08,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=292440.0, ans=0.1 2024-09-17 20:14:10,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=9.40 vs. limit=15.0 2024-09-17 20:14:32,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=292520.0, ans=0.125 2024-09-17 20:14:39,432 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.32 vs. limit=15.0 2024-09-17 20:14:59,474 INFO [train.py:1198] (1/2) Epoch 17, batch 750, loss[loss=0.254, ctc_loss=0.1428, cr_loss=0.3912, attn_decoder_loss=0.2576, over 29710.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.143, cr_loss=0.385, attn_decoder_loss=0.2533, over 5674538.88 frames. ], batch size: 82, lr: 6.64e-03, grad_scale: 8.0 2024-09-17 20:15:05,323 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.641e+01 8.913e+01 9.464e+01 1.024e+02 2.439e+02, threshold=1.893e+02, percent-clipped=2.0 2024-09-17 20:15:16,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=292640.0, ans=0.025 2024-09-17 20:15:26,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=292640.0, ans=0.0 2024-09-17 20:15:26,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=292640.0, ans=0.0 2024-09-17 20:15:51,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=292720.0, ans=0.125 2024-09-17 20:16:05,469 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=15.0 2024-09-17 20:16:06,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=292760.0, ans=0.025 2024-09-17 20:16:15,488 INFO [train.py:1198] (1/2) Epoch 17, batch 800, loss[loss=0.2342, ctc_loss=0.1319, cr_loss=0.3583, attn_decoder_loss=0.2376, over 29621.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.1432, cr_loss=0.3847, attn_decoder_loss=0.2533, over 5703738.45 frames. ], batch size: 73, lr: 6.64e-03, grad_scale: 16.0 2024-09-17 20:16:15,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=292800.0, ans=0.125 2024-09-17 20:16:31,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.13 vs. limit=15.0 2024-09-17 20:16:52,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=292880.0, ans=0.125 2024-09-17 20:17:27,978 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.25 vs. limit=15.0 2024-09-17 20:17:33,058 INFO [train.py:1198] (1/2) Epoch 17, batch 850, loss[loss=0.2584, ctc_loss=0.1579, cr_loss=0.3855, attn_decoder_loss=0.261, over 29710.00 frames. 
], tot_loss[loss=0.25, ctc_loss=0.1434, cr_loss=0.385, attn_decoder_loss=0.2533, over 5733189.97 frames. ], batch size: 89, lr: 6.64e-03, grad_scale: 8.0 2024-09-17 20:17:38,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=293000.0, ans=0.0 2024-09-17 20:17:42,755 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.757e+01 8.745e+01 9.386e+01 1.018e+02 1.977e+02, threshold=1.877e+02, percent-clipped=1.0 2024-09-17 20:17:47,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=293000.0, ans=0.125 2024-09-17 20:18:25,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=293120.0, ans=0.1 2024-09-17 20:18:51,106 INFO [train.py:1198] (1/2) Epoch 17, batch 900, loss[loss=0.2297, ctc_loss=0.1245, cr_loss=0.3449, attn_decoder_loss=0.2338, over 29614.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1436, cr_loss=0.3852, attn_decoder_loss=0.2535, over 5738760.40 frames. ], batch size: 73, lr: 6.64e-03, grad_scale: 8.0 2024-09-17 20:18:52,111 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.81 vs. limit=15.0 2024-09-17 20:19:24,047 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.06 vs. limit=15.0 2024-09-17 20:19:26,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=293280.0, ans=15.0 2024-09-17 20:19:53,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=293360.0, ans=0.025 2024-09-17 20:19:56,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=293360.0, ans=0.0 2024-09-17 20:19:59,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=293360.0, ans=0.125 2024-09-17 20:20:06,848 INFO [train.py:1198] (1/2) Epoch 17, batch 950, loss[loss=0.227, ctc_loss=0.1196, cr_loss=0.3479, attn_decoder_loss=0.2312, over 29515.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1437, cr_loss=0.3855, attn_decoder_loss=0.2536, over 5741744.29 frames. ], batch size: 74, lr: 6.63e-03, grad_scale: 8.0 2024-09-17 20:20:14,253 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 8.889e+01 9.768e+01 1.117e+02 1.855e+02, threshold=1.954e+02, percent-clipped=0.0 2024-09-17 20:20:19,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=293400.0, ans=0.1 2024-09-17 20:20:33,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=293440.0, ans=0.125 2024-09-17 20:20:41,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=293480.0, ans=0.0 2024-09-17 20:20:46,128 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.14 vs. 
limit=15.0 2024-09-17 20:20:54,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=293520.0, ans=0.025 2024-09-17 20:20:54,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=293520.0, ans=0.125 2024-09-17 20:20:57,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=293520.0, ans=0.0 2024-09-17 20:21:06,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=293520.0, ans=0.0 2024-09-17 20:21:13,076 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.11 vs. limit=22.5 2024-09-17 20:21:25,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=293600.0, ans=0.125 2024-09-17 20:21:26,828 INFO [train.py:1198] (1/2) Epoch 17, batch 1000, loss[loss=0.2415, ctc_loss=0.1332, cr_loss=0.3764, attn_decoder_loss=0.2451, over 29536.00 frames. ], tot_loss[loss=0.2507, ctc_loss=0.1441, cr_loss=0.3856, attn_decoder_loss=0.254, over 5735231.65 frames. ], batch size: 77, lr: 6.63e-03, grad_scale: 8.0 2024-09-17 20:21:46,268 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.86 vs. limit=6.0 2024-09-17 20:22:16,785 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.84 vs. limit=12.0 2024-09-17 20:22:17,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=293720.0, ans=0.125 2024-09-17 20:22:23,940 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.91 vs. limit=10.0 2024-09-17 20:22:29,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=293760.0, ans=0.125 2024-09-17 20:22:29,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=293760.0, ans=0.0 2024-09-17 20:22:41,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=293800.0, ans=0.0 2024-09-17 20:22:42,677 INFO [train.py:1198] (1/2) Epoch 17, batch 1050, loss[loss=0.2608, ctc_loss=0.1483, cr_loss=0.4005, attn_decoder_loss=0.2644, over 29687.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.1437, cr_loss=0.3852, attn_decoder_loss=0.2532, over 5743943.39 frames. 
], batch size: 85, lr: 6.63e-03, grad_scale: 8.0 2024-09-17 20:22:43,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=293800.0, ans=0.1 2024-09-17 20:22:50,129 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 8.852e+01 9.385e+01 1.035e+02 1.958e+02, threshold=1.877e+02, percent-clipped=1.0 2024-09-17 20:22:53,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=293800.0, ans=0.0 2024-09-17 20:22:53,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=293800.0, ans=0.1 2024-09-17 20:22:58,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=293840.0, ans=10.0 2024-09-17 20:23:07,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=293840.0, ans=0.1 2024-09-17 20:23:10,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=293840.0, ans=0.125 2024-09-17 20:23:11,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=293880.0, ans=0.0 2024-09-17 20:23:28,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=293920.0, ans=0.1 2024-09-17 20:23:33,836 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.66 vs. limit=6.0 2024-09-17 20:23:34,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=293920.0, ans=0.0 2024-09-17 20:23:36,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=293920.0, ans=0.125 2024-09-17 20:23:37,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=293920.0, ans=0.2 2024-09-17 20:23:43,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=293960.0, ans=0.5 2024-09-17 20:23:48,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=293960.0, ans=0.1 2024-09-17 20:23:58,416 INFO [train.py:1198] (1/2) Epoch 17, batch 1100, loss[loss=0.2497, ctc_loss=0.15, cr_loss=0.4219, attn_decoder_loss=0.2514, over 29446.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1437, cr_loss=0.3858, attn_decoder_loss=0.2531, over 5756246.83 frames. 
], batch size: 78, lr: 6.63e-03, grad_scale: 8.0 2024-09-17 20:24:07,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=294000.0, ans=0.125 2024-09-17 20:24:19,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=294040.0, ans=0.125 2024-09-17 20:24:19,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=294040.0, ans=0.0 2024-09-17 20:24:27,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=294080.0, ans=0.0 2024-09-17 20:25:04,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=294160.0, ans=0.2 2024-09-17 20:25:12,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=294160.0, ans=0.125 2024-09-17 20:25:18,670 INFO [train.py:1198] (1/2) Epoch 17, batch 1150, loss[loss=0.2508, ctc_loss=0.1521, cr_loss=0.4033, attn_decoder_loss=0.2528, over 29474.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1444, cr_loss=0.3862, attn_decoder_loss=0.2533, over 5756503.17 frames. ], batch size: 78, lr: 6.63e-03, grad_scale: 8.0 2024-09-17 20:25:19,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=294200.0, ans=0.125 2024-09-17 20:25:26,292 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.857e+01 8.746e+01 9.258e+01 9.833e+01 4.199e+02, threshold=1.852e+02, percent-clipped=3.0 2024-09-17 20:25:27,275 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.11 vs. limit=22.5 2024-09-17 20:25:38,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=294240.0, ans=0.125 2024-09-17 20:25:45,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=294240.0, ans=0.0 2024-09-17 20:26:00,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=294280.0, ans=0.125 2024-09-17 20:26:17,324 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.14 vs. limit=6.0 2024-09-17 20:26:23,163 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.52 vs. limit=15.0 2024-09-17 20:26:24,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=294360.0, ans=0.125 2024-09-17 20:26:24,604 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.82 vs. limit=12.0 2024-09-17 20:26:34,873 INFO [train.py:1198] (1/2) Epoch 17, batch 1200, loss[loss=0.2708, ctc_loss=0.1593, cr_loss=0.4077, attn_decoder_loss=0.2741, over 29663.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1451, cr_loss=0.3876, attn_decoder_loss=0.2544, over 5748074.47 frames. 
], batch size: 85, lr: 6.62e-03, grad_scale: 16.0 2024-09-17 20:26:52,067 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:26:53,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=294440.0, ans=0.0 2024-09-17 20:26:56,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=294440.0, ans=10.0 2024-09-17 20:27:33,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=294520.0, ans=0.07 2024-09-17 20:27:50,823 INFO [train.py:1198] (1/2) Epoch 17, batch 1250, loss[loss=0.2711, ctc_loss=0.1661, cr_loss=0.4275, attn_decoder_loss=0.2732, over 29559.00 frames. ], tot_loss[loss=0.2518, ctc_loss=0.1457, cr_loss=0.3895, attn_decoder_loss=0.2549, over 5775011.99 frames. ], batch size: 92, lr: 6.62e-03, grad_scale: 8.0 2024-09-17 20:27:55,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=294600.0, ans=0.125 2024-09-17 20:27:56,100 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.91 vs. limit=15.0 2024-09-17 20:27:59,842 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.670e+01 8.886e+01 9.388e+01 9.868e+01 1.541e+02, threshold=1.878e+02, percent-clipped=0.0 2024-09-17 20:28:00,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=294600.0, ans=0.1 2024-09-17 20:28:03,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=294600.0, ans=0.025 2024-09-17 20:28:11,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=294640.0, ans=6.0 2024-09-17 20:28:29,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=294680.0, ans=0.125 2024-09-17 20:28:34,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=294680.0, ans=0.125 2024-09-17 20:29:09,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=294800.0, ans=0.1 2024-09-17 20:29:10,543 INFO [train.py:1198] (1/2) Epoch 17, batch 1300, loss[loss=0.2519, ctc_loss=0.1384, cr_loss=0.3643, attn_decoder_loss=0.2564, over 28363.00 frames. ], tot_loss[loss=0.2511, ctc_loss=0.145, cr_loss=0.388, attn_decoder_loss=0.2542, over 5781302.40 frames. 
], batch size: 112, lr: 6.62e-03, grad_scale: 8.0 2024-09-17 20:29:18,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=294800.0, ans=0.1 2024-09-17 20:29:19,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=294800.0, ans=0.1 2024-09-17 20:29:21,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=294800.0, ans=0.09899494936611666 2024-09-17 20:29:38,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=294840.0, ans=0.1 2024-09-17 20:30:17,439 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:30:26,392 INFO [train.py:1198] (1/2) Epoch 17, batch 1350, loss[loss=0.242, ctc_loss=0.1357, cr_loss=0.3505, attn_decoder_loss=0.2461, over 29761.00 frames. ], tot_loss[loss=0.2509, ctc_loss=0.1447, cr_loss=0.3877, attn_decoder_loss=0.2541, over 5796191.80 frames. ], batch size: 81, lr: 6.62e-03, grad_scale: 8.0 2024-09-17 20:30:31,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=295000.0, ans=0.125 2024-09-17 20:30:35,309 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 8.707e+01 9.188e+01 9.676e+01 1.559e+02, threshold=1.838e+02, percent-clipped=0.0 2024-09-17 20:30:46,609 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.89 vs. limit=15.0 2024-09-17 20:30:48,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=295040.0, ans=0.125 2024-09-17 20:30:57,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=295080.0, ans=0.125 2024-09-17 20:31:08,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=295080.0, ans=0.125 2024-09-17 20:31:21,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=295120.0, ans=0.2 2024-09-17 20:31:35,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=295160.0, ans=0.07 2024-09-17 20:31:37,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=295160.0, ans=0.125 2024-09-17 20:31:40,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=295200.0, ans=0.125 2024-09-17 20:31:41,836 INFO [train.py:1198] (1/2) Epoch 17, batch 1400, loss[loss=0.2159, ctc_loss=0.1139, cr_loss=0.3259, attn_decoder_loss=0.22, over 29585.00 frames. ], tot_loss[loss=0.2507, ctc_loss=0.1445, cr_loss=0.3878, attn_decoder_loss=0.2538, over 5807140.12 frames. 
], batch size: 69, lr: 6.61e-03, grad_scale: 8.0 2024-09-17 20:31:55,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=295240.0, ans=0.125 2024-09-17 20:31:58,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=295240.0, ans=0.125 2024-09-17 20:32:14,317 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.04 vs. limit=15.0 2024-09-17 20:32:28,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=295320.0, ans=0.125 2024-09-17 20:32:29,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=295320.0, ans=0.1 2024-09-17 20:32:32,120 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.59 vs. limit=15.0 2024-09-17 20:32:41,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=295320.0, ans=0.125 2024-09-17 20:32:46,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=295360.0, ans=0.025 2024-09-17 20:33:01,974 INFO [train.py:1198] (1/2) Epoch 17, batch 1450, loss[loss=0.2592, ctc_loss=0.1486, cr_loss=0.3831, attn_decoder_loss=0.263, over 29448.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1441, cr_loss=0.3871, attn_decoder_loss=0.254, over 5803787.58 frames. ], batch size: 94, lr: 6.61e-03, grad_scale: 8.0 2024-09-17 20:33:03,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.max_abs, batch_count=295400.0, ans=10.0 2024-09-17 20:33:10,933 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 8.631e+01 9.209e+01 9.989e+01 1.746e+02, threshold=1.842e+02, percent-clipped=0.0 2024-09-17 20:33:21,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=295440.0, ans=0.0 2024-09-17 20:34:02,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=295560.0, ans=0.125 2024-09-17 20:34:16,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=295600.0, ans=0.2 2024-09-17 20:34:17,471 INFO [train.py:1198] (1/2) Epoch 17, batch 1500, loss[loss=0.2645, ctc_loss=0.1486, cr_loss=0.3835, attn_decoder_loss=0.2688, over 29627.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1441, cr_loss=0.387, attn_decoder_loss=0.2545, over 5806449.30 frames. ], batch size: 86, lr: 6.61e-03, grad_scale: 8.0 2024-09-17 20:34:36,561 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.30 vs. limit=15.0 2024-09-17 20:34:55,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=295680.0, ans=0.125 2024-09-17 20:34:57,852 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.66 vs. 
limit=15.0 2024-09-17 20:35:04,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=295720.0, ans=0.125 2024-09-17 20:35:17,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=295760.0, ans=0.125 2024-09-17 20:35:29,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=295760.0, ans=0.125 2024-09-17 20:35:33,486 INFO [train.py:1198] (1/2) Epoch 17, batch 1550, loss[loss=0.2755, ctc_loss=0.1642, cr_loss=0.4302, attn_decoder_loss=0.2783, over 29521.00 frames. ], tot_loss[loss=0.2514, ctc_loss=0.1449, cr_loss=0.3873, attn_decoder_loss=0.2547, over 5782761.53 frames. ], batch size: 90, lr: 6.61e-03, grad_scale: 8.0 2024-09-17 20:35:33,970 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:35:42,528 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.733e+01 9.019e+01 9.707e+01 1.076e+02 7.268e+02, threshold=1.941e+02, percent-clipped=2.0 2024-09-17 20:35:49,345 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.55 vs. limit=22.5 2024-09-17 20:36:06,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=295880.0, ans=0.0 2024-09-17 20:36:08,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=295880.0, ans=0.07 2024-09-17 20:36:30,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=295920.0, ans=0.125 2024-09-17 20:36:33,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=295920.0, ans=0.0 2024-09-17 20:36:50,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=295960.0, ans=0.125 2024-09-17 20:36:53,526 INFO [train.py:1198] (1/2) Epoch 17, batch 1600, loss[loss=0.2581, ctc_loss=0.1477, cr_loss=0.395, attn_decoder_loss=0.2616, over 29681.00 frames. ], tot_loss[loss=0.2514, ctc_loss=0.1451, cr_loss=0.3872, attn_decoder_loss=0.2546, over 5765606.35 frames. ], batch size: 85, lr: 6.61e-03, grad_scale: 16.0 2024-09-17 20:37:31,052 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.63 vs. limit=22.5 2024-09-17 20:37:43,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=296120.0, ans=0.125 2024-09-17 20:37:51,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=296120.0, ans=0.125 2024-09-17 20:38:03,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.71 vs. limit=15.0 2024-09-17 20:38:08,986 INFO [train.py:1198] (1/2) Epoch 17, batch 1650, loss[loss=0.2677, ctc_loss=0.1505, cr_loss=0.3901, attn_decoder_loss=0.2721, over 29703.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.1448, cr_loss=0.3863, attn_decoder_loss=0.2545, over 5761835.03 frames. 
], batch size: 89, lr: 6.60e-03, grad_scale: 8.0 2024-09-17 20:38:10,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=296200.0, ans=0.125 2024-09-17 20:38:17,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=296200.0, ans=0.125 2024-09-17 20:38:18,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=296200.0, ans=0.2 2024-09-17 20:38:19,708 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.329e+01 8.617e+01 9.352e+01 1.025e+02 5.265e+02, threshold=1.870e+02, percent-clipped=3.0 2024-09-17 20:38:21,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=296200.0, ans=0.2 2024-09-17 20:38:28,326 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.90 vs. limit=10.0 2024-09-17 20:38:41,789 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.77 vs. limit=15.0 2024-09-17 20:39:24,831 INFO [train.py:1198] (1/2) Epoch 17, batch 1700, loss[loss=0.2157, ctc_loss=0.1232, cr_loss=0.3483, attn_decoder_loss=0.2182, over 29564.00 frames. ], tot_loss[loss=0.2509, ctc_loss=0.1443, cr_loss=0.3866, attn_decoder_loss=0.2541, over 5782256.54 frames. ], batch size: 69, lr: 6.60e-03, grad_scale: 8.0 2024-09-17 20:39:31,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=296400.0, ans=0.2 2024-09-17 20:39:41,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=296440.0, ans=0.125 2024-09-17 20:39:53,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=296480.0, ans=0.0 2024-09-17 20:40:04,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=296480.0, ans=0.125 2024-09-17 20:40:06,028 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.10 vs. limit=12.0 2024-09-17 20:40:07,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=296480.0, ans=0.07 2024-09-17 20:40:24,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=296520.0, ans=0.125 2024-09-17 20:40:26,759 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.49 vs. limit=15.0 2024-09-17 20:40:31,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=296560.0, ans=0.09899494936611666 2024-09-17 20:40:44,382 INFO [train.py:1198] (1/2) Epoch 17, batch 1750, loss[loss=0.2218, ctc_loss=0.1186, cr_loss=0.3389, attn_decoder_loss=0.2257, over 29387.00 frames. ], tot_loss[loss=0.2506, ctc_loss=0.144, cr_loss=0.3862, attn_decoder_loss=0.2538, over 5790634.97 frames. 
], batch size: 67, lr: 6.60e-03, grad_scale: 8.0 2024-09-17 20:40:45,565 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.67 vs. limit=15.0 2024-09-17 20:40:46,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=296600.0, ans=0.1 2024-09-17 20:40:47,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=296600.0, ans=0.1 2024-09-17 20:40:54,992 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.408e+01 8.563e+01 9.059e+01 9.719e+01 2.142e+02, threshold=1.812e+02, percent-clipped=1.0 2024-09-17 20:41:14,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=296680.0, ans=0.1 2024-09-17 20:41:34,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=296720.0, ans=0.1 2024-09-17 20:41:34,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=296720.0, ans=0.2 2024-09-17 20:41:46,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=296760.0, ans=0.125 2024-09-17 20:41:47,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=296760.0, ans=0.0 2024-09-17 20:42:00,004 INFO [train.py:1198] (1/2) Epoch 17, batch 1800, loss[loss=0.2509, ctc_loss=0.1445, cr_loss=0.3899, attn_decoder_loss=0.2541, over 29693.00 frames. ], tot_loss[loss=0.2509, ctc_loss=0.1442, cr_loss=0.3865, attn_decoder_loss=0.2542, over 5793146.56 frames. ], batch size: 83, lr: 6.60e-03, grad_scale: 8.0 2024-09-17 20:42:01,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=296800.0, ans=0.125 2024-09-17 20:42:03,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=296800.0, ans=0.0 2024-09-17 20:42:06,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=296800.0, ans=0.0 2024-09-17 20:42:27,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=296840.0, ans=0.125 2024-09-17 20:42:29,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=296880.0, ans=0.125 2024-09-17 20:42:58,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=296920.0, ans=0.125 2024-09-17 20:43:07,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=296960.0, ans=0.125 2024-09-17 20:43:08,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=296960.0, ans=0.1 2024-09-17 20:43:16,000 INFO [train.py:1198] (1/2) Epoch 17, batch 1850, loss[loss=0.2678, ctc_loss=0.1511, cr_loss=0.4075, attn_decoder_loss=0.2717, over 29643.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.144, cr_loss=0.3863, attn_decoder_loss=0.2541, over 5799165.19 frames. 
], batch size: 86, lr: 6.59e-03, grad_scale: 8.0 2024-09-17 20:43:26,330 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.783e+01 8.992e+01 9.506e+01 1.016e+02 2.077e+02, threshold=1.901e+02, percent-clipped=1.0 2024-09-17 20:43:40,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=297040.0, ans=0.125 2024-09-17 20:43:45,237 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.04 vs. limit=22.5 2024-09-17 20:43:54,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=297080.0, ans=0.1 2024-09-17 20:43:54,725 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.50 vs. limit=15.0 2024-09-17 20:43:57,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=297080.0, ans=0.125 2024-09-17 20:44:12,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=297120.0, ans=0.125 2024-09-17 20:44:12,983 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.73 vs. limit=15.0 2024-09-17 20:44:24,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=297160.0, ans=0.125 2024-09-17 20:44:26,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=297160.0, ans=0.0 2024-09-17 20:44:26,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=297160.0, ans=0.0 2024-09-17 20:44:35,805 INFO [train.py:1198] (1/2) Epoch 17, batch 1900, loss[loss=0.2557, ctc_loss=0.1383, cr_loss=0.3853, attn_decoder_loss=0.2602, over 29688.00 frames. ], tot_loss[loss=0.251, ctc_loss=0.1442, cr_loss=0.3868, attn_decoder_loss=0.2543, over 5806205.11 frames. ], batch size: 89, lr: 6.59e-03, grad_scale: 8.0 2024-09-17 20:44:38,415 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.90 vs. limit=6.0 2024-09-17 20:45:02,836 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.15 vs. limit=6.0 2024-09-17 20:45:09,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.57 vs. 
limit=15.0 2024-09-17 20:45:18,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=297280.0, ans=0.025 2024-09-17 20:45:41,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=297360.0, ans=0.025 2024-09-17 20:45:44,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=297360.0, ans=0.0 2024-09-17 20:45:46,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=297360.0, ans=0.025 2024-09-17 20:45:49,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=297360.0, ans=0.125 2024-09-17 20:45:52,073 INFO [train.py:1198] (1/2) Epoch 17, batch 1950, loss[loss=0.2521, ctc_loss=0.14, cr_loss=0.3732, attn_decoder_loss=0.2563, over 29447.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.1448, cr_loss=0.3874, attn_decoder_loss=0.2554, over 5820150.49 frames. ], batch size: 78, lr: 6.59e-03, grad_scale: 8.0 2024-09-17 20:46:00,530 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.90 vs. limit=15.0 2024-09-17 20:46:02,790 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.875e+01 9.464e+01 9.894e+01 2.247e+02, threshold=1.893e+02, percent-clipped=1.0 2024-09-17 20:46:04,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=297400.0, ans=0.125 2024-09-17 20:46:06,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=297440.0, ans=0.125 2024-09-17 20:46:11,530 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.07 vs. limit=15.0 2024-09-17 20:46:12,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=297440.0, ans=0.125 2024-09-17 20:46:19,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=297440.0, ans=0.0 2024-09-17 20:46:28,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=297480.0, ans=0.125 2024-09-17 20:46:36,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=297520.0, ans=0.125 2024-09-17 20:46:40,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=15.0 2024-09-17 20:46:43,397 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.31 vs. limit=22.5 2024-09-17 20:47:08,297 INFO [train.py:1198] (1/2) Epoch 17, batch 2000, loss[loss=0.2177, ctc_loss=0.1171, cr_loss=0.336, attn_decoder_loss=0.2214, over 29334.00 frames. ], tot_loss[loss=0.2526, ctc_loss=0.1454, cr_loss=0.3882, attn_decoder_loss=0.2559, over 5797091.44 frames. 
], batch size: 67, lr: 6.59e-03, grad_scale: 16.0 2024-09-17 20:47:16,896 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.87 vs. limit=22.5 2024-09-17 20:47:20,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=297600.0, ans=0.125 2024-09-17 20:47:28,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=297640.0, ans=0.0 2024-09-17 20:47:37,872 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.32 vs. limit=15.0 2024-09-17 20:47:43,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=297680.0, ans=0.2 2024-09-17 20:48:26,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=297800.0, ans=0.2 2024-09-17 20:48:27,878 INFO [train.py:1198] (1/2) Epoch 17, batch 2050, loss[loss=0.2316, ctc_loss=0.1259, cr_loss=0.3635, attn_decoder_loss=0.2353, over 29409.00 frames. ], tot_loss[loss=0.2515, ctc_loss=0.1449, cr_loss=0.3872, attn_decoder_loss=0.2548, over 5789545.18 frames. ], batch size: 70, lr: 6.59e-03, grad_scale: 8.0 2024-09-17 20:48:40,029 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.290e+01 8.707e+01 9.110e+01 9.757e+01 1.726e+02, threshold=1.822e+02, percent-clipped=0.0 2024-09-17 20:48:59,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=297880.0, ans=0.0 2024-09-17 20:49:07,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=297880.0, ans=0.125 2024-09-17 20:49:08,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=297880.0, ans=0.1 2024-09-17 20:49:09,420 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. limit=6.0 2024-09-17 20:49:17,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=297920.0, ans=0.125 2024-09-17 20:49:31,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=297960.0, ans=0.025 2024-09-17 20:49:38,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=297960.0, ans=0.0 2024-09-17 20:49:38,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=297960.0, ans=0.1 2024-09-17 20:49:39,319 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.94 vs. limit=6.0 2024-09-17 20:49:41,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=298000.0, ans=0.04949747468305833 2024-09-17 20:49:43,061 INFO [train.py:1198] (1/2) Epoch 17, batch 2100, loss[loss=0.2344, ctc_loss=0.1304, cr_loss=0.3573, attn_decoder_loss=0.2381, over 29785.00 frames. 
], tot_loss[loss=0.2503, ctc_loss=0.1436, cr_loss=0.385, attn_decoder_loss=0.2536, over 5802375.64 frames. ], batch size: 81, lr: 6.58e-03, grad_scale: 8.0 2024-09-17 20:49:44,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=298000.0, ans=0.0 2024-09-17 20:49:50,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=298000.0, ans=0.04949747468305833 2024-09-17 20:49:58,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=298040.0, ans=15.0 2024-09-17 20:50:29,062 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.99 vs. limit=12.0 2024-09-17 20:50:46,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=298160.0, ans=0.125 2024-09-17 20:50:47,085 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.14 vs. limit=12.0 2024-09-17 20:50:58,163 INFO [train.py:1198] (1/2) Epoch 17, batch 2150, loss[loss=0.2541, ctc_loss=0.1492, cr_loss=0.4085, attn_decoder_loss=0.2567, over 29479.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.1431, cr_loss=0.3845, attn_decoder_loss=0.2531, over 5817133.15 frames. ], batch size: 78, lr: 6.58e-03, grad_scale: 8.0 2024-09-17 20:51:01,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=298200.0, ans=0.125 2024-09-17 20:51:10,382 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.066e+01 8.718e+01 9.185e+01 9.940e+01 1.615e+02, threshold=1.837e+02, percent-clipped=0.0 2024-09-17 20:51:42,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=298320.0, ans=0.025 2024-09-17 20:51:49,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=298320.0, ans=0.125 2024-09-17 20:51:55,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=298320.0, ans=0.2 2024-09-17 20:51:57,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=298360.0, ans=0.125 2024-09-17 20:52:03,881 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.92 vs. limit=22.5 2024-09-17 20:52:13,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=298360.0, ans=0.1 2024-09-17 20:52:17,829 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=22.5 2024-09-17 20:52:18,711 INFO [train.py:1198] (1/2) Epoch 17, batch 2200, loss[loss=0.2542, ctc_loss=0.1366, cr_loss=0.3647, attn_decoder_loss=0.2591, over 29641.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1431, cr_loss=0.3848, attn_decoder_loss=0.2535, over 5812660.70 frames. 
], batch size: 86, lr: 6.58e-03, grad_scale: 8.0 2024-09-17 20:53:11,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=298520.0, ans=0.125 2024-09-17 20:53:14,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=298520.0, ans=0.1 2024-09-17 20:53:20,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=298560.0, ans=0.125 2024-09-17 20:53:24,801 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.69 vs. limit=15.0 2024-09-17 20:53:25,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=298560.0, ans=0.125 2024-09-17 20:53:34,402 INFO [train.py:1198] (1/2) Epoch 17, batch 2250, loss[loss=0.2573, ctc_loss=0.1478, cr_loss=0.4012, attn_decoder_loss=0.2605, over 29680.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1433, cr_loss=0.3846, attn_decoder_loss=0.2536, over 5811318.95 frames. ], batch size: 82, lr: 6.58e-03, grad_scale: 8.0 2024-09-17 20:53:37,247 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.54 vs. limit=5.0 2024-09-17 20:53:46,690 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 8.540e+01 9.223e+01 9.820e+01 2.780e+02, threshold=1.845e+02, percent-clipped=3.0 2024-09-17 20:53:55,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=298640.0, ans=0.0 2024-09-17 20:54:01,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=298640.0, ans=0.125 2024-09-17 20:54:26,822 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.03 vs. limit=22.5 2024-09-17 20:54:30,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=298720.0, ans=0.125 2024-09-17 20:54:33,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=298760.0, ans=0.125 2024-09-17 20:54:39,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=298760.0, ans=0.0 2024-09-17 20:54:43,380 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.06 vs. limit=22.5 2024-09-17 20:54:50,222 INFO [train.py:1198] (1/2) Epoch 17, batch 2300, loss[loss=0.228, ctc_loss=0.1261, cr_loss=0.363, attn_decoder_loss=0.2313, over 29361.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1429, cr_loss=0.384, attn_decoder_loss=0.2528, over 5797241.33 frames. 
], batch size: 71, lr: 6.57e-03, grad_scale: 8.0 2024-09-17 20:54:56,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=298800.0, ans=0.1 2024-09-17 20:55:04,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=298840.0, ans=0.1 2024-09-17 20:55:10,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=298840.0, ans=0.125 2024-09-17 20:55:39,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=298920.0, ans=0.1 2024-09-17 20:55:39,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=298920.0, ans=0.0 2024-09-17 20:55:48,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=298920.0, ans=0.0 2024-09-17 20:55:55,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=298960.0, ans=0.05 2024-09-17 20:55:59,380 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.03 vs. limit=15.0 2024-09-17 20:56:00,687 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.46 vs. limit=15.0 2024-09-17 20:56:07,994 INFO [train.py:1198] (1/2) Epoch 17, batch 2350, loss[loss=0.2635, ctc_loss=0.1509, cr_loss=0.3915, attn_decoder_loss=0.2673, over 29680.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.1432, cr_loss=0.3848, attn_decoder_loss=0.2533, over 5801847.80 frames. ], batch size: 83, lr: 6.57e-03, grad_scale: 8.0 2024-09-17 20:56:21,978 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.665e+01 8.873e+01 9.644e+01 1.055e+02 1.144e+03, threshold=1.929e+02, percent-clipped=2.0 2024-09-17 20:57:03,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=299120.0, ans=0.125 2024-09-17 20:57:26,176 INFO [train.py:1198] (1/2) Epoch 17, batch 2400, loss[loss=0.2335, ctc_loss=0.1327, cr_loss=0.3736, attn_decoder_loss=0.2364, over 29520.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1438, cr_loss=0.3857, attn_decoder_loss=0.2537, over 5806576.09 frames. 
], batch size: 76, lr: 6.57e-03, grad_scale: 16.0 2024-09-17 20:57:35,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=299200.0, ans=0.125 2024-09-17 20:57:38,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=299200.0, ans=0.09899494936611666 2024-09-17 20:57:50,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=299240.0, ans=0.0 2024-09-17 20:57:52,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=299240.0, ans=0.1 2024-09-17 20:57:55,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=299280.0, ans=0.125 2024-09-17 20:58:10,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=299320.0, ans=0.1 2024-09-17 20:58:23,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=299320.0, ans=0.1 2024-09-17 20:58:24,525 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.09 vs. limit=15.0 2024-09-17 20:58:38,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.23 vs. limit=15.0 2024-09-17 20:58:39,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=299360.0, ans=0.0 2024-09-17 20:58:41,907 INFO [train.py:1198] (1/2) Epoch 17, batch 2450, loss[loss=0.2463, ctc_loss=0.1304, cr_loss=0.3649, attn_decoder_loss=0.2511, over 29697.00 frames. ], tot_loss[loss=0.2515, ctc_loss=0.1443, cr_loss=0.3868, attn_decoder_loss=0.2548, over 5784365.42 frames. ], batch size: 82, lr: 6.57e-03, grad_scale: 8.0 2024-09-17 20:58:55,497 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.435e+01 9.066e+01 9.720e+01 1.171e+02 1.991e+02, threshold=1.944e+02, percent-clipped=1.0 2024-09-17 20:58:55,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=299440.0, ans=0.1 2024-09-17 20:59:19,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=299480.0, ans=0.125 2024-09-17 20:59:26,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=299520.0, ans=0.125 2024-09-17 20:59:27,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=299520.0, ans=0.125 2024-09-17 20:59:42,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=299560.0, ans=0.125 2024-09-17 20:59:54,558 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.51 vs. limit=22.5 2024-09-17 20:59:59,621 INFO [train.py:1198] (1/2) Epoch 17, batch 2500, loss[loss=0.2448, ctc_loss=0.1281, cr_loss=0.366, attn_decoder_loss=0.2496, over 29641.00 frames. 
], tot_loss[loss=0.2516, ctc_loss=0.1445, cr_loss=0.3872, attn_decoder_loss=0.2549, over 5794824.45 frames. ], batch size: 86, lr: 6.57e-03, grad_scale: 8.0 2024-09-17 21:00:19,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=299640.0, ans=0.1 2024-09-17 21:00:20,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=299640.0, ans=0.125 2024-09-17 21:00:36,203 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.06 vs. limit=10.0 2024-09-17 21:00:50,522 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.26 vs. limit=15.0 2024-09-17 21:01:18,017 INFO [train.py:1198] (1/2) Epoch 17, batch 2550, loss[loss=0.2229, ctc_loss=0.1246, cr_loss=0.3465, attn_decoder_loss=0.2261, over 29357.00 frames. ], tot_loss[loss=0.2515, ctc_loss=0.1443, cr_loss=0.387, attn_decoder_loss=0.2548, over 5797727.52 frames. ], batch size: 67, lr: 6.56e-03, grad_scale: 8.0 2024-09-17 21:01:31,610 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.868e+01 8.659e+01 9.126e+01 9.764e+01 1.342e+02, threshold=1.825e+02, percent-clipped=0.0 2024-09-17 21:01:36,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=299840.0, ans=0.0 2024-09-17 21:01:49,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=299880.0, ans=0.125 2024-09-17 21:01:51,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=299880.0, ans=0.125 2024-09-17 21:02:08,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=299920.0, ans=0.0 2024-09-17 21:02:09,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=299920.0, ans=0.2 2024-09-17 21:02:18,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=299960.0, ans=0.125 2024-09-17 21:02:23,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=299960.0, ans=0.1 2024-09-17 21:02:34,196 INFO [train.py:1198] (1/2) Epoch 17, batch 2600, loss[loss=0.249, ctc_loss=0.144, cr_loss=0.3847, attn_decoder_loss=0.2522, over 29451.00 frames. ], tot_loss[loss=0.2517, ctc_loss=0.1445, cr_loss=0.3875, attn_decoder_loss=0.255, over 5794294.29 frames. ], batch size: 78, lr: 6.56e-03, grad_scale: 8.0 2024-09-17 21:02:39,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=300000.0, ans=0.0 2024-09-17 21:02:48,381 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.96 vs. 
limit=22.5 2024-09-17 21:02:53,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=300040.0, ans=0.2 2024-09-17 21:03:02,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=300080.0, ans=0.125 2024-09-17 21:03:38,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=300160.0, ans=0.0 2024-09-17 21:03:51,168 INFO [train.py:1198] (1/2) Epoch 17, batch 2650, loss[loss=0.259, ctc_loss=0.1456, cr_loss=0.4127, attn_decoder_loss=0.2624, over 29254.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.1448, cr_loss=0.388, attn_decoder_loss=0.2554, over 5799683.73 frames. ], batch size: 100, lr: 6.56e-03, grad_scale: 8.0 2024-09-17 21:03:54,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=300200.0, ans=0.125 2024-09-17 21:03:56,623 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.87 vs. limit=6.0 2024-09-17 21:04:06,991 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.745e+01 8.955e+01 9.384e+01 9.945e+01 2.228e+02, threshold=1.877e+02, percent-clipped=1.0 2024-09-17 21:04:27,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=300280.0, ans=0.07 2024-09-17 21:04:49,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=300320.0, ans=0.2 2024-09-17 21:05:09,151 INFO [train.py:1198] (1/2) Epoch 17, batch 2700, loss[loss=0.2575, ctc_loss=0.1457, cr_loss=0.4069, attn_decoder_loss=0.2609, over 29541.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.1446, cr_loss=0.388, attn_decoder_loss=0.2555, over 5794745.25 frames. ], batch size: 87, lr: 6.56e-03, grad_scale: 8.0 2024-09-17 21:05:28,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=300440.0, ans=0.125 2024-09-17 21:05:47,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=300480.0, ans=0.0 2024-09-17 21:05:50,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=300480.0, ans=0.0 2024-09-17 21:05:53,590 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.92 vs. limit=6.0 2024-09-17 21:05:59,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=300520.0, ans=0.5 2024-09-17 21:06:00,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=300520.0, ans=0.1 2024-09-17 21:06:16,501 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.34 vs. limit=12.0 2024-09-17 21:06:24,706 INFO [train.py:1198] (1/2) Epoch 17, batch 2750, loss[loss=0.2289, ctc_loss=0.1308, cr_loss=0.3706, attn_decoder_loss=0.2315, over 29526.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1437, cr_loss=0.3863, attn_decoder_loss=0.2541, over 5795235.05 frames. 
], batch size: 75, lr: 6.56e-03, grad_scale: 8.0 2024-09-17 21:06:38,339 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.681e+01 9.439e+01 1.052e+02 4.745e+02, threshold=1.888e+02, percent-clipped=3.0 2024-09-17 21:06:52,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=300640.0, ans=0.125 2024-09-17 21:07:10,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=300720.0, ans=0.125 2024-09-17 21:07:29,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=300760.0, ans=0.025 2024-09-17 21:07:35,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=300760.0, ans=0.1 2024-09-17 21:07:43,693 INFO [train.py:1198] (1/2) Epoch 17, batch 2800, loss[loss=0.2793, ctc_loss=0.1869, cr_loss=0.4242, attn_decoder_loss=0.2802, over 20448.00 frames. ], tot_loss[loss=0.2507, ctc_loss=0.1436, cr_loss=0.3859, attn_decoder_loss=0.2541, over 5776567.61 frames. ], batch size: 209, lr: 6.55e-03, grad_scale: 16.0 2024-09-17 21:07:59,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=300840.0, ans=0.0 2024-09-17 21:08:05,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=300840.0, ans=0.125 2024-09-17 21:08:07,965 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.91 vs. limit=6.0 2024-09-17 21:08:10,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=300840.0, ans=0.09899494936611666 2024-09-17 21:08:13,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=300840.0, ans=0.125 2024-09-17 21:08:17,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=300880.0, ans=0.0 2024-09-17 21:08:40,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=300920.0, ans=0.0 2024-09-17 21:08:43,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=300920.0, ans=0.125 2024-09-17 21:08:52,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=300960.0, ans=0.0 2024-09-17 21:08:56,236 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.65 vs. limit=15.0 2024-09-17 21:09:01,395 INFO [train.py:1198] (1/2) Epoch 17, batch 2850, loss[loss=0.2507, ctc_loss=0.141, cr_loss=0.3737, attn_decoder_loss=0.2546, over 29492.00 frames. ], tot_loss[loss=0.2514, ctc_loss=0.1443, cr_loss=0.3864, attn_decoder_loss=0.2547, over 5763011.02 frames. ], batch size: 77, lr: 6.55e-03, grad_scale: 8.0 2024-09-17 21:09:09,850 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.01 vs. 
limit=15.0 2024-09-17 21:09:15,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=301040.0, ans=0.1 2024-09-17 21:09:16,420 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.088e+01 8.947e+01 9.466e+01 1.049e+02 1.883e+02, threshold=1.893e+02, percent-clipped=0.0 2024-09-17 21:09:45,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=301120.0, ans=0.95 2024-09-17 21:09:55,370 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0 2024-09-17 21:10:05,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=301160.0, ans=0.0 2024-09-17 21:10:06,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=301160.0, ans=0.125 2024-09-17 21:10:06,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=301160.0, ans=0.125 2024-09-17 21:10:11,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=301160.0, ans=0.025 2024-09-17 21:10:17,118 INFO [train.py:1198] (1/2) Epoch 17, batch 2900, loss[loss=0.2463, ctc_loss=0.1383, cr_loss=0.3771, attn_decoder_loss=0.2499, over 29797.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1448, cr_loss=0.388, attn_decoder_loss=0.2558, over 5788203.50 frames. ], batch size: 80, lr: 6.55e-03, grad_scale: 8.0 2024-09-17 21:10:21,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=301200.0, ans=0.125 2024-09-17 21:10:21,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=301200.0, ans=0.125 2024-09-17 21:10:24,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=301200.0, ans=0.0 2024-09-17 21:10:28,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=301200.0, ans=0.1 2024-09-17 21:10:38,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=301240.0, ans=0.125 2024-09-17 21:10:42,237 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.18 vs. 
limit=15.0 2024-09-17 21:10:46,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=301280.0, ans=0.125 2024-09-17 21:10:52,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=301280.0, ans=0.125 2024-09-17 21:10:52,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=301280.0, ans=0.125 2024-09-17 21:11:01,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=301320.0, ans=0.125 2024-09-17 21:11:35,008 INFO [train.py:1198] (1/2) Epoch 17, batch 2950, loss[loss=0.2381, ctc_loss=0.1235, cr_loss=0.3487, attn_decoder_loss=0.2431, over 29519.00 frames. ], tot_loss[loss=0.2511, ctc_loss=0.144, cr_loss=0.3862, attn_decoder_loss=0.2545, over 5782269.75 frames. ], batch size: 75, lr: 6.55e-03, grad_scale: 8.0 2024-09-17 21:11:42,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=301400.0, ans=0.125 2024-09-17 21:11:42,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=301400.0, ans=0.125 2024-09-17 21:11:52,336 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.744e+01 8.656e+01 9.103e+01 9.738e+01 1.377e+02, threshold=1.821e+02, percent-clipped=0.0 2024-09-17 21:12:07,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=301480.0, ans=10.0 2024-09-17 21:12:21,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=301520.0, ans=0.025 2024-09-17 21:12:21,932 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=22.5 2024-09-17 21:12:27,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=301520.0, ans=0.0 2024-09-17 21:12:45,266 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:12:48,902 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.74 vs. limit=15.0 2024-09-17 21:12:51,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=301600.0, ans=0.0 2024-09-17 21:12:52,856 INFO [train.py:1198] (1/2) Epoch 17, batch 3000, loss[loss=0.2506, ctc_loss=0.1386, cr_loss=0.3829, attn_decoder_loss=0.2546, over 29758.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1439, cr_loss=0.3864, attn_decoder_loss=0.2541, over 5782437.83 frames. ], batch size: 81, lr: 6.54e-03, grad_scale: 8.0 2024-09-17 21:12:52,856 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 21:12:58,622 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.7307, 6.6486, 6.1153, 6.1799], device='cuda:1') 2024-09-17 21:13:11,356 INFO [train.py:1230] (1/2) Epoch 17, validation: loss=0.2115, ctc_loss=0.04066, cr_loss=4.995e-15, attn_decoder_loss=0.2305, over 944034.00 frames. 
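Note on reading the loss records in this log: each per-batch and validation entry reports four numbers (loss, ctc_loss, cr_loss, attn_decoder_loss), and the headline "loss" is consistent with a weighted sum of the other three. A minimal Python sketch of that combination follows; combine_losses is a hypothetical helper for illustration, not the actual train.py code, and the scale values (ctc 0.1, cr 0.02, attention-decoder 0.9) are assumptions back-solved from the logged numbers rather than quoted from the source.

def combine_losses(ctc_loss: float,
                   cr_loss: float,
                   attn_decoder_loss: float,
                   ctc_scale: float = 0.1,
                   cr_scale: float = 0.02,
                   aed_scale: float = 0.9) -> float:
    """Weighted combination of the CTC, consistency-regularization (CR)
    and attention-decoder losses; reproduces the logged 'loss' field."""
    return (ctc_scale * ctc_loss
            + cr_scale * cr_loss
            + aed_scale * attn_decoder_loss)

# Checks against entries logged in this file (agreement to the 4 printed digits):
# validation: loss=0.2115, ctc_loss=0.04066, cr_loss=4.995e-15, attn_decoder_loss=0.2305
assert abs(combine_losses(0.04066, 4.995e-15, 0.2305) - 0.2115) < 1e-3
# Epoch 17, batch 2050: tot_loss[loss=0.2515, ctc_loss=0.1449, cr_loss=0.3872, attn_decoder_loss=0.2548]
assert abs(combine_losses(0.1449, 0.3872, 0.2548) - 0.2515) < 1e-3

Similarly, the WARNING [optim.py:487] records summarize gradient norms. In every such record here, threshold equals Clipping_scale times the middle of the five reported values (e.g. 2.0 * 9.110e+01 = 1.822e+02), which suggests the five numbers are min/25%/50%/75%/max and the clipping threshold is twice the median grad-norm. A sketch under that assumption, inferred from the logged numbers rather than from the optim.py source:

import statistics

def clipping_summary(grad_norms: list[float], clipping_scale: float = 2.0):
    """Summarize a window of gradient norms in the style of the
    'Clipping_scale=..., grad-norm quartiles ... threshold=...' records."""
    q1, median, q3 = statistics.quantiles(grad_norms, n=4)  # 25%, 50%, 75% cut points
    five_numbers = (min(grad_norms), q1, median, q3, max(grad_norms))
    threshold = clipping_scale * median  # 2.0 x median matches every record above
    percent_clipped = 100.0 * sum(g > threshold for g in grad_norms) / len(grad_norms)
    return five_numbers, threshold, percent_clipped

Both sketches match the figures printed in the surrounding records to the four digits shown; the authoritative computations live in train.py and optim.py.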
2024-09-17 21:13:11,357 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-17 21:13:20,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=301600.0, ans=0.125 2024-09-17 21:13:34,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=301640.0, ans=0.2 2024-09-17 21:13:47,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=301680.0, ans=0.0 2024-09-17 21:14:04,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=301720.0, ans=0.125 2024-09-17 21:14:27,356 INFO [train.py:1198] (1/2) Epoch 17, batch 3050, loss[loss=0.2425, ctc_loss=0.1385, cr_loss=0.39, attn_decoder_loss=0.2454, over 29525.00 frames. ], tot_loss[loss=0.2518, ctc_loss=0.1448, cr_loss=0.3877, attn_decoder_loss=0.2551, over 5776740.95 frames. ], batch size: 76, lr: 6.54e-03, grad_scale: 8.0 2024-09-17 21:14:30,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=301800.0, ans=0.04949747468305833 2024-09-17 21:14:39,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=301800.0, ans=0.125 2024-09-17 21:14:42,349 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.786e+01 9.363e+01 1.016e+02 1.140e+02 2.796e+02, threshold=2.033e+02, percent-clipped=4.0 2024-09-17 21:14:56,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=301880.0, ans=0.125 2024-09-17 21:14:59,594 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.62 vs. limit=15.0 2024-09-17 21:15:36,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=301960.0, ans=0.0 2024-09-17 21:15:43,072 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.75 vs. limit=15.0 2024-09-17 21:15:46,820 INFO [train.py:1198] (1/2) Epoch 17, batch 3100, loss[loss=0.2712, ctc_loss=0.1551, cr_loss=0.4218, attn_decoder_loss=0.2747, over 29321.00 frames. ], tot_loss[loss=0.2514, ctc_loss=0.1445, cr_loss=0.387, attn_decoder_loss=0.2547, over 5777170.20 frames. ], batch size: 100, lr: 6.54e-03, grad_scale: 8.0 2024-09-17 21:16:21,671 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:17:02,401 INFO [train.py:1198] (1/2) Epoch 17, batch 3150, loss[loss=0.2595, ctc_loss=0.1506, cr_loss=0.3995, attn_decoder_loss=0.2627, over 28847.00 frames. ], tot_loss[loss=0.2515, ctc_loss=0.1444, cr_loss=0.3868, attn_decoder_loss=0.2548, over 5783813.29 frames. 
], batch size: 104, lr: 6.54e-03, grad_scale: 8.0 2024-09-17 21:17:04,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=302200.0, ans=0.0 2024-09-17 21:17:07,422 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:17:17,552 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.912e+01 9.257e+01 9.921e+01 1.761e+02, threshold=1.851e+02, percent-clipped=0.0 2024-09-17 21:17:35,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=302280.0, ans=0.1 2024-09-17 21:17:55,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=302320.0, ans=0.0 2024-09-17 21:18:00,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=302320.0, ans=0.2 2024-09-17 21:18:03,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=302360.0, ans=0.0 2024-09-17 21:18:18,471 INFO [train.py:1198] (1/2) Epoch 17, batch 3200, loss[loss=0.2556, ctc_loss=0.1512, cr_loss=0.4037, attn_decoder_loss=0.2582, over 29415.00 frames. ], tot_loss[loss=0.2506, ctc_loss=0.1436, cr_loss=0.3853, attn_decoder_loss=0.2539, over 5793515.37 frames. ], batch size: 79, lr: 6.54e-03, grad_scale: 16.0 2024-09-17 21:18:27,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=302400.0, ans=0.0 2024-09-17 21:18:32,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=302440.0, ans=0.125 2024-09-17 21:18:46,733 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.68 vs. limit=15.0 2024-09-17 21:19:09,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=302520.0, ans=0.025 2024-09-17 21:19:19,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=302560.0, ans=0.125 2024-09-17 21:19:38,256 INFO [train.py:1198] (1/2) Epoch 17, batch 3250, loss[loss=0.2581, ctc_loss=0.16, cr_loss=0.414, attn_decoder_loss=0.2598, over 29705.00 frames. ], tot_loss[loss=0.2509, ctc_loss=0.1438, cr_loss=0.3862, attn_decoder_loss=0.2543, over 5801104.28 frames. ], batch size: 84, lr: 6.53e-03, grad_scale: 8.0 2024-09-17 21:19:44,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=302600.0, ans=0.125 2024-09-17 21:19:54,947 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.314e+01 8.527e+01 9.036e+01 9.665e+01 1.223e+02, threshold=1.807e+02, percent-clipped=0.0 2024-09-17 21:19:55,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=302640.0, ans=0.125 2024-09-17 21:19:58,768 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.21 vs. 
limit=15.0 2024-09-17 21:20:17,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=302680.0, ans=0.09899494936611666 2024-09-17 21:20:51,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.64 vs. limit=15.0 2024-09-17 21:20:53,908 INFO [train.py:1198] (1/2) Epoch 17, batch 3300, loss[loss=0.2536, ctc_loss=0.1389, cr_loss=0.3865, attn_decoder_loss=0.2578, over 28170.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1431, cr_loss=0.385, attn_decoder_loss=0.2532, over 5798973.56 frames. ], batch size: 111, lr: 6.53e-03, grad_scale: 8.0 2024-09-17 21:21:03,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=302800.0, ans=0.125 2024-09-17 21:21:17,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=302840.0, ans=0.1 2024-09-17 21:21:35,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=302880.0, ans=0.0 2024-09-17 21:21:36,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=302880.0, ans=0.0 2024-09-17 21:21:38,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=302920.0, ans=0.05 2024-09-17 21:21:41,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=302920.0, ans=0.0 2024-09-17 21:21:41,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=302920.0, ans=0.125 2024-09-17 21:21:53,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=302960.0, ans=0.125 2024-09-17 21:22:09,780 INFO [train.py:1198] (1/2) Epoch 17, batch 3350, loss[loss=0.2601, ctc_loss=0.1523, cr_loss=0.3961, attn_decoder_loss=0.2632, over 28758.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1438, cr_loss=0.386, attn_decoder_loss=0.2541, over 5775278.52 frames. ], batch size: 104, lr: 6.53e-03, grad_scale: 4.0 2024-09-17 21:22:11,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=303000.0, ans=0.0 2024-09-17 21:22:28,072 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.973e+01 8.919e+01 9.576e+01 1.043e+02 2.558e+02, threshold=1.915e+02, percent-clipped=2.0 2024-09-17 21:22:40,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=303080.0, ans=0.2 2024-09-17 21:23:29,947 INFO [train.py:1198] (1/2) Epoch 17, batch 3400, loss[loss=0.2277, ctc_loss=0.1271, cr_loss=0.3516, attn_decoder_loss=0.2311, over 29354.00 frames. ], tot_loss[loss=0.2511, ctc_loss=0.1442, cr_loss=0.3864, attn_decoder_loss=0.2544, over 5767419.01 frames. 
], batch size: 67, lr: 6.53e-03, grad_scale: 8.0 2024-09-17 21:24:23,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=303320.0, ans=0.125 2024-09-17 21:24:40,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=303360.0, ans=0.2 2024-09-17 21:24:45,978 INFO [train.py:1198] (1/2) Epoch 17, batch 3450, loss[loss=0.256, ctc_loss=0.1406, cr_loss=0.3923, attn_decoder_loss=0.2601, over 28324.00 frames. ], tot_loss[loss=0.2511, ctc_loss=0.1438, cr_loss=0.3861, attn_decoder_loss=0.2545, over 5774327.29 frames. ], batch size: 111, lr: 6.53e-03, grad_scale: 8.0 2024-09-17 21:24:55,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=303400.0, ans=0.0 2024-09-17 21:25:00,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=303440.0, ans=0.125 2024-09-17 21:25:00,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=303440.0, ans=0.125 2024-09-17 21:25:04,358 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.501e+01 9.059e+01 9.380e+01 1.001e+02 2.094e+02, threshold=1.876e+02, percent-clipped=1.0 2024-09-17 21:25:13,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=303440.0, ans=0.07 2024-09-17 21:25:15,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=303480.0, ans=0.0 2024-09-17 21:25:16,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=303480.0, ans=0.125 2024-09-17 21:25:19,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=303480.0, ans=0.1 2024-09-17 21:25:24,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=303480.0, ans=0.125 2024-09-17 21:25:38,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=303520.0, ans=0.125 2024-09-17 21:26:01,983 INFO [train.py:1198] (1/2) Epoch 17, batch 3500, loss[loss=0.2194, ctc_loss=0.1136, cr_loss=0.337, attn_decoder_loss=0.2237, over 29302.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1432, cr_loss=0.3851, attn_decoder_loss=0.2537, over 5775872.86 frames. ], batch size: 71, lr: 6.52e-03, grad_scale: 8.0 2024-09-17 21:26:11,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=303600.0, ans=0.1 2024-09-17 21:26:14,774 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.85 vs. limit=10.0 2024-09-17 21:26:32,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=303680.0, ans=0.1 2024-09-17 21:26:35,709 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.27 vs. 
limit=15.0 2024-09-17 21:26:42,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=303680.0, ans=0.125 2024-09-17 21:26:58,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=303720.0, ans=0.025 2024-09-17 21:27:01,865 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.73 vs. limit=12.0 2024-09-17 21:27:02,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=303760.0, ans=0.125 2024-09-17 21:27:18,819 INFO [train.py:1198] (1/2) Epoch 17, batch 3550, loss[loss=0.2577, ctc_loss=0.1349, cr_loss=0.3564, attn_decoder_loss=0.2634, over 29705.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1433, cr_loss=0.3857, attn_decoder_loss=0.2537, over 5781493.67 frames. ], batch size: 89, lr: 6.52e-03, grad_scale: 8.0 2024-09-17 21:27:36,512 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.165e+01 8.716e+01 9.254e+01 9.841e+01 2.209e+02, threshold=1.851e+02, percent-clipped=2.0 2024-09-17 21:27:39,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=303840.0, ans=0.125 2024-09-17 21:27:57,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=303880.0, ans=0.025 2024-09-17 21:28:00,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=303880.0, ans=0.125 2024-09-17 21:28:18,152 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.20 vs. limit=15.0 2024-09-17 21:28:43,014 INFO [train.py:1198] (1/2) Epoch 17, batch 3600, loss[loss=0.2273, ctc_loss=0.1189, cr_loss=0.3231, attn_decoder_loss=0.2322, over 29502.00 frames. ], tot_loss[loss=0.2507, ctc_loss=0.1437, cr_loss=0.3862, attn_decoder_loss=0.254, over 5791569.09 frames. ], batch size: 77, lr: 6.52e-03, grad_scale: 16.0 2024-09-17 21:28:43,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=304000.0, ans=0.035 2024-09-17 21:28:43,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=304000.0, ans=0.125 2024-09-17 21:29:01,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=304040.0, ans=0.1 2024-09-17 21:29:18,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.32 vs. 
limit=15.0 2024-09-17 21:29:25,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=304080.0, ans=0.0 2024-09-17 21:29:26,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=304120.0, ans=0.125 2024-09-17 21:29:37,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=304120.0, ans=0.2 2024-09-17 21:29:57,763 INFO [train.py:1198] (1/2) Epoch 17, batch 3650, loss[loss=0.2597, ctc_loss=0.1519, cr_loss=0.4026, attn_decoder_loss=0.2627, over 29521.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1429, cr_loss=0.3845, attn_decoder_loss=0.2532, over 5794204.23 frames. ], batch size: 90, lr: 6.52e-03, grad_scale: 8.0 2024-09-17 21:30:11,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=304240.0, ans=0.07 2024-09-17 21:30:17,004 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.700e+01 8.843e+01 9.212e+01 9.798e+01 3.342e+02, threshold=1.842e+02, percent-clipped=1.0 2024-09-17 21:30:46,406 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0 2024-09-17 21:31:12,236 INFO [train.py:1198] (1/2) Epoch 17, batch 3700, loss[loss=0.2551, ctc_loss=0.1444, cr_loss=0.3868, attn_decoder_loss=0.2588, over 29703.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1428, cr_loss=0.3843, attn_decoder_loss=0.2533, over 5804124.89 frames. ], batch size: 84, lr: 6.51e-03, grad_scale: 8.0 2024-09-17 21:31:18,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=304400.0, ans=0.0 2024-09-17 21:31:20,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=304400.0, ans=0.025 2024-09-17 21:31:21,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=304400.0, ans=0.1 2024-09-17 21:31:24,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=304400.0, ans=0.95 2024-09-17 21:31:27,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=304440.0, ans=0.1 2024-09-17 21:31:36,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=304440.0, ans=0.125 2024-09-17 21:31:36,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=304440.0, ans=0.2 2024-09-17 21:31:50,158 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.56 vs. limit=15.0 2024-09-17 21:31:53,145 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.46 vs. 
limit=15.0 2024-09-17 21:31:59,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=304520.0, ans=0.0 2024-09-17 21:32:02,089 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.40 vs. limit=15.0 2024-09-17 21:32:06,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=304520.0, ans=15.0 2024-09-17 21:32:06,583 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.12 vs. limit=15.0 2024-09-17 21:32:15,208 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.64 vs. limit=10.0 2024-09-17 21:32:26,330 INFO [train.py:1198] (1/2) Epoch 17, batch 3750, loss[loss=0.2158, ctc_loss=0.1149, cr_loss=0.3274, attn_decoder_loss=0.2197, over 29322.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1429, cr_loss=0.3838, attn_decoder_loss=0.2532, over 5807798.41 frames. ], batch size: 67, lr: 6.51e-03, grad_scale: 8.0 2024-09-17 21:32:38,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=304600.0, ans=0.125 2024-09-17 21:32:45,808 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 8.729e+01 9.186e+01 9.795e+01 2.542e+02, threshold=1.837e+02, percent-clipped=1.0 2024-09-17 21:32:48,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.21 vs. limit=12.0 2024-09-17 21:32:53,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=304640.0, ans=0.125 2024-09-17 21:33:17,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=304720.0, ans=0.125 2024-09-17 21:33:20,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=304720.0, ans=0.125 2024-09-17 21:33:31,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=304760.0, ans=0.025 2024-09-17 21:33:43,105 INFO [train.py:1198] (1/2) Epoch 17, batch 3800, loss[loss=0.2596, ctc_loss=0.1454, cr_loss=0.3975, attn_decoder_loss=0.2634, over 29626.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.1427, cr_loss=0.3838, attn_decoder_loss=0.2532, over 5799061.58 frames. ], batch size: 86, lr: 6.51e-03, grad_scale: 8.0 2024-09-17 21:33:49,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=304800.0, ans=0.1 2024-09-17 21:33:50,150 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.22 vs. 
limit=15.0 2024-09-17 21:33:59,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=304840.0, ans=0.2 2024-09-17 21:34:17,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=304880.0, ans=0.1 2024-09-17 21:34:20,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=304880.0, ans=0.025 2024-09-17 21:34:50,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=304960.0, ans=0.125 2024-09-17 21:34:51,259 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.34 vs. limit=15.0 2024-09-17 21:34:59,151 INFO [train.py:1198] (1/2) Epoch 17, batch 3850, loss[loss=0.264, ctc_loss=0.1512, cr_loss=0.4122, attn_decoder_loss=0.2674, over 29224.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1428, cr_loss=0.3846, attn_decoder_loss=0.2533, over 5813045.93 frames. ], batch size: 100, lr: 6.51e-03, grad_scale: 8.0 2024-09-17 21:35:00,069 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.35 vs. limit=15.0 2024-09-17 21:35:00,097 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.54 vs. limit=12.0 2024-09-17 21:35:16,150 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.58 vs. limit=15.0 2024-09-17 21:35:18,452 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.722e+01 9.215e+01 9.828e+01 1.401e+02, threshold=1.843e+02, percent-clipped=0.0 2024-09-17 21:35:20,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=305040.0, ans=0.125 2024-09-17 21:35:36,061 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.35 vs. limit=15.0 2024-09-17 21:35:48,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=305120.0, ans=0.025 2024-09-17 21:35:54,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=305120.0, ans=0.0 2024-09-17 21:36:04,146 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.14 vs. limit=15.0 2024-09-17 21:36:12,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=305200.0, ans=0.0 2024-09-17 21:36:13,794 INFO [train.py:1198] (1/2) Epoch 17, batch 3900, loss[loss=0.2668, ctc_loss=0.152, cr_loss=0.4096, attn_decoder_loss=0.2705, over 29631.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1429, cr_loss=0.3846, attn_decoder_loss=0.2537, over 5817121.57 frames. 
], batch size: 86, lr: 6.51e-03, grad_scale: 8.0 2024-09-17 21:36:48,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=305280.0, ans=0.125 2024-09-17 21:36:52,717 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:36:58,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=305320.0, ans=0.025 2024-09-17 21:37:05,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=305320.0, ans=0.0 2024-09-17 21:37:07,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=305320.0, ans=0.0 2024-09-17 21:37:19,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=305360.0, ans=0.0 2024-09-17 21:37:28,055 INFO [train.py:1198] (1/2) Epoch 17, batch 3950, loss[loss=0.2714, ctc_loss=0.1637, cr_loss=0.4438, attn_decoder_loss=0.2735, over 29483.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1426, cr_loss=0.385, attn_decoder_loss=0.2536, over 5836425.41 frames. ], batch size: 97, lr: 6.50e-03, grad_scale: 8.0 2024-09-17 21:37:32,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=305400.0, ans=0.0 2024-09-17 21:37:34,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=305400.0, ans=0.125 2024-09-17 21:37:47,035 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.06 vs. limit=15.0 2024-09-17 21:37:47,421 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.762e+01 9.164e+01 9.964e+01 1.868e+02, threshold=1.833e+02, percent-clipped=1.0 2024-09-17 21:37:50,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=305440.0, ans=0.2 2024-09-17 21:37:56,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=305480.0, ans=0.125 2024-09-17 21:37:56,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=305480.0, ans=0.07 2024-09-17 21:37:59,103 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.11 vs. limit=22.5 2024-09-17 21:38:06,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=305480.0, ans=0.1 2024-09-17 21:38:09,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=305480.0, ans=0.125 2024-09-17 21:38:09,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=305480.0, ans=0.125 2024-09-17 21:38:40,076 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.81 vs. 
limit=15.0 2024-09-17 21:38:44,016 INFO [train.py:1198] (1/2) Epoch 17, batch 4000, loss[loss=0.2347, ctc_loss=0.1289, cr_loss=0.3485, attn_decoder_loss=0.2387, over 29506.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1432, cr_loss=0.3853, attn_decoder_loss=0.2537, over 5812671.69 frames. ], batch size: 74, lr: 6.50e-03, grad_scale: 16.0 2024-09-17 21:39:04,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=305640.0, ans=0.125 2024-09-17 21:39:09,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=305640.0, ans=0.09899494936611666 2024-09-17 21:39:18,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=305680.0, ans=0.0 2024-09-17 21:39:27,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=305720.0, ans=0.0 2024-09-17 21:39:38,254 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.48 vs. limit=15.0 2024-09-17 21:39:55,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.07 vs. limit=12.0 2024-09-17 21:39:59,434 INFO [train.py:1198] (1/2) Epoch 17, batch 4050, loss[loss=0.2865, ctc_loss=0.1909, cr_loss=0.4292, attn_decoder_loss=0.2876, over 20173.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1432, cr_loss=0.3855, attn_decoder_loss=0.2538, over 5796427.29 frames. ], batch size: 210, lr: 6.50e-03, grad_scale: 8.0 2024-09-17 21:40:01,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=305800.0, ans=0.1 2024-09-17 21:40:19,859 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.726e+01 9.314e+01 1.066e+02 2.595e+02, threshold=1.863e+02, percent-clipped=1.0 2024-09-17 21:40:22,060 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.64 vs. limit=15.0 2024-09-17 21:40:27,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=305880.0, ans=0.125 2024-09-17 21:40:46,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=305920.0, ans=0.125 2024-09-17 21:41:06,467 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.44 vs. limit=15.0 2024-09-17 21:41:07,237 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:41:08,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=305960.0, ans=0.0 2024-09-17 21:41:11,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=306000.0, ans=0.1 2024-09-17 21:41:12,870 INFO [train.py:1198] (1/2) Epoch 17, batch 4100, loss[loss=0.2661, ctc_loss=0.1571, cr_loss=0.4374, attn_decoder_loss=0.2685, over 29506.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1437, cr_loss=0.3861, attn_decoder_loss=0.2541, over 5792194.11 frames. 
], batch size: 90, lr: 6.50e-03, grad_scale: 8.0 2024-09-17 21:41:29,925 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.47 vs. limit=22.5 2024-09-17 21:41:48,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=306080.0, ans=0.0 2024-09-17 21:41:54,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=306080.0, ans=0.1 2024-09-17 21:41:57,959 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-09-17 21:42:26,513 INFO [train.py:1198] (1/2) Epoch 17, batch 4150, loss[loss=0.2478, ctc_loss=0.1438, cr_loss=0.391, attn_decoder_loss=0.2506, over 29488.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1435, cr_loss=0.3854, attn_decoder_loss=0.2535, over 5797381.48 frames. ], batch size: 77, lr: 6.50e-03, grad_scale: 8.0 2024-09-17 21:42:28,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=306200.0, ans=0.125 2024-09-17 21:42:42,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=306240.0, ans=0.2 2024-09-17 21:42:47,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=306240.0, ans=0.0 2024-09-17 21:42:48,267 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.408e+01 8.896e+01 9.299e+01 9.873e+01 2.442e+02, threshold=1.860e+02, percent-clipped=1.0 2024-09-17 21:43:00,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=306280.0, ans=0.125 2024-09-17 21:43:00,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=306280.0, ans=0.07 2024-09-17 21:43:03,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=306280.0, ans=0.125 2024-09-17 21:43:12,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=306320.0, ans=0.0 2024-09-17 21:43:29,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=306360.0, ans=0.125 2024-09-17 21:43:30,201 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.45 vs. limit=12.0 2024-09-17 21:43:41,494 INFO [train.py:1198] (1/2) Epoch 17, batch 4200, loss[loss=0.2684, ctc_loss=0.1702, cr_loss=0.4485, attn_decoder_loss=0.2694, over 29498.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1435, cr_loss=0.3861, attn_decoder_loss=0.2536, over 5799648.90 frames. ], batch size: 90, lr: 6.49e-03, grad_scale: 8.0 2024-09-17 21:43:52,400 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.75 vs. 
limit=15.0 2024-09-17 21:44:02,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=306440.0, ans=0.1 2024-09-17 21:44:17,194 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.38 vs. limit=22.5 2024-09-17 21:44:24,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=306480.0, ans=0.0 2024-09-17 21:44:26,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=306520.0, ans=0.0 2024-09-17 21:44:27,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=306520.0, ans=0.1 2024-09-17 21:44:56,160 INFO [train.py:1198] (1/2) Epoch 17, batch 4250, loss[loss=0.2353, ctc_loss=0.1293, cr_loss=0.3602, attn_decoder_loss=0.2391, over 29501.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.143, cr_loss=0.3851, attn_decoder_loss=0.2535, over 5805525.99 frames. ], batch size: 74, lr: 6.49e-03, grad_scale: 8.0 2024-09-17 21:44:57,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=306600.0, ans=0.125 2024-09-17 21:45:08,426 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0 2024-09-17 21:45:16,392 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.736e+01 9.267e+01 9.996e+01 5.774e+02, threshold=1.853e+02, percent-clipped=2.0 2024-09-17 21:45:23,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=306680.0, ans=0.0 2024-09-17 21:45:25,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=306680.0, ans=0.125 2024-09-17 21:45:39,370 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.98 vs. limit=15.0 2024-09-17 21:45:55,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=306760.0, ans=0.125 2024-09-17 21:46:02,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=306760.0, ans=0.0 2024-09-17 21:46:02,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=306760.0, ans=0.2 2024-09-17 21:46:05,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=306760.0, ans=0.2 2024-09-17 21:46:09,565 INFO [train.py:1198] (1/2) Epoch 17, batch 4300, loss[loss=0.2659, ctc_loss=0.1528, cr_loss=0.3939, attn_decoder_loss=0.2697, over 29524.00 frames. ], tot_loss[loss=0.2506, ctc_loss=0.1434, cr_loss=0.3852, attn_decoder_loss=0.254, over 5796078.35 frames. 
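Annotation: in each of the periodic optim.py WARNINGs, the reported threshold equals Clipping_scale times the median of the grad-norm quartiles (here 2.0 * 9.267e+01 = 1.853e+02, and likewise elsewhere). A sketch of that policy under the assumption that the quartiles are taken over a recent window of observed norms; the class name and window length are ours:

import torch
from collections import deque

class QuartileGradClipper:
    # Clip at clipping_scale * median of recently observed gradient norms.
    def __init__(self, clipping_scale: float = 2.0, window: int = 200):
        self.clipping_scale = clipping_scale
        self.norms: deque = deque(maxlen=window)

    def __call__(self, params) -> bool:
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        self.norms.append(norm.item())
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # scale * median
        clipped = norm.item() > threshold
        if clipped:  # rescale gradients in place, as clip_grad_norm_ does
            for g in grads:
                g.mul_(threshold / norm.item())
        return clipped

The five numbers in each WARNING then correspond to q above (min, Q1, median, Q3, max), and percent-clipped is the fraction of recent steps where clipped was True.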
], batch size: 87, lr: 6.49e-03, grad_scale: 8.0 2024-09-17 21:46:10,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=306800.0, ans=0.125 2024-09-17 21:46:31,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=306840.0, ans=0.05 2024-09-17 21:46:45,430 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2024-09-17 21:46:47,076 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.94 vs. limit=22.5 2024-09-17 21:46:55,207 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:47:08,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=306960.0, ans=0.0 2024-09-17 21:47:16,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=306960.0, ans=0.125 2024-09-17 21:47:18,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=306960.0, ans=0.1 2024-09-17 21:47:25,584 INFO [train.py:1198] (1/2) Epoch 17, batch 4350, loss[loss=0.2602, ctc_loss=0.1515, cr_loss=0.4, attn_decoder_loss=0.2634, over 29486.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.1458, cr_loss=0.39, attn_decoder_loss=0.2571, over 5798655.14 frames. ], batch size: 97, lr: 6.49e-03, grad_scale: 8.0 2024-09-17 21:47:25,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=307000.0, ans=0.125 2024-09-17 21:47:42,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=307040.0, ans=0.025 2024-09-17 21:47:43,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=307040.0, ans=0.125 2024-09-17 21:47:45,642 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.76 vs. limit=15.0 2024-09-17 21:47:46,025 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.975e+01 9.056e+01 9.451e+01 1.005e+02 2.709e+02, threshold=1.890e+02, percent-clipped=3.0 2024-09-17 21:47:52,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=307040.0, ans=0.2 2024-09-17 21:48:24,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=307160.0, ans=0.2 2024-09-17 21:48:30,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=307160.0, ans=0.125 2024-09-17 21:48:34,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=307160.0, ans=0.125 2024-09-17 21:48:39,024 INFO [train.py:1198] (1/2) Epoch 17, batch 4400, loss[loss=0.2599, ctc_loss=0.1501, cr_loss=0.3658, attn_decoder_loss=0.264, over 27120.00 frames. ], tot_loss[loss=0.2561, ctc_loss=0.1477, cr_loss=0.3934, attn_decoder_loss=0.2594, over 5766936.15 frames. 
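Annotation: grad_scale flips between 8.0 and 16.0 across the batch records (16.0 at batches 4000 and 4400, 8.0 in between). That is ordinary dynamic loss scaling under fp16 AMP: the scale doubles after a run of overflow-free steps and halves when a step produces inf/nan gradients. A minimal sketch with PyTorch's GradScaler; the factors and interval below are PyTorch defaults shown explicitly, not necessarily this recipe's settings:

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=8.0,        # matches the grad_scale seen in the log
    growth_factor=2.0,     # 8.0 -> 16.0 after enough clean steps
    backoff_factor=0.5,    # 16.0 -> 8.0 on an inf/nan gradient
    growth_interval=2000,  # PyTorch's default; the recipe's may differ
)

# inside the loop (model, optimizer, compute_loss assumed defined elsewhere):
#   with torch.cuda.amp.autocast(dtype=torch.float16):
#       loss = compute_loss(model, batch)
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()
#   current = scaler.get_scale()  # the value logged as grad_scale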
], batch size: 124, lr: 6.49e-03, grad_scale: 16.0 2024-09-17 21:48:42,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=307200.0, ans=0.125 2024-09-17 21:48:57,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=307240.0, ans=0.1 2024-09-17 21:49:03,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=307240.0, ans=0.2 2024-09-17 21:49:06,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=307240.0, ans=22.5 2024-09-17 21:49:26,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=307320.0, ans=0.0 2024-09-17 21:49:37,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=307360.0, ans=0.0 2024-09-17 21:49:43,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=307360.0, ans=0.125 2024-09-17 21:49:54,141 INFO [train.py:1198] (1/2) Epoch 17, batch 4450, loss[loss=0.2719, ctc_loss=0.1688, cr_loss=0.4145, attn_decoder_loss=0.2741, over 20321.00 frames. ], tot_loss[loss=0.259, ctc_loss=0.1524, cr_loss=0.3986, attn_decoder_loss=0.262, over 5574321.89 frames. ], batch size: 210, lr: 6.48e-03, grad_scale: 8.0 2024-09-17 21:49:58,332 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=22.5 2024-09-17 21:50:16,945 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.924e+01 9.297e+01 9.769e+01 1.205e+02 1.699e+02, threshold=1.954e+02, percent-clipped=0.0 2024-09-17 21:50:29,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=307480.0, ans=0.025 2024-09-17 21:50:34,637 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.63 vs. limit=6.0 2024-09-17 21:50:38,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=307520.0, ans=0.125 2024-09-17 21:51:10,065 INFO [train.py:1198] (1/2) Epoch 17, batch 4500, loss[loss=0.2873, ctc_loss=0.1965, cr_loss=0.4461, attn_decoder_loss=0.2875, over 19754.00 frames. ], tot_loss[loss=0.2624, ctc_loss=0.1583, cr_loss=0.4011, attn_decoder_loss=0.265, over 5235070.29 frames. ], batch size: 210, lr: 6.48e-03, grad_scale: 8.0 2024-09-17 21:51:14,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=307600.0, ans=0.125 2024-09-17 21:51:19,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=307600.0, ans=0.125 2024-09-17 21:51:24,528 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.75 vs. 
limit=15.0 2024-09-17 21:51:28,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=307640.0, ans=0.5 2024-09-17 21:51:33,314 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:51:39,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=307680.0, ans=0.125 2024-09-17 21:51:45,673 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0 2024-09-17 21:52:34,419 INFO [train.py:1198] (1/2) Epoch 18, batch 0, loss[loss=0.2279, ctc_loss=0.1244, cr_loss=0.3611, attn_decoder_loss=0.2313, over 29621.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1244, cr_loss=0.3611, attn_decoder_loss=0.2313, over 29621.00 frames. ], batch size: 73, lr: 6.29e-03, grad_scale: 16.0 2024-09-17 21:52:34,419 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 21:52:52,855 INFO [train.py:1230] (1/2) Epoch 18, validation: loss=0.2122, ctc_loss=0.03991, cr_loss=4.926e-15, attn_decoder_loss=0.2314, over 944034.00 frames. 2024-09-17 21:52:52,856 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-17 21:52:57,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=307700.0, ans=0.125 2024-09-17 21:53:12,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=307740.0, ans=0.125 2024-09-17 21:53:37,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=307780.0, ans=0.1 2024-09-17 21:53:56,818 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.012e+01 9.686e+01 1.126e+02 1.212e+02 3.801e+02, threshold=2.253e+02, percent-clipped=2.0 2024-09-17 21:54:10,448 INFO [train.py:1198] (1/2) Epoch 18, batch 50, loss[loss=0.2191, ctc_loss=0.1146, cr_loss=0.3401, attn_decoder_loss=0.2232, over 29473.00 frames. ], tot_loss[loss=0.2517, ctc_loss=0.1459, cr_loss=0.3888, attn_decoder_loss=0.2548, over 1268906.64 frames. ], batch size: 70, lr: 6.29e-03, grad_scale: 8.0 2024-09-17 21:54:11,331 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.02 vs. limit=12.0 2024-09-17 21:54:13,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=307900.0, ans=0.125 2024-09-17 21:54:16,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=307900.0, ans=0.0 2024-09-17 21:54:27,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=307940.0, ans=0.0 2024-09-17 21:54:55,702 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.60 vs. limit=15.0 2024-09-17 21:54:57,303 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.75 vs. 
limit=15.0 2024-09-17 21:55:02,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=308020.0, ans=0.2 2024-09-17 21:55:25,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=308100.0, ans=0.125 2024-09-17 21:55:26,559 INFO [train.py:1198] (1/2) Epoch 18, batch 100, loss[loss=0.2445, ctc_loss=0.1411, cr_loss=0.3801, attn_decoder_loss=0.2476, over 29546.00 frames. ], tot_loss[loss=0.2542, ctc_loss=0.147, cr_loss=0.3925, attn_decoder_loss=0.2574, over 2252356.58 frames. ], batch size: 76, lr: 6.29e-03, grad_scale: 8.0 2024-09-17 21:55:52,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=308140.0, ans=0.04949747468305833 2024-09-17 21:55:52,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=308140.0, ans=0.2 2024-09-17 21:56:01,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=308180.0, ans=0.07 2024-09-17 21:56:06,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=308180.0, ans=0.125 2024-09-17 21:56:30,070 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 8.618e+01 9.118e+01 9.635e+01 1.582e+02, threshold=1.824e+02, percent-clipped=0.0 2024-09-17 21:56:30,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=308260.0, ans=0.125 2024-09-17 21:56:43,557 INFO [train.py:1198] (1/2) Epoch 18, batch 150, loss[loss=0.2153, ctc_loss=0.1159, cr_loss=0.3412, attn_decoder_loss=0.2188, over 29407.00 frames. ], tot_loss[loss=0.2509, ctc_loss=0.1436, cr_loss=0.3861, attn_decoder_loss=0.2543, over 3048509.20 frames. ], batch size: 70, lr: 6.29e-03, grad_scale: 8.0 2024-09-17 21:56:43,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=308300.0, ans=0.05 2024-09-17 21:57:28,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=308380.0, ans=0.04949747468305833 2024-09-17 21:57:40,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=308420.0, ans=0.125 2024-09-17 21:57:59,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=308500.0, ans=0.125 2024-09-17 21:58:01,110 INFO [train.py:1198] (1/2) Epoch 18, batch 200, loss[loss=0.2625, ctc_loss=0.1638, cr_loss=0.4227, attn_decoder_loss=0.264, over 27203.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1433, cr_loss=0.3865, attn_decoder_loss=0.2537, over 3660338.02 frames. ], batch size: 124, lr: 6.29e-03, grad_scale: 8.0 2024-09-17 21:58:16,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=308540.0, ans=0.125 2024-09-17 21:58:35,496 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.50 vs. 
limit=22.5 2024-09-17 21:58:55,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=308620.0, ans=0.1 2024-09-17 21:59:03,043 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 8.718e+01 9.535e+01 1.012e+02 1.370e+02, threshold=1.907e+02, percent-clipped=0.0 2024-09-17 21:59:09,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=308660.0, ans=0.125 2024-09-17 21:59:16,548 INFO [train.py:1198] (1/2) Epoch 18, batch 250, loss[loss=0.2616, ctc_loss=0.1481, cr_loss=0.3946, attn_decoder_loss=0.2654, over 29252.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.143, cr_loss=0.3863, attn_decoder_loss=0.2537, over 4142604.43 frames. ], batch size: 100, lr: 6.28e-03, grad_scale: 8.0 2024-09-17 21:59:24,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=308700.0, ans=0.125 2024-09-17 21:59:26,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=308700.0, ans=0.2 2024-09-17 21:59:33,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=308740.0, ans=0.125 2024-09-17 21:59:33,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=308740.0, ans=0.125 2024-09-17 22:00:31,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=308860.0, ans=0.09899494936611666 2024-09-17 22:00:35,431 INFO [train.py:1198] (1/2) Epoch 18, batch 300, loss[loss=0.2553, ctc_loss=0.1451, cr_loss=0.3944, attn_decoder_loss=0.2588, over 29542.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.1428, cr_loss=0.3859, attn_decoder_loss=0.2534, over 4510851.58 frames. ], batch size: 92, lr: 6.28e-03, grad_scale: 8.0 2024-09-17 22:00:42,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=308900.0, ans=15.0 2024-09-17 22:01:18,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=308980.0, ans=0.0 2024-09-17 22:01:25,323 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.61 vs. limit=22.5 2024-09-17 22:01:39,753 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.688e+01 8.552e+01 9.008e+01 9.448e+01 1.517e+02, threshold=1.802e+02, percent-clipped=0.0 2024-09-17 22:01:53,397 INFO [train.py:1198] (1/2) Epoch 18, batch 350, loss[loss=0.2208, ctc_loss=0.1153, cr_loss=0.3421, attn_decoder_loss=0.2249, over 29314.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1427, cr_loss=0.3864, attn_decoder_loss=0.2537, over 4796341.63 frames. 
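Annotation: the validation pass logged at the start of epoch 18 a few records back (loss=0.2122 over 944034.00 frames, followed by the peak-memory line) follows the usual pattern: a frame-weighted average under no_grad, then a query of the CUDA allocator. Note cr_loss is effectively zero there (4.926e-15), presumably because the consistency term only applies to augmented training pairs. A sketch; the model is assumed to return a (loss, num_frames) pair:

import torch

def compute_validation_loss(model, valid_loader, device="cuda:1"):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = model(batch)  # assumed interface
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
    model.train()
    mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation: loss={tot_loss / tot_frames:.4f}, "
          f"over {tot_frames:.2f} frames; max memory {mem_mb}MB")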
], batch size: 71, lr: 6.28e-03, grad_scale: 8.0 2024-09-17 22:01:53,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=309100.0, ans=0.05 2024-09-17 22:02:04,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=309100.0, ans=0.2 2024-09-17 22:02:17,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=309140.0, ans=0.0 2024-09-17 22:02:23,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=309180.0, ans=0.1 2024-09-17 22:02:44,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=309220.0, ans=0.0 2024-09-17 22:02:49,992 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.29 vs. limit=15.0 2024-09-17 22:02:50,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=309220.0, ans=0.0 2024-09-17 22:02:53,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=309260.0, ans=0.1 2024-09-17 22:03:08,711 INFO [train.py:1198] (1/2) Epoch 18, batch 400, loss[loss=0.2489, ctc_loss=0.1334, cr_loss=0.3733, attn_decoder_loss=0.2535, over 29735.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.1423, cr_loss=0.386, attn_decoder_loss=0.2532, over 5025927.94 frames. ], batch size: 82, lr: 6.28e-03, grad_scale: 16.0 2024-09-17 22:03:09,850 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.97 vs. limit=22.5 2024-09-17 22:03:10,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=309300.0, ans=0.125 2024-09-17 22:03:15,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=309300.0, ans=0.125 2024-09-17 22:03:30,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=309340.0, ans=0.5 2024-09-17 22:03:39,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=309380.0, ans=0.125 2024-09-17 22:03:41,991 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.95 vs. 
limit=15.0 2024-09-17 22:04:00,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=309420.0, ans=0.2 2024-09-17 22:04:14,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=309460.0, ans=0.1 2024-09-17 22:04:15,317 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 8.825e+01 9.596e+01 1.056e+02 3.642e+02, threshold=1.919e+02, percent-clipped=2.0 2024-09-17 22:04:18,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=309460.0, ans=0.05 2024-09-17 22:04:20,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=309460.0, ans=0.125 2024-09-17 22:04:27,498 INFO [train.py:1198] (1/2) Epoch 18, batch 450, loss[loss=0.2502, ctc_loss=0.1358, cr_loss=0.3674, attn_decoder_loss=0.2548, over 29695.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.1422, cr_loss=0.385, attn_decoder_loss=0.2531, over 5188270.06 frames. ], batch size: 83, lr: 6.28e-03, grad_scale: 8.0 2024-09-17 22:04:29,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=309500.0, ans=0.125 2024-09-17 22:04:44,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=309540.0, ans=0.0 2024-09-17 22:04:51,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=309540.0, ans=0.125 2024-09-17 22:04:59,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=309580.0, ans=0.125 2024-09-17 22:05:11,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=309580.0, ans=0.1 2024-09-17 22:05:13,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=309580.0, ans=0.125 2024-09-17 22:05:16,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=309620.0, ans=0.025 2024-09-17 22:05:27,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=309620.0, ans=0.125 2024-09-17 22:05:33,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=309660.0, ans=0.125 2024-09-17 22:05:42,826 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.51 vs. limit=12.0 2024-09-17 22:05:45,822 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.60 vs. limit=15.0 2024-09-17 22:05:46,475 INFO [train.py:1198] (1/2) Epoch 18, batch 500, loss[loss=0.2626, ctc_loss=0.1439, cr_loss=0.4062, attn_decoder_loss=0.2668, over 29441.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1416, cr_loss=0.3847, attn_decoder_loss=0.2525, over 5331125.46 frames. 
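Annotation: the scaling.py ScheduledFloat records (name=..., batch_count=..., ans=...) are per-module constants that follow a schedule in batch_count; by this point in training most have settled at their final values (ans=0.125 for balancer probs, ans=0.0 for skip rates, and so on). A minimal stand-in for the idea, assuming piecewise-linear interpolation between knots; this class is ours, not scaling.py's implementation:

import bisect

class PiecewiseLinearFloat:
    # A float that interpolates linearly between (batch_count, value) knots.
    def __init__(self, *points):
        pts = sorted(points)
        self.xs = [float(x) for x, _ in pts]
        self.ys = [float(y) for _, y in pts]

    def value_at(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count) - 1
        t = (batch_count - self.xs[i]) / (self.xs[i + 1] - self.xs[i])
        return self.ys[i] + t * (self.ys[i + 1] - self.ys[i])

# e.g. a balancer prob that relaxes from 0.3 to 0.125 early in training:
prob = PiecewiseLinearFloat((0, 0.3), (20000, 0.125))
print(prob.value_at(305800.0))  # -> 0.125, matching the logged ans=0.125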
], batch size: 94, lr: 6.27e-03, grad_scale: 8.0 2024-09-17 22:05:57,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=309700.0, ans=0.025 2024-09-17 22:06:13,070 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.58 vs. limit=15.0 2024-09-17 22:06:19,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=309780.0, ans=0.125 2024-09-17 22:06:19,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=309780.0, ans=0.1 2024-09-17 22:06:29,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=309780.0, ans=0.1 2024-09-17 22:06:33,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=309820.0, ans=0.2 2024-09-17 22:06:35,960 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.29 vs. limit=22.5 2024-09-17 22:06:44,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=309820.0, ans=0.125 2024-09-17 22:06:48,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=309860.0, ans=0.125 2024-09-17 22:06:50,089 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 8.497e+01 9.185e+01 1.006e+02 4.777e+02, threshold=1.837e+02, percent-clipped=3.0 2024-09-17 22:07:01,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=309900.0, ans=0.1 2024-09-17 22:07:02,204 INFO [train.py:1198] (1/2) Epoch 18, batch 550, loss[loss=0.2633, ctc_loss=0.1524, cr_loss=0.4074, attn_decoder_loss=0.2666, over 28853.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1413, cr_loss=0.3838, attn_decoder_loss=0.2525, over 5423934.93 frames. 
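Annotation: the lr column decays gently within an epoch (6.51e-03 down to 6.49e-03 across epoch 17) and steps down at the epoch boundary (6.29e-03 from epoch 18, batch 0). That shape comes from icefall's Eden-style scheduler, which multiplies a power-law decay in batches by one in epochs. The sketch below captures the shape only; warmup and ref_duration handling are omitted, so these constants will not reproduce the logged values bit-for-bit:

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # smooth within-epoch decay in batch, plus a per-epoch step-down
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.5
    return base_lr * batch_factor * epoch_factor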
], batch size: 104, lr: 6.27e-03, grad_scale: 8.0 2024-09-17 22:07:04,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=309900.0, ans=0.125 2024-09-17 22:07:07,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=309900.0, ans=0.125 2024-09-17 22:07:08,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=309900.0, ans=0.1 2024-09-17 22:07:35,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=309980.0, ans=0.0 2024-09-17 22:08:12,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=310060.0, ans=12.0 2024-09-17 22:08:13,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=310060.0, ans=0.035 2024-09-17 22:08:17,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=310060.0, ans=0.125 2024-09-17 22:08:19,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=310100.0, ans=0.0 2024-09-17 22:08:20,548 INFO [train.py:1198] (1/2) Epoch 18, batch 600, loss[loss=0.2622, ctc_loss=0.1523, cr_loss=0.3967, attn_decoder_loss=0.2656, over 29218.00 frames. ], tot_loss[loss=0.2494, ctc_loss=0.1414, cr_loss=0.3841, attn_decoder_loss=0.2529, over 5511555.18 frames. ], batch size: 100, lr: 6.27e-03, grad_scale: 8.0 2024-09-17 22:08:28,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=310100.0, ans=0.0 2024-09-17 22:08:29,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=310100.0, ans=0.2 2024-09-17 22:08:38,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=310140.0, ans=0.0 2024-09-17 22:08:40,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=310140.0, ans=0.125 2024-09-17 22:09:02,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=310180.0, ans=0.05 2024-09-17 22:09:25,967 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.219e+01 8.500e+01 9.114e+01 9.640e+01 1.427e+02, threshold=1.823e+02, percent-clipped=0.0 2024-09-17 22:09:38,130 INFO [train.py:1198] (1/2) Epoch 18, batch 650, loss[loss=0.2465, ctc_loss=0.1388, cr_loss=0.387, attn_decoder_loss=0.2498, over 29734.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1407, cr_loss=0.3829, attn_decoder_loss=0.252, over 5588592.75 frames. ], batch size: 81, lr: 6.27e-03, grad_scale: 8.0 2024-09-17 22:09:56,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=310340.0, ans=0.1 2024-09-17 22:09:58,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=310340.0, ans=0.1 2024-09-17 22:10:15,638 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.46 vs. 
limit=15.0 2024-09-17 22:10:32,341 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.04 vs. limit=22.5 2024-09-17 22:10:36,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=310420.0, ans=0.0 2024-09-17 22:10:54,162 INFO [train.py:1198] (1/2) Epoch 18, batch 700, loss[loss=0.2357, ctc_loss=0.1332, cr_loss=0.3745, attn_decoder_loss=0.2388, over 29540.00 frames. ], tot_loss[loss=0.2493, ctc_loss=0.1414, cr_loss=0.384, attn_decoder_loss=0.2527, over 5639186.27 frames. ], batch size: 76, lr: 6.27e-03, grad_scale: 8.0 2024-09-17 22:11:09,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=310540.0, ans=0.07 2024-09-17 22:11:16,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=310540.0, ans=0.125 2024-09-17 22:11:39,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=310620.0, ans=0.125 2024-09-17 22:11:57,450 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.839e+01 8.847e+01 9.240e+01 9.883e+01 4.255e+02, threshold=1.848e+02, percent-clipped=1.0 2024-09-17 22:12:00,122 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.78 vs. limit=15.0 2024-09-17 22:12:11,909 INFO [train.py:1198] (1/2) Epoch 18, batch 750, loss[loss=0.2609, ctc_loss=0.1509, cr_loss=0.3984, attn_decoder_loss=0.2643, over 29720.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1414, cr_loss=0.3836, attn_decoder_loss=0.2526, over 5678215.49 frames. ], batch size: 82, lr: 6.26e-03, grad_scale: 8.0 2024-09-17 22:12:31,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=310740.0, ans=0.0 2024-09-17 22:13:08,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=310820.0, ans=0.015 2024-09-17 22:13:11,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=310820.0, ans=0.125 2024-09-17 22:13:19,964 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.23 vs. limit=15.0 2024-09-17 22:13:23,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=310860.0, ans=0.125 2024-09-17 22:13:29,310 INFO [train.py:1198] (1/2) Epoch 18, batch 800, loss[loss=0.2173, ctc_loss=0.1131, cr_loss=0.3287, attn_decoder_loss=0.2216, over 29598.00 frames. ], tot_loss[loss=0.2487, ctc_loss=0.1412, cr_loss=0.3834, attn_decoder_loss=0.2522, over 5706743.57 frames. ], batch size: 73, lr: 6.26e-03, grad_scale: 16.0 2024-09-17 22:14:34,573 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.324e+01 8.772e+01 9.230e+01 9.952e+01 3.129e+02, threshold=1.846e+02, percent-clipped=1.0 2024-09-17 22:14:44,960 INFO [train.py:1198] (1/2) Epoch 18, batch 850, loss[loss=0.2666, ctc_loss=0.1523, cr_loss=0.4101, attn_decoder_loss=0.2702, over 29703.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.1406, cr_loss=0.3823, attn_decoder_loss=0.2518, over 5735837.18 frames. 
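Annotation: the scaling.py Whitening records each compare a per-module metric against a limit (e.g. metric=14.04 vs. limit=22.5 above); a whitening penalty is applied only when the metric exceeds its limit, which is why these lines are informational rather than warnings. The exact metric lives in scaling.py; below is a stand-in (eigenvalue spread of the feature covariance) used purely to illustrate the metric-vs-limit pattern, not icefall's actual definition:

import torch

def covariance_spread(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels); large spread = far from whitened
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs.max() / eigs.mean().clamp(min=1e-20)).item()

feats = torch.randn(1000, 768)  # matches num_channels=768 in the record above
metric = covariance_spread(feats)
print(f"metric={metric:.2f} vs. limit=22.5")  # penalize only if metric > limit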
], batch size: 89, lr: 6.26e-03, grad_scale: 8.0 2024-09-17 22:14:58,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=311140.0, ans=0.1 2024-09-17 22:15:13,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=311180.0, ans=0.04949747468305833 2024-09-17 22:15:16,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=311180.0, ans=0.125 2024-09-17 22:15:55,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=311260.0, ans=0.125 2024-09-17 22:16:03,544 INFO [train.py:1198] (1/2) Epoch 18, batch 900, loss[loss=0.2306, ctc_loss=0.1252, cr_loss=0.3539, attn_decoder_loss=0.2345, over 29599.00 frames. ], tot_loss[loss=0.2487, ctc_loss=0.1411, cr_loss=0.3833, attn_decoder_loss=0.2522, over 5740743.28 frames. ], batch size: 73, lr: 6.26e-03, grad_scale: 8.0 2024-09-17 22:16:15,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=311300.0, ans=0.125 2024-09-17 22:16:18,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=311340.0, ans=0.1 2024-09-17 22:16:23,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=311340.0, ans=0.2 2024-09-17 22:16:29,557 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.23 vs. limit=15.0 2024-09-17 22:16:41,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=311380.0, ans=0.125 2024-09-17 22:17:10,605 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.085e+01 8.926e+01 9.503e+01 1.103e+02 2.746e+02, threshold=1.901e+02, percent-clipped=1.0 2024-09-17 22:17:21,151 INFO [train.py:1198] (1/2) Epoch 18, batch 950, loss[loss=0.2364, ctc_loss=0.1351, cr_loss=0.3757, attn_decoder_loss=0.2393, over 29509.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1413, cr_loss=0.3837, attn_decoder_loss=0.2524, over 5744103.78 frames. ], batch size: 74, lr: 6.26e-03, grad_scale: 8.0 2024-09-17 22:17:29,566 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.62 vs. limit=15.0 2024-09-17 22:17:30,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=311500.0, ans=0.0 2024-09-17 22:17:32,779 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.70 vs. limit=10.0 2024-09-17 22:17:45,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=311540.0, ans=0.2 2024-09-17 22:18:14,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=311620.0, ans=0.0 2024-09-17 22:18:17,818 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.99 vs. 
limit=15.0 2024-09-17 22:18:21,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=311660.0, ans=0.0 2024-09-17 22:18:21,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=311660.0, ans=0.0 2024-09-17 22:18:26,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=311660.0, ans=0.035 2024-09-17 22:18:36,550 INFO [train.py:1198] (1/2) Epoch 18, batch 1000, loss[loss=0.2398, ctc_loss=0.1379, cr_loss=0.3698, attn_decoder_loss=0.2429, over 29512.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.1425, cr_loss=0.3854, attn_decoder_loss=0.2534, over 5737884.16 frames. ], batch size: 77, lr: 6.25e-03, grad_scale: 8.0 2024-09-17 22:18:47,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=311700.0, ans=0.0 2024-09-17 22:18:48,962 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:18:51,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=311740.0, ans=0.0 2024-09-17 22:18:52,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=311740.0, ans=0.025 2024-09-17 22:18:59,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=311740.0, ans=0.1 2024-09-17 22:19:26,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=311820.0, ans=0.125 2024-09-17 22:19:37,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=311860.0, ans=0.0 2024-09-17 22:19:41,471 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 8.748e+01 9.283e+01 1.021e+02 2.281e+02, threshold=1.857e+02, percent-clipped=1.0 2024-09-17 22:19:49,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=311860.0, ans=0.04949747468305833 2024-09-17 22:19:51,944 INFO [train.py:1198] (1/2) Epoch 18, batch 1050, loss[loss=0.2467, ctc_loss=0.1376, cr_loss=0.3853, attn_decoder_loss=0.2503, over 29659.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1418, cr_loss=0.3844, attn_decoder_loss=0.2525, over 5744902.10 frames. ], batch size: 85, lr: 6.25e-03, grad_scale: 8.0 2024-09-17 22:20:08,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=311940.0, ans=0.125 2024-09-17 22:20:14,994 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.02 vs. limit=15.0 2024-09-17 22:20:16,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=311940.0, ans=0.125 2024-09-17 22:20:51,077 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.49 vs. 
limit=15.0 2024-09-17 22:20:56,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=312060.0, ans=0.125 2024-09-17 22:21:08,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=312060.0, ans=0.5 2024-09-17 22:21:13,197 INFO [train.py:1198] (1/2) Epoch 18, batch 1100, loss[loss=0.2688, ctc_loss=0.1682, cr_loss=0.4193, attn_decoder_loss=0.2706, over 29457.00 frames. ], tot_loss[loss=0.2487, ctc_loss=0.1414, cr_loss=0.3836, attn_decoder_loss=0.2521, over 5756834.93 frames. ], batch size: 78, lr: 6.25e-03, grad_scale: 8.0 2024-09-17 22:21:13,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=312100.0, ans=0.02 2024-09-17 22:21:20,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=312100.0, ans=0.0 2024-09-17 22:21:22,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=312100.0, ans=0.0 2024-09-17 22:21:27,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=312140.0, ans=0.125 2024-09-17 22:21:30,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=312140.0, ans=0.2 2024-09-17 22:22:03,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=312220.0, ans=0.125 2024-09-17 22:22:03,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=312220.0, ans=0.1 2024-09-17 22:22:18,320 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 8.510e+01 9.386e+01 9.841e+01 2.672e+02, threshold=1.877e+02, percent-clipped=2.0 2024-09-17 22:22:28,971 INFO [train.py:1198] (1/2) Epoch 18, batch 1150, loss[loss=0.2335, ctc_loss=0.1315, cr_loss=0.3597, attn_decoder_loss=0.2369, over 29433.00 frames. ], tot_loss[loss=0.2492, ctc_loss=0.1418, cr_loss=0.384, attn_decoder_loss=0.2526, over 5755834.76 frames. ], batch size: 78, lr: 6.25e-03, grad_scale: 8.0 2024-09-17 22:22:32,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=312300.0, ans=0.125 2024-09-17 22:22:36,163 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.67 vs. limit=10.0 2024-09-17 22:22:47,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=312340.0, ans=0.125 2024-09-17 22:22:58,563 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.95 vs. limit=22.5 2024-09-17 22:22:59,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=312380.0, ans=0.04949747468305833 2024-09-17 22:23:12,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=312380.0, ans=0.0 2024-09-17 22:23:44,699 INFO [train.py:1198] (1/2) Epoch 18, batch 1200, loss[loss=0.2487, ctc_loss=0.1319, cr_loss=0.359, attn_decoder_loss=0.2537, over 29681.00 frames. 
], tot_loss[loss=0.25, ctc_loss=0.1426, cr_loss=0.3849, attn_decoder_loss=0.2534, over 5746765.61 frames. ], batch size: 85, lr: 6.25e-03, grad_scale: 16.0 2024-09-17 22:23:46,883 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0 2024-09-17 22:24:18,795 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.34 vs. limit=15.0 2024-09-17 22:24:21,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=312580.0, ans=0.125 2024-09-17 22:24:24,911 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.46 vs. limit=22.5 2024-09-17 22:24:26,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=312580.0, ans=0.2 2024-09-17 22:24:31,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=312580.0, ans=0.2 2024-09-17 22:24:39,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=312620.0, ans=0.1 2024-09-17 22:24:44,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=312620.0, ans=0.07 2024-09-17 22:24:44,600 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.49 vs. limit=15.0 2024-09-17 22:24:44,764 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.80 vs. limit=15.0 2024-09-17 22:24:55,872 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.397e+01 9.002e+01 9.543e+01 1.051e+02 1.930e+02, threshold=1.909e+02, percent-clipped=1.0 2024-09-17 22:25:04,860 INFO [train.py:1198] (1/2) Epoch 18, batch 1250, loss[loss=0.2596, ctc_loss=0.149, cr_loss=0.3783, attn_decoder_loss=0.2635, over 29519.00 frames. ], tot_loss[loss=0.2506, ctc_loss=0.143, cr_loss=0.3867, attn_decoder_loss=0.2539, over 5774830.40 frames. ], batch size: 92, lr: 6.24e-03, grad_scale: 8.0 2024-09-17 22:25:48,797 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.80 vs. limit=15.0 2024-09-17 22:25:54,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=312820.0, ans=0.125 2024-09-17 22:25:57,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=312820.0, ans=0.1 2024-09-17 22:26:20,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=312900.0, ans=0.0 2024-09-17 22:26:21,430 INFO [train.py:1198] (1/2) Epoch 18, batch 1300, loss[loss=0.2516, ctc_loss=0.1397, cr_loss=0.3813, attn_decoder_loss=0.2555, over 28637.00 frames. ], tot_loss[loss=0.2496, ctc_loss=0.1421, cr_loss=0.3849, attn_decoder_loss=0.253, over 5780381.28 frames. 
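Annotation: batch sizes in these records swing widely (112 for the long-utterance batch above, 210 for the ~20k-frame batches earlier, 70-100 otherwise) because batches are drawn by total duration rather than by a fixed sentence count. A sketch of the duration-bucketing idea with lhotse's DynamicBucketingSampler, mirroring the max_duration=1200 and num_buckets=30 training options; the manifest path is an assumption:

from lhotse import CutSet
from lhotse.dataset.sampling import DynamicBucketingSampler

cuts = CutSet.from_file("data/fbank/librispeech_cuts_train.jsonl.gz")  # path assumed
sampler = DynamicBucketingSampler(
    cuts, max_duration=1200.0, num_buckets=30, shuffle=True, drop_last=True)
for cut_batch in sampler:
    print(len(cut_batch))  # batch size falls as utterance length grows
    break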
], batch size: 112, lr: 6.24e-03, grad_scale: 8.0 2024-09-17 22:26:26,760 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.45 vs. limit=15.0 2024-09-17 22:27:04,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=312980.0, ans=0.1 2024-09-17 22:27:07,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=313020.0, ans=0.125 2024-09-17 22:27:25,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=313060.0, ans=0.2 2024-09-17 22:27:28,522 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.616e+01 9.113e+01 9.632e+01 1.228e+02, threshold=1.823e+02, percent-clipped=0.0 2024-09-17 22:27:37,670 INFO [train.py:1198] (1/2) Epoch 18, batch 1350, loss[loss=0.2447, ctc_loss=0.1366, cr_loss=0.3753, attn_decoder_loss=0.2484, over 29747.00 frames. ], tot_loss[loss=0.2496, ctc_loss=0.1421, cr_loss=0.3856, attn_decoder_loss=0.253, over 5795291.99 frames. ], batch size: 81, lr: 6.24e-03, grad_scale: 8.0 2024-09-17 22:27:50,221 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:28:20,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=313180.0, ans=0.0 2024-09-17 22:28:25,181 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.37 vs. limit=22.5 2024-09-17 22:28:57,962 INFO [train.py:1198] (1/2) Epoch 18, batch 1400, loss[loss=0.2203, ctc_loss=0.1203, cr_loss=0.3453, attn_decoder_loss=0.2238, over 29583.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1416, cr_loss=0.3846, attn_decoder_loss=0.2525, over 5806550.19 frames. ], batch size: 69, lr: 6.24e-03, grad_scale: 8.0 2024-09-17 22:29:29,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=313380.0, ans=0.0 2024-09-17 22:29:31,936 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.69 vs. 
limit=15.0 2024-09-17 22:29:33,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=313380.0, ans=0.125 2024-09-17 22:29:34,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=313380.0, ans=0.0 2024-09-17 22:29:34,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=313380.0, ans=0.0 2024-09-17 22:29:45,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=313420.0, ans=0.125 2024-09-17 22:30:04,782 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.307e+01 8.593e+01 9.088e+01 9.649e+01 1.870e+02, threshold=1.818e+02, percent-clipped=1.0 2024-09-17 22:30:05,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=313460.0, ans=0.04949747468305833 2024-09-17 22:30:13,973 INFO [train.py:1198] (1/2) Epoch 18, batch 1450, loss[loss=0.2637, ctc_loss=0.1492, cr_loss=0.3949, attn_decoder_loss=0.2677, over 29464.00 frames. ], tot_loss[loss=0.2496, ctc_loss=0.1419, cr_loss=0.3852, attn_decoder_loss=0.253, over 5801877.81 frames. ], batch size: 94, lr: 6.24e-03, grad_scale: 8.0 2024-09-17 22:30:15,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=313500.0, ans=0.125 2024-09-17 22:30:18,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=313500.0, ans=0.05 2024-09-17 22:30:22,488 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.96 vs. limit=6.0 2024-09-17 22:30:27,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=313540.0, ans=0.025 2024-09-17 22:30:31,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=313540.0, ans=0.0 2024-09-17 22:30:35,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=313540.0, ans=0.0 2024-09-17 22:30:38,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=313540.0, ans=0.125 2024-09-17 22:30:38,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=313540.0, ans=0.0 2024-09-17 22:30:48,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=313580.0, ans=6.0 2024-09-17 22:31:07,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=313620.0, ans=0.0 2024-09-17 22:31:16,790 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:31:30,413 INFO [train.py:1198] (1/2) Epoch 18, batch 1500, loss[loss=0.272, ctc_loss=0.1606, cr_loss=0.4254, attn_decoder_loss=0.2749, over 29634.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.1423, cr_loss=0.3853, attn_decoder_loss=0.2535, over 5804347.19 frames. 
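Annotation: the scaling.py WithLoss records report a running auxiliary penalty attached to the self_attn_weights tensors; loss-sum=0.000e+00, as in every such record here, just means the penalty is currently contributing nothing. A generic sketch of the pattern (collect per-module penalties and add them to the main loss); this is our illustration of the idea, not scaling.py's implementation:

import torch
import torch.nn as nn

class WithAuxLoss(nn.Module):
    def __init__(self, weight: float = 0.0):
        super().__init__()
        self.weight = weight
        self.loss_sum = 0.0   # running figure a log line would report
        self.pending = None   # penalty from the current forward pass

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training and self.weight > 0.0:
            self.pending = self.weight * x.pow(2).mean()
            self.loss_sum += self.pending.item()
        else:
            self.pending = None  # inactive -> loss-sum stays 0.000e+00
        return x

# training loop (aux_modules assumed collected from the model):
#   loss = main_loss + sum(m.pending for m in aux_modules if m.pending is not None)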
], batch size: 86, lr: 6.23e-03, grad_scale: 8.0 2024-09-17 22:31:44,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=313740.0, ans=0.125 2024-09-17 22:32:08,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=313780.0, ans=0.0 2024-09-17 22:32:22,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=313820.0, ans=0.125 2024-09-17 22:32:42,104 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.582e+01 9.289e+01 9.829e+01 2.134e+02, threshold=1.858e+02, percent-clipped=2.0 2024-09-17 22:32:42,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=313860.0, ans=0.125 2024-09-17 22:32:51,077 INFO [train.py:1198] (1/2) Epoch 18, batch 1550, loss[loss=0.2513, ctc_loss=0.1442, cr_loss=0.3917, attn_decoder_loss=0.2545, over 29536.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1426, cr_loss=0.3859, attn_decoder_loss=0.2536, over 5780934.11 frames. ], batch size: 90, lr: 6.23e-03, grad_scale: 8.0 2024-09-17 22:32:56,601 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.14 vs. limit=15.0 2024-09-17 22:33:00,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=313900.0, ans=0.0 2024-09-17 22:33:29,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=313980.0, ans=0.1 2024-09-17 22:33:38,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=314020.0, ans=0.1 2024-09-17 22:33:59,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=314060.0, ans=0.0 2024-09-17 22:34:06,769 INFO [train.py:1198] (1/2) Epoch 18, batch 1600, loss[loss=0.2541, ctc_loss=0.1405, cr_loss=0.3709, attn_decoder_loss=0.2585, over 29664.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.143, cr_loss=0.3861, attn_decoder_loss=0.2536, over 5763002.25 frames. ], batch size: 85, lr: 6.23e-03, grad_scale: 16.0 2024-09-17 22:34:09,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=314100.0, ans=0.035 2024-09-17 22:34:11,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=314100.0, ans=0.1 2024-09-17 22:34:31,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=314140.0, ans=0.1 2024-09-17 22:34:31,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=314140.0, ans=0.125 2024-09-17 22:34:32,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=314140.0, ans=0.125 2024-09-17 22:35:00,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.19 vs. 
limit=22.5 2024-09-17 22:35:14,216 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.89 vs. limit=15.0 2024-09-17 22:35:14,481 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. limit=6.0 2024-09-17 22:35:14,876 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.294e+01 8.911e+01 9.926e+01 1.138e+02 4.601e+02, threshold=1.985e+02, percent-clipped=5.0 2024-09-17 22:35:22,532 INFO [train.py:1198] (1/2) Epoch 18, batch 1650, loss[loss=0.2673, ctc_loss=0.1499, cr_loss=0.4073, attn_decoder_loss=0.2713, over 29737.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1429, cr_loss=0.3861, attn_decoder_loss=0.2536, over 5759318.20 frames. ], batch size: 89, lr: 6.23e-03, grad_scale: 8.0 2024-09-17 22:35:36,036 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.70 vs. limit=15.0 2024-09-17 22:35:45,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=314340.0, ans=0.0 2024-09-17 22:35:50,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=314340.0, ans=0.125 2024-09-17 22:35:55,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=314380.0, ans=0.0 2024-09-17 22:35:58,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=314380.0, ans=0.125 2024-09-17 22:36:05,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=314380.0, ans=0.125 2024-09-17 22:36:06,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=314380.0, ans=0.0 2024-09-17 22:36:38,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=314460.0, ans=0.0 2024-09-17 22:36:40,862 INFO [train.py:1198] (1/2) Epoch 18, batch 1700, loss[loss=0.2164, ctc_loss=0.116, cr_loss=0.3265, attn_decoder_loss=0.2203, over 29553.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.1422, cr_loss=0.3851, attn_decoder_loss=0.2531, over 5780144.60 frames. 
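The ScheduledFloat lines report hyperparameters (dropout probabilities, skip rates, bypass scale floors, whitening limits) whose current value is a function of the global batch count; by batch_count 314000 or so most of the skip rates have annealed to 0.0. A toy stand-in for that mechanism with invented schedule points, not the ScheduledFloat class in scaling.py:

```python
# Piecewise-linear schedule over the global batch count (toy version).
class ScheduledFloat:
    def __init__(self, *points):
        # points: (batch_count, value) pairs; values are linearly
        # interpolated between neighbouring batch counts.
        self.points = sorted(points)

    def value_at(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# Hypothetical skip rate decaying from 0.2 to 0.0 over the first 4000
# batches, then flat; at batch_count=314380 it reads 0.0, like the
# conv_skip_rate entries above.
skip_rate = ScheduledFloat((0.0, 0.2), (4000.0, 0.0))
print(skip_rate.value_at(314380.0))  # -> 0.0
```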
], batch size: 69, lr: 6.23e-03, grad_scale: 8.0 2024-09-17 22:36:48,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=314500.0, ans=0.125 2024-09-17 22:36:51,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=314500.0, ans=0.1 2024-09-17 22:37:05,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=314540.0, ans=0.05 2024-09-17 22:37:13,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=314580.0, ans=0.125 2024-09-17 22:37:38,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=314620.0, ans=0.125 2024-09-17 22:37:49,121 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.791e+01 8.699e+01 9.289e+01 9.898e+01 1.574e+02, threshold=1.858e+02, percent-clipped=0.0 2024-09-17 22:37:49,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=314660.0, ans=0.125 2024-09-17 22:37:56,799 INFO [train.py:1198] (1/2) Epoch 18, batch 1750, loss[loss=0.2137, ctc_loss=0.1092, cr_loss=0.3274, attn_decoder_loss=0.218, over 29374.00 frames. ], tot_loss[loss=0.2493, ctc_loss=0.1417, cr_loss=0.3844, attn_decoder_loss=0.2527, over 5788774.12 frames. ], batch size: 67, lr: 6.23e-03, grad_scale: 8.0 2024-09-17 22:38:07,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=314700.0, ans=0.1 2024-09-17 22:38:07,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=314700.0, ans=0.0 2024-09-17 22:38:13,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=314740.0, ans=0.025 2024-09-17 22:38:19,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=314740.0, ans=0.125 2024-09-17 22:38:36,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=314780.0, ans=0.125 2024-09-17 22:38:39,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=314780.0, ans=0.125 2024-09-17 22:38:44,132 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:38:59,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=314860.0, ans=0.1 2024-09-17 22:39:10,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=314900.0, ans=0.125 2024-09-17 22:39:12,152 INFO [train.py:1198] (1/2) Epoch 18, batch 1800, loss[loss=0.2599, ctc_loss=0.1468, cr_loss=0.3896, attn_decoder_loss=0.2638, over 29684.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.1422, cr_loss=0.3857, attn_decoder_loss=0.2532, over 5790736.65 frames. 
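In the Clipping_scale=2.0 warnings, the logged threshold is consistently twice the median quartile (just above, 1.858e+02 = 2.0 * 9.289e+01), suggesting the optimizer clips against clipping_scale times the median of recent gradient norms. A reconstruction of that diagnostic under this assumption, not the actual optim.py code:

```python
import torch

# Report min/quartiles/max of recent grad norms and a clip threshold of
# clipping_scale * median, mirroring what the logged numbers imply.
def clipping_report(grad_norms: torch.Tensor, clipping_scale: float = 2.0) -> float:
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = (clipping_scale * q[2]).item()  # 2.0 * median
    pct = 100.0 * (grad_norms > threshold).float().mean().item()
    print(f"grad-norm quartiles {q.tolist()}, "
          f"threshold={threshold:.3e}, percent-clipped={pct:.1f}")
    return threshold

# Feeding the five quartiles from the warning above back in reproduces
# them and yields threshold=1.858e+02 with percent-clipped=0.0.
clipping_report(torch.tensor([77.91, 86.99, 92.89, 98.98, 157.4]))
```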
], batch size: 83, lr: 6.22e-03, grad_scale: 8.0 2024-09-17 22:39:12,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=314900.0, ans=0.125 2024-09-17 22:39:54,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=314980.0, ans=0.1 2024-09-17 22:40:11,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=315020.0, ans=0.0 2024-09-17 22:40:17,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=315060.0, ans=0.1 2024-09-17 22:40:24,496 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.543e+01 8.622e+01 9.178e+01 9.904e+01 1.304e+02, threshold=1.836e+02, percent-clipped=0.0 2024-09-17 22:40:27,093 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.20 vs. limit=15.0 2024-09-17 22:40:28,476 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.59 vs. limit=15.0 2024-09-17 22:40:30,221 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.67 vs. limit=22.5 2024-09-17 22:40:32,251 INFO [train.py:1198] (1/2) Epoch 18, batch 1850, loss[loss=0.2659, ctc_loss=0.1562, cr_loss=0.4293, attn_decoder_loss=0.2685, over 29630.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1418, cr_loss=0.3855, attn_decoder_loss=0.2529, over 5796032.73 frames. ], batch size: 86, lr: 6.22e-03, grad_scale: 8.0 2024-09-17 22:40:37,033 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:40:44,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.76 vs. limit=15.0 2024-09-17 22:40:53,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=315140.0, ans=0.125 2024-09-17 22:41:02,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=315180.0, ans=0.125 2024-09-17 22:41:28,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=315220.0, ans=0.0 2024-09-17 22:41:28,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=315220.0, ans=0.125 2024-09-17 22:41:48,285 INFO [train.py:1198] (1/2) Epoch 18, batch 1900, loss[loss=0.2614, ctc_loss=0.1536, cr_loss=0.4035, attn_decoder_loss=0.2644, over 29693.00 frames. ], tot_loss[loss=0.2505, ctc_loss=0.1427, cr_loss=0.3865, attn_decoder_loss=0.2539, over 5803499.03 frames. 
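The Whitening lines only fire when a module's output covariance drifts too far from white, comparing a metric against a (scheduled) limit, e.g. metric=21.67 vs. limit=22.5 just above. The exact metric is defined in scaling.py; as a plausible stand-in, the ratio of the mean squared eigenvalue of the feature covariance to the squared mean eigenvalue equals 1.0 for perfectly white features and grows as variance concentrates in a few directions:

```python
import torch

# Stand-in whiteness metric: mean(eig^2) / mean(eig)^2 of the covariance.
def whiteness_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels), assumed zero-mean for simplicity
    cov = x.T @ x / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

x = torch.randn(1000, 512)
print(whiteness_metric(x))                                  # close to 1.0
print(whiteness_metric(x * torch.linspace(0.1, 3.0, 512)))  # noticeably larger
```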
], batch size: 89, lr: 6.22e-03, grad_scale: 8.0 2024-09-17 22:42:43,240 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:42:56,523 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.577e+01 8.839e+01 9.337e+01 9.876e+01 1.224e+02, threshold=1.867e+02, percent-clipped=0.0 2024-09-17 22:43:04,086 INFO [train.py:1198] (1/2) Epoch 18, batch 1950, loss[loss=0.2506, ctc_loss=0.1432, cr_loss=0.3958, attn_decoder_loss=0.2537, over 29427.00 frames. ], tot_loss[loss=0.2514, ctc_loss=0.1432, cr_loss=0.3881, attn_decoder_loss=0.2548, over 5818515.02 frames. ], batch size: 78, lr: 6.22e-03, grad_scale: 8.0 2024-09-17 22:43:13,651 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:43:20,094 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.20 vs. limit=10.0 2024-09-17 22:43:41,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=315580.0, ans=0.0 2024-09-17 22:44:03,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=315620.0, ans=0.125 2024-09-17 22:44:23,104 INFO [train.py:1198] (1/2) Epoch 18, batch 2000, loss[loss=0.2206, ctc_loss=0.1229, cr_loss=0.3404, attn_decoder_loss=0.2238, over 29366.00 frames. ], tot_loss[loss=0.2516, ctc_loss=0.1436, cr_loss=0.3882, attn_decoder_loss=0.255, over 5796999.11 frames. ], batch size: 67, lr: 6.22e-03, grad_scale: 16.0 2024-09-17 22:44:23,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=315700.0, ans=6.0 2024-09-17 22:44:31,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=315700.0, ans=0.025 2024-09-17 22:44:32,703 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:45:00,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=315780.0, ans=0.025 2024-09-17 22:45:03,933 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.86 vs. limit=15.0 2024-09-17 22:45:05,003 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.72 vs. limit=15.0 2024-09-17 22:45:11,325 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0 2024-09-17 22:45:27,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=315860.0, ans=0.025 2024-09-17 22:45:31,338 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.333e+01 8.725e+01 9.284e+01 1.004e+02 4.329e+02, threshold=1.857e+02, percent-clipped=1.0 2024-09-17 22:45:38,903 INFO [train.py:1198] (1/2) Epoch 18, batch 2050, loss[loss=0.2265, ctc_loss=0.124, cr_loss=0.3415, attn_decoder_loss=0.2303, over 29470.00 frames. ], tot_loss[loss=0.2505, ctc_loss=0.1429, cr_loss=0.3864, attn_decoder_loss=0.2539, over 5788999.40 frames. 
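The grad_scale field in these summaries moves between 8.0 and 16.0 (16.0 at batch 2000, back to 8.0 by batch 2050 below), which matches the dynamic loss scaling of PyTorch's AMP GradScaler: the scale doubles after a long enough run of overflow-free fp16 steps and is cut back when gradients overflow. A generic AMP training step with assumed init_scale and growth_interval values, not the project's train.py:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=8.0, growth_interval=2000)

def amp_step(model, batch, optimizer, compute_loss):
    # compute_loss is a hypothetical callable returning a scalar loss.
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()  # backprop the scaled loss
    scaler.step(optimizer)         # unscales grads; skips the step on inf/nan
    scaler.update()                # grow or back off the scale
    return loss.detach()
```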
], batch size: 70, lr: 6.21e-03, grad_scale: 8.0 2024-09-17 22:46:23,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=316020.0, ans=0.125 2024-09-17 22:46:23,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=316020.0, ans=0.2 2024-09-17 22:46:37,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=316020.0, ans=0.2 2024-09-17 22:46:43,846 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.59 vs. limit=10.0 2024-09-17 22:46:54,760 INFO [train.py:1198] (1/2) Epoch 18, batch 2100, loss[loss=0.2568, ctc_loss=0.1467, cr_loss=0.4019, attn_decoder_loss=0.2601, over 29770.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1422, cr_loss=0.3852, attn_decoder_loss=0.2533, over 5800811.32 frames. ], batch size: 81, lr: 6.21e-03, grad_scale: 8.0 2024-09-17 22:47:16,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=316140.0, ans=0.2 2024-09-17 22:47:28,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=316180.0, ans=0.1 2024-09-17 22:48:08,069 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.905e+01 8.595e+01 9.065e+01 9.620e+01 1.690e+02, threshold=1.813e+02, percent-clipped=0.0 2024-09-17 22:48:14,162 INFO [train.py:1198] (1/2) Epoch 18, batch 2150, loss[loss=0.256, ctc_loss=0.1528, cr_loss=0.4027, attn_decoder_loss=0.2586, over 29436.00 frames. ], tot_loss[loss=0.249, ctc_loss=0.1415, cr_loss=0.3839, attn_decoder_loss=0.2524, over 5816389.60 frames. ], batch size: 78, lr: 6.21e-03, grad_scale: 8.0 2024-09-17 22:48:25,669 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0 2024-09-17 22:48:55,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=316380.0, ans=0.125 2024-09-17 22:49:02,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=316420.0, ans=0.0 2024-09-17 22:49:04,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=316420.0, ans=0.0 2024-09-17 22:49:07,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=316420.0, ans=0.0 2024-09-17 22:49:19,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=316460.0, ans=0.04949747468305833 2024-09-17 22:49:21,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=316460.0, ans=0.125 2024-09-17 22:49:29,883 INFO [train.py:1198] (1/2) Epoch 18, batch 2200, loss[loss=0.2542, ctc_loss=0.135, cr_loss=0.3709, attn_decoder_loss=0.2593, over 29607.00 frames. ], tot_loss[loss=0.2492, ctc_loss=0.1416, cr_loss=0.384, attn_decoder_loss=0.2527, over 5811698.57 frames. ], batch size: 86, lr: 6.21e-03, grad_scale: 8.0 2024-09-17 22:50:03,999 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.82 vs. 
limit=15.0 2024-09-17 22:50:09,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=316580.0, ans=0.125 2024-09-17 22:50:18,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=316620.0, ans=0.2 2024-09-17 22:50:18,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=316620.0, ans=0.07 2024-09-17 22:50:39,196 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.790e+01 9.365e+01 1.003e+02 3.289e+02, threshold=1.873e+02, percent-clipped=2.0 2024-09-17 22:50:45,435 INFO [train.py:1198] (1/2) Epoch 18, batch 2250, loss[loss=0.2552, ctc_loss=0.1436, cr_loss=0.3979, attn_decoder_loss=0.2587, over 29712.00 frames. ], tot_loss[loss=0.2492, ctc_loss=0.1416, cr_loss=0.3841, attn_decoder_loss=0.2527, over 5810344.63 frames. ], batch size: 82, lr: 6.21e-03, grad_scale: 8.0 2024-09-17 22:50:56,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=316700.0, ans=0.0 2024-09-17 22:50:57,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=316700.0, ans=0.125 2024-09-17 22:51:03,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=316740.0, ans=0.125 2024-09-17 22:51:47,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=316820.0, ans=0.125 2024-09-17 22:52:02,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=316860.0, ans=0.1 2024-09-17 22:52:05,589 INFO [train.py:1198] (1/2) Epoch 18, batch 2300, loss[loss=0.2228, ctc_loss=0.1142, cr_loss=0.3404, attn_decoder_loss=0.2274, over 29337.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1406, cr_loss=0.3821, attn_decoder_loss=0.2516, over 5798754.99 frames. ], batch size: 71, lr: 6.20e-03, grad_scale: 8.0 2024-09-17 22:52:08,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=316900.0, ans=0.125 2024-09-17 22:52:16,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=316900.0, ans=0.1 2024-09-17 22:52:25,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=316940.0, ans=0.025 2024-09-17 22:52:30,539 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.23 vs. limit=15.0 2024-09-17 22:52:41,495 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.63 vs. limit=15.0 2024-09-17 22:53:05,625 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.42 vs. 
limit=22.5 2024-09-17 22:53:15,287 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.054e+01 8.549e+01 8.919e+01 9.975e+01 1.965e+02, threshold=1.784e+02, percent-clipped=1.0 2024-09-17 22:53:21,367 INFO [train.py:1198] (1/2) Epoch 18, batch 2350, loss[loss=0.2508, ctc_loss=0.1362, cr_loss=0.3727, attn_decoder_loss=0.2553, over 29698.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1406, cr_loss=0.3821, attn_decoder_loss=0.2517, over 5803470.23 frames. ], batch size: 83, lr: 6.20e-03, grad_scale: 8.0 2024-09-17 22:53:23,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=317100.0, ans=0.1 2024-09-17 22:53:35,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=317140.0, ans=0.0 2024-09-17 22:53:38,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=317140.0, ans=0.125 2024-09-17 22:53:54,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=317180.0, ans=0.125 2024-09-17 22:53:56,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=317180.0, ans=0.09899494936611666 2024-09-17 22:53:57,188 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=35.40 vs. limit=22.5 2024-09-17 22:53:57,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=317180.0, ans=0.125 2024-09-17 22:54:12,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=317220.0, ans=0.07 2024-09-17 22:54:37,095 INFO [train.py:1198] (1/2) Epoch 18, batch 2400, loss[loss=0.2488, ctc_loss=0.1443, cr_loss=0.3953, attn_decoder_loss=0.2516, over 29510.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1412, cr_loss=0.383, attn_decoder_loss=0.2523, over 5807236.49 frames. ], batch size: 76, lr: 6.20e-03, grad_scale: 16.0 2024-09-17 22:54:39,495 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.67 vs. 
limit=15.0 2024-09-17 22:54:43,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=317300.0, ans=0.125 2024-09-17 22:54:55,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=317340.0, ans=0.0 2024-09-17 22:54:58,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=317340.0, ans=0.1 2024-09-17 22:54:58,653 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:55:01,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=317340.0, ans=0.125 2024-09-17 22:55:52,493 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.296e+01 8.608e+01 9.226e+01 9.922e+01 2.120e+02, threshold=1.845e+02, percent-clipped=2.0 2024-09-17 22:55:57,160 INFO [train.py:1198] (1/2) Epoch 18, batch 2450, loss[loss=0.2672, ctc_loss=0.1648, cr_loss=0.4324, attn_decoder_loss=0.269, over 29703.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.1417, cr_loss=0.3838, attn_decoder_loss=0.2532, over 5784168.36 frames. ], batch size: 82, lr: 6.20e-03, grad_scale: 8.0 2024-09-17 22:56:01,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=317500.0, ans=0.125 2024-09-17 22:56:06,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=317500.0, ans=0.125 2024-09-17 22:56:27,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=317580.0, ans=0.125 2024-09-17 22:56:47,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=317620.0, ans=0.0 2024-09-17 22:56:58,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=317660.0, ans=0.0 2024-09-17 22:57:07,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=317660.0, ans=0.0 2024-09-17 22:57:13,354 INFO [train.py:1198] (1/2) Epoch 18, batch 2500, loss[loss=0.2606, ctc_loss=0.1449, cr_loss=0.3792, attn_decoder_loss=0.265, over 29613.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.1419, cr_loss=0.3841, attn_decoder_loss=0.2531, over 5794590.06 frames. 
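The WithLoss lines track auxiliary penalties attached to intermediate tensors (here self-attention weights): the forward value passes through unchanged, the penalty only shapes the backward pass, and the log reports its accumulated value (0.000e+00 at these points). A rough reconstruction of that attach-a-loss pattern with an invented quadratic penalty; the real class in scaling.py may penalize a different statistic:

```python
import torch

class AttachLoss(torch.autograd.Function):
    """Identity in the forward pass; adds the gradient of an auxiliary
    penalty (here 0.5 * scale * sum(x**2), an assumption) in backward."""

    @staticmethod
    def forward(ctx, x, scale):
        ctx.scale = scale
        ctx.save_for_backward(x)
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out + ctx.scale * x, None

x = torch.randn(4, 8, requires_grad=True)
y = AttachLoss.apply(x, 1e-3)  # y equals x in the forward pass
y.sum().backward()             # x.grad now includes the penalty term
```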
], batch size: 86, lr: 6.20e-03, grad_scale: 8.0 2024-09-17 22:57:54,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=317780.0, ans=0.125 2024-09-17 22:57:59,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=317820.0, ans=0.125 2024-09-17 22:58:08,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=317820.0, ans=0.125 2024-09-17 22:58:14,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=317860.0, ans=0.1 2024-09-17 22:58:21,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=317860.0, ans=0.0 2024-09-17 22:58:24,506 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.912e+01 8.741e+01 9.201e+01 9.902e+01 1.726e+02, threshold=1.840e+02, percent-clipped=0.0 2024-09-17 22:58:28,411 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.28 vs. limit=15.0 2024-09-17 22:58:29,147 INFO [train.py:1198] (1/2) Epoch 18, batch 2550, loss[loss=0.225, ctc_loss=0.127, cr_loss=0.3528, attn_decoder_loss=0.2281, over 29338.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.1417, cr_loss=0.3843, attn_decoder_loss=0.2532, over 5798041.64 frames. ], batch size: 67, lr: 6.19e-03, grad_scale: 8.0 2024-09-17 22:58:29,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=317900.0, ans=0.0 2024-09-17 22:58:45,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=317940.0, ans=0.125 2024-09-17 22:59:06,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=317980.0, ans=0.2 2024-09-17 22:59:09,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=317980.0, ans=0.1 2024-09-17 22:59:14,572 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.66 vs. limit=22.5 2024-09-17 22:59:33,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.08 vs. limit=15.0 2024-09-17 22:59:49,434 INFO [train.py:1198] (1/2) Epoch 18, batch 2600, loss[loss=0.2484, ctc_loss=0.1353, cr_loss=0.3549, attn_decoder_loss=0.253, over 29420.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1422, cr_loss=0.3855, attn_decoder_loss=0.2537, over 5795298.07 frames. ], batch size: 78, lr: 6.19e-03, grad_scale: 8.0 2024-09-17 22:59:50,353 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.08 vs. limit=12.0 2024-09-17 22:59:52,263 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.25 vs. 
limit=8.0 2024-09-17 23:00:33,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=318220.0, ans=0.0 2024-09-17 23:00:33,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=318220.0, ans=0.1 2024-09-17 23:00:38,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=318220.0, ans=0.0 2024-09-17 23:00:45,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=318220.0, ans=0.125 2024-09-17 23:00:57,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=318260.0, ans=0.125 2024-09-17 23:01:00,205 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 8.599e+01 9.133e+01 9.930e+01 1.773e+02, threshold=1.827e+02, percent-clipped=0.0 2024-09-17 23:01:04,598 INFO [train.py:1198] (1/2) Epoch 18, batch 2650, loss[loss=0.2625, ctc_loss=0.151, cr_loss=0.3897, attn_decoder_loss=0.2662, over 29291.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1422, cr_loss=0.3851, attn_decoder_loss=0.2539, over 5800742.64 frames. ], batch size: 100, lr: 6.19e-03, grad_scale: 8.0 2024-09-17 23:01:47,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=318380.0, ans=0.0 2024-09-17 23:01:49,320 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.43 vs. limit=15.0 2024-09-17 23:02:04,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=318460.0, ans=0.0 2024-09-17 23:02:05,947 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.71 vs. limit=6.0 2024-09-17 23:02:19,080 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:02:20,280 INFO [train.py:1198] (1/2) Epoch 18, batch 2700, loss[loss=0.2511, ctc_loss=0.1376, cr_loss=0.3915, attn_decoder_loss=0.255, over 29539.00 frames. ], tot_loss[loss=0.2506, ctc_loss=0.1424, cr_loss=0.3853, attn_decoder_loss=0.2541, over 5796436.19 frames. ], batch size: 87, lr: 6.19e-03, grad_scale: 8.0 2024-09-17 23:02:37,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=318540.0, ans=0.125 2024-09-17 23:02:42,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=318540.0, ans=22.5 2024-09-17 23:02:46,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=318540.0, ans=0.125 2024-09-17 23:02:55,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=318580.0, ans=0.125 2024-09-17 23:03:00,657 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.64 vs. 
limit=15.0 2024-09-17 23:03:16,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=318620.0, ans=0.125 2024-09-17 23:03:28,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=318660.0, ans=0.0 2024-09-17 23:03:36,482 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.684e+01 8.599e+01 9.139e+01 9.802e+01 1.659e+02, threshold=1.828e+02, percent-clipped=0.0 2024-09-17 23:03:41,065 INFO [train.py:1198] (1/2) Epoch 18, batch 2750, loss[loss=0.2379, ctc_loss=0.1333, cr_loss=0.3793, attn_decoder_loss=0.2411, over 29536.00 frames. ], tot_loss[loss=0.2492, ctc_loss=0.1413, cr_loss=0.3827, attn_decoder_loss=0.2527, over 5795386.03 frames. ], batch size: 75, lr: 6.19e-03, grad_scale: 8.0 2024-09-17 23:03:43,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=318700.0, ans=0.025 2024-09-17 23:03:50,996 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=7.13 vs. limit=12.0 2024-09-17 23:03:56,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=318740.0, ans=0.0 2024-09-17 23:04:02,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=318740.0, ans=0.04949747468305833 2024-09-17 23:04:11,849 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.22 vs. limit=10.0 2024-09-17 23:04:40,977 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.69 vs. limit=15.0 2024-09-17 23:04:51,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=318860.0, ans=0.125 2024-09-17 23:04:57,092 INFO [train.py:1198] (1/2) Epoch 18, batch 2800, loss[loss=0.2726, ctc_loss=0.173, cr_loss=0.3852, attn_decoder_loss=0.2751, over 20366.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1416, cr_loss=0.3831, attn_decoder_loss=0.253, over 5775981.26 frames. ], batch size: 209, lr: 6.18e-03, grad_scale: 16.0 2024-09-17 23:04:58,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=318900.0, ans=0.125 2024-09-17 23:05:04,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=318900.0, ans=0.0 2024-09-17 23:05:58,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=319060.0, ans=0.0 2024-09-17 23:06:02,243 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=14.14 vs. 
limit=15.0 2024-09-17 23:06:02,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=319060.0, ans=0.125 2024-09-17 23:06:04,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=319060.0, ans=0.025 2024-09-17 23:06:10,010 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.902e+01 8.963e+01 9.729e+01 1.060e+02 3.606e+02, threshold=1.946e+02, percent-clipped=3.0 2024-09-17 23:06:13,107 INFO [train.py:1198] (1/2) Epoch 18, batch 2850, loss[loss=0.2539, ctc_loss=0.1433, cr_loss=0.4071, attn_decoder_loss=0.2571, over 29476.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.1421, cr_loss=0.3842, attn_decoder_loss=0.2534, over 5761301.44 frames. ], batch size: 77, lr: 6.18e-03, grad_scale: 8.0 2024-09-17 23:06:16,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=319100.0, ans=0.0 2024-09-17 23:06:24,751 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.01 vs. limit=15.0 2024-09-17 23:06:32,380 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.26 vs. limit=22.5 2024-09-17 23:06:33,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=319140.0, ans=0.2 2024-09-17 23:06:40,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=319140.0, ans=0.0 2024-09-17 23:07:12,052 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:07:33,501 INFO [train.py:1198] (1/2) Epoch 18, batch 2900, loss[loss=0.2435, ctc_loss=0.1343, cr_loss=0.3835, attn_decoder_loss=0.2471, over 29437.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.1428, cr_loss=0.386, attn_decoder_loss=0.2547, over 5786888.91 frames. ], batch size: 79, lr: 6.18e-03, grad_scale: 8.0 2024-09-17 23:07:44,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=319300.0, ans=0.0 2024-09-17 23:07:48,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.63 vs. limit=22.5 2024-09-17 23:07:53,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=319340.0, ans=0.0 2024-09-17 23:08:02,939 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:08:13,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=319380.0, ans=0.1 2024-09-17 23:08:18,302 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.08 vs. limit=12.0 2024-09-17 23:08:23,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.50 vs. 
limit=10.0 2024-09-17 23:08:46,530 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.577e+01 9.174e+01 9.696e+01 1.530e+02, threshold=1.835e+02, percent-clipped=0.0 2024-09-17 23:08:49,581 INFO [train.py:1198] (1/2) Epoch 18, batch 2950, loss[loss=0.2396, ctc_loss=0.1392, cr_loss=0.3909, attn_decoder_loss=0.242, over 29522.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.1417, cr_loss=0.3837, attn_decoder_loss=0.2532, over 5780572.99 frames. ], batch size: 75, lr: 6.18e-03, grad_scale: 8.0 2024-09-17 23:08:56,649 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.87 vs. limit=15.0 2024-09-17 23:08:58,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=319500.0, ans=0.125 2024-09-17 23:09:13,326 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.93 vs. limit=15.0 2024-09-17 23:09:41,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=319620.0, ans=0.125 2024-09-17 23:09:45,046 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.22 vs. limit=6.0 2024-09-17 23:09:46,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=319620.0, ans=0.125 2024-09-17 23:09:55,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=319660.0, ans=0.1 2024-09-17 23:10:04,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=319700.0, ans=0.1 2024-09-17 23:10:05,406 INFO [train.py:1198] (1/2) Epoch 18, batch 3000, loss[loss=0.2416, ctc_loss=0.1308, cr_loss=0.3675, attn_decoder_loss=0.2457, over 29756.00 frames. ], tot_loss[loss=0.2494, ctc_loss=0.1415, cr_loss=0.3834, attn_decoder_loss=0.2529, over 5782820.89 frames. ], batch size: 81, lr: 6.18e-03, grad_scale: 8.0 2024-09-17 23:10:05,406 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 23:10:24,020 INFO [train.py:1230] (1/2) Epoch 18, validation: loss=0.211, ctc_loss=0.04071, cr_loss=4.994e-15, attn_decoder_loss=0.23, over 944034.00 frames. 2024-09-17 23:10:24,021 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-17 23:10:40,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=319740.0, ans=0.0 2024-09-17 23:10:55,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=319780.0, ans=0.2 2024-09-17 23:11:05,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=319780.0, ans=0.2 2024-09-17 23:11:20,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=319820.0, ans=0.025 2024-09-17 23:11:30,264 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.03 vs. 
limit=15.0 2024-09-17 23:11:33,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.28 vs. limit=15.0 2024-09-17 23:11:41,318 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 9.075e+01 9.591e+01 1.002e+02 5.340e+02, threshold=1.918e+02, percent-clipped=4.0 2024-09-17 23:11:44,493 INFO [train.py:1198] (1/2) Epoch 18, batch 3050, loss[loss=0.2359, ctc_loss=0.1217, cr_loss=0.3426, attn_decoder_loss=0.241, over 29537.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.142, cr_loss=0.3837, attn_decoder_loss=0.2536, over 5776498.69 frames. ], batch size: 76, lr: 6.17e-03, grad_scale: 8.0 2024-09-17 23:12:02,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=319940.0, ans=0.0 2024-09-17 23:12:04,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=319940.0, ans=0.1 2024-09-17 23:12:08,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=319940.0, ans=0.125 2024-09-17 23:12:15,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=319980.0, ans=0.0 2024-09-17 23:12:57,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=320060.0, ans=0.1 2024-09-17 23:13:07,354 INFO [train.py:1198] (1/2) Epoch 18, batch 3100, loss[loss=0.2615, ctc_loss=0.1447, cr_loss=0.3753, attn_decoder_loss=0.2661, over 29308.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.1418, cr_loss=0.3836, attn_decoder_loss=0.2532, over 5777190.37 frames. ], batch size: 100, lr: 6.17e-03, grad_scale: 8.0 2024-09-17 23:13:12,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=320100.0, ans=0.125 2024-09-17 23:13:21,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=320140.0, ans=0.025 2024-09-17 23:13:25,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=320140.0, ans=0.2 2024-09-17 23:13:55,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.max_abs, batch_count=320220.0, ans=10.0 2024-09-17 23:14:21,537 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.202e+01 8.516e+01 8.988e+01 9.440e+01 2.574e+02, threshold=1.798e+02, percent-clipped=3.0 2024-09-17 23:14:25,207 INFO [train.py:1198] (1/2) Epoch 18, batch 3150, loss[loss=0.258, ctc_loss=0.1501, cr_loss=0.414, attn_decoder_loss=0.2607, over 28934.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1414, cr_loss=0.3837, attn_decoder_loss=0.253, over 5782951.44 frames. 
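The validation pass at batch 3000 above ("Computing validation loss", then a frame-averaged dev-set loss and "Maximum memory allocated so far is 52672MB") is interleaved with training at a fixed batch interval. A generic sketch of such a pass; compute_loss is a hypothetical callable returning (loss, num_frames), and none of this is the actual train.py:

```python
import torch

@torch.no_grad()
def validate(model, valid_loader, device, compute_loss):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    for batch in valid_loader:
        loss, num_frames = compute_loss(model, batch)
        tot_loss += loss.item() * num_frames
        tot_frames += num_frames
    model.train()
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation: loss={tot_loss / tot_frames:.4g}")
    print(f"Maximum memory allocated so far is {peak_mb}MB")
    return tot_loss / tot_frames
```

The near-zero cr_loss in that validation line also makes sense if the consistency term compares two augmented forward passes, which coincide when augmentation is disabled at validation time.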
], batch size: 104, lr: 6.17e-03, grad_scale: 8.0 2024-09-17 23:15:29,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=320460.0, ans=0.5 2024-09-17 23:15:31,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=320460.0, ans=0.125 2024-09-17 23:15:34,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=320460.0, ans=0.0 2024-09-17 23:15:40,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=320460.0, ans=0.0 2024-09-17 23:15:41,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=320500.0, ans=0.125 2024-09-17 23:15:43,028 INFO [train.py:1198] (1/2) Epoch 18, batch 3200, loss[loss=0.2523, ctc_loss=0.1411, cr_loss=0.41, attn_decoder_loss=0.2556, over 29423.00 frames. ], tot_loss[loss=0.2487, ctc_loss=0.1407, cr_loss=0.3825, attn_decoder_loss=0.2522, over 5793601.15 frames. ], batch size: 79, lr: 6.17e-03, grad_scale: 16.0 2024-09-17 23:15:50,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=320500.0, ans=0.2 2024-09-17 23:15:56,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=320540.0, ans=0.0 2024-09-17 23:16:14,170 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.94 vs. limit=15.0 2024-09-17 23:16:22,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=320580.0, ans=0.05 2024-09-17 23:16:45,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=320660.0, ans=0.1 2024-09-17 23:16:52,199 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.49 vs. limit=15.0 2024-09-17 23:16:58,817 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.899e+01 8.773e+01 9.216e+01 9.587e+01 1.476e+02, threshold=1.843e+02, percent-clipped=0.0 2024-09-17 23:16:58,843 INFO [train.py:1198] (1/2) Epoch 18, batch 3250, loss[loss=0.248, ctc_loss=0.1329, cr_loss=0.3573, attn_decoder_loss=0.2528, over 29684.00 frames. ], tot_loss[loss=0.2493, ctc_loss=0.141, cr_loss=0.383, attn_decoder_loss=0.2528, over 5800607.63 frames. 
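Batch sizes in these summaries range from 67 to 112 utterances, with an outlier of 209 for the short-utterance batch 2800, because batches are packed up to a total duration budget rather than a fixed sentence count. A toy duration-capped batcher illustrating the idea; the 1200 s cap matches this run's max_duration setting, but the code is an illustration, not lhotse's DynamicBucketingSampler:

```python
# Pack cuts into batches capped by total duration (toy version).
def duration_batches(cuts, max_duration: float = 1200.0):
    batch, total = [], 0.0
    for cut_id, duration in cuts:  # cuts: iterable of (id, seconds) pairs
        if batch and total + duration > max_duration:
            yield batch
            batch, total = [], 0.0
        batch.append(cut_id)
        total += duration
    if batch:
        yield batch

# Short cuts pack densely, long cuts sparsely:
short = [(f"s{i}", 4.0) for i in range(500)]
print(len(next(duration_batches(short))))  # 300 cuts of 4 s fit under 1200 s
```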
], batch size: 84, lr: 6.17e-03, grad_scale: 8.0 2024-09-17 23:17:08,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=320700.0, ans=0.125 2024-09-17 23:17:16,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=320740.0, ans=0.0 2024-09-17 23:17:24,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=320740.0, ans=0.1 2024-09-17 23:17:27,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=320780.0, ans=0.09899494936611666 2024-09-17 23:17:32,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=320780.0, ans=0.125 2024-09-17 23:17:44,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=320820.0, ans=0.09899494936611666 2024-09-17 23:17:47,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=320820.0, ans=0.125 2024-09-17 23:18:10,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=320860.0, ans=0.05 2024-09-17 23:18:13,781 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:18:14,951 INFO [train.py:1198] (1/2) Epoch 18, batch 3300, loss[loss=0.2586, ctc_loss=0.145, cr_loss=0.3946, attn_decoder_loss=0.2625, over 28217.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.1403, cr_loss=0.3816, attn_decoder_loss=0.2518, over 5797593.44 frames. ], batch size: 111, lr: 6.17e-03, grad_scale: 8.0 2024-09-17 23:18:54,680 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.40 vs. limit=15.0 2024-09-17 23:19:17,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=321020.0, ans=0.125 2024-09-17 23:19:35,429 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.446e+01 8.646e+01 9.242e+01 9.946e+01 1.965e+02, threshold=1.848e+02, percent-clipped=1.0 2024-09-17 23:19:35,450 INFO [train.py:1198] (1/2) Epoch 18, batch 3350, loss[loss=0.2632, ctc_loss=0.1483, cr_loss=0.4147, attn_decoder_loss=0.2668, over 28890.00 frames. ], tot_loss[loss=0.2493, ctc_loss=0.1412, cr_loss=0.3831, attn_decoder_loss=0.2528, over 5774036.44 frames. 
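Many of the ScheduledFloat entries belong to balancer modules (balancer1.prob, balancer2.min_positive, max_abs, min_abs). A balancer watches per-channel statistics such as the fraction of positive activations and the RMS magnitude, and injects a small corrective gradient when a channel leaves its target range, doing so only on a random fraction of steps (prob=0.125 in most entries above). A speculative sketch of the measurement half of that idea only; the gradient-injection half and the exact statistics live in scaling.py:

```python
import torch

# Flag channels whose statistics fall outside an assumed target range.
def balance_stats(x: torch.Tensor, min_positive=0.05, max_abs=10.0):
    # x: (num_frames, num_channels)
    frac_positive = (x > 0).float().mean(dim=0)  # per-channel positivity
    rms = x.pow(2).mean(dim=0).sqrt()            # per-channel magnitude
    too_negative = frac_positive < min_positive  # would be pushed positive
    too_large = rms > max_abs                    # would be shrunk
    return too_negative, too_large

x = torch.randn(1000, 256) * 12.0  # RMS ~12 exceeds max_abs=10
neg, large = balance_stats(x)
print(neg.sum().item(), large.sum().item())  # expect 0 256
```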
], batch size: 104, lr: 6.16e-03, grad_scale: 8.0 2024-09-17 23:19:46,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=321100.0, ans=0.1 2024-09-17 23:20:17,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=321180.0, ans=0.0 2024-09-17 23:20:18,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=321180.0, ans=0.125 2024-09-17 23:20:21,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=321220.0, ans=0.025 2024-09-17 23:20:22,483 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2024-09-17 23:20:23,640 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.07 vs. limit=15.0 2024-09-17 23:20:50,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=321300.0, ans=0.125 2024-09-17 23:20:51,647 INFO [train.py:1198] (1/2) Epoch 18, batch 3400, loss[loss=0.2183, ctc_loss=0.1191, cr_loss=0.3276, attn_decoder_loss=0.2221, over 29352.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1417, cr_loss=0.3833, attn_decoder_loss=0.2529, over 5766874.59 frames. ], batch size: 67, lr: 6.16e-03, grad_scale: 8.0 2024-09-17 23:20:55,419 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=15.0 2024-09-17 23:21:14,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=321340.0, ans=0.0 2024-09-17 23:21:18,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=321340.0, ans=15.0 2024-09-17 23:21:19,550 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.61 vs. limit=15.0 2024-09-17 23:21:28,947 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.12 vs. limit=15.0 2024-09-17 23:21:31,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=321380.0, ans=0.025 2024-09-17 23:21:36,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=321420.0, ans=0.125 2024-09-17 23:21:47,131 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.16 vs. limit=22.5 2024-09-17 23:22:09,248 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.32 vs. 
limit=15.0 2024-09-17 23:22:09,825 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.690e+01 9.213e+01 9.933e+01 2.095e+02, threshold=1.843e+02, percent-clipped=1.0 2024-09-17 23:22:09,848 INFO [train.py:1198] (1/2) Epoch 18, batch 3450, loss[loss=0.2657, ctc_loss=0.1534, cr_loss=0.4109, attn_decoder_loss=0.269, over 28192.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.1419, cr_loss=0.3841, attn_decoder_loss=0.2533, over 5774627.38 frames. ], batch size: 111, lr: 6.16e-03, grad_scale: 8.0 2024-09-17 23:22:29,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=321540.0, ans=0.125 2024-09-17 23:22:34,985 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2024-09-17 23:22:43,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=321580.0, ans=0.125 2024-09-17 23:22:43,715 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.50 vs. limit=15.0 2024-09-17 23:22:57,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.min_positive, batch_count=321620.0, ans=0.025 2024-09-17 23:22:58,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=321620.0, ans=0.1 2024-09-17 23:23:10,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=321620.0, ans=0.125 2024-09-17 23:23:10,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=321620.0, ans=0.125 2024-09-17 23:23:14,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=321660.0, ans=0.0 2024-09-17 23:23:25,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=321660.0, ans=0.125 2024-09-17 23:23:28,089 INFO [train.py:1198] (1/2) Epoch 18, batch 3500, loss[loss=0.2144, ctc_loss=0.1154, cr_loss=0.3299, attn_decoder_loss=0.2181, over 29337.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1414, cr_loss=0.3837, attn_decoder_loss=0.2525, over 5775331.27 frames. 
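The various *_skip_rate entries (attention_skip_rate, conv_skip_rate, ff2_skip_rate, bypass.skip_rate) read as stochastic-depth style rates: with that probability, a submodule's contribution is dropped for the step and the layer falls back to its input. Most have annealed to 0.0 by epoch 18, with bypass.skip_rate holding near 0.0495. A generic sketch of the mechanism, not the Zipformer implementation:

```python
import torch

# Stochastic-depth style skip: drop the residual branch with prob skip_rate.
def maybe_skip(module, x, skip_rate: float, training: bool = True):
    if training and torch.rand(()).item() < skip_rate:
        return x             # identity bypass for this step
    return x + module(x)     # normal residual contribution

# The per-module rates themselves would come from schedules like the
# ScheduledFloat sketch earlier, decaying toward 0.0 as training matures.
```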
2024-09-17 23:23:36,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=321700.0, ans=0.125
2024-09-17 23:23:40,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=321700.0, ans=0.0
2024-09-17 23:23:45,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=321740.0, ans=0.1
2024-09-17 23:23:45,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=321740.0, ans=0.125
2024-09-17 23:23:58,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=321780.0, ans=0.125
2024-09-17 23:24:10,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=321780.0, ans=0.035
2024-09-17 23:24:31,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=321860.0, ans=0.2
2024-09-17 23:24:35,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=321860.0, ans=0.0
2024-09-17 23:24:42,810 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.524e+01 9.219e+01 9.966e+01 2.449e+02, threshold=1.844e+02, percent-clipped=1.0
2024-09-17 23:24:42,831 INFO [train.py:1198] (1/2) Epoch 18, batch 3550, loss[loss=0.2534, ctc_loss=0.1331, cr_loss=0.3615, attn_decoder_loss=0.2587, over 29714.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1412, cr_loss=0.3831, attn_decoder_loss=0.2526, over 5782403.15 frames. ], batch size: 89, lr: 6.16e-03, grad_scale: 8.0
2024-09-17 23:24:43,444 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=15.0
2024-09-17 23:25:11,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=321980.0, ans=0.1
2024-09-17 23:25:28,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=322020.0, ans=22.5
2024-09-17 23:25:29,528 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0
2024-09-17 23:25:33,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=322020.0, ans=10.0
2024-09-17 23:25:57,198 INFO [train.py:1198] (1/2) Epoch 18, batch 3600, loss[loss=0.2446, ctc_loss=0.1271, cr_loss=0.3596, attn_decoder_loss=0.2497, over 29489.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1413, cr_loss=0.3837, attn_decoder_loss=0.253, over 5791700.76 frames. ], batch size: 77, lr: 6.15e-03, grad_scale: 16.0
2024-09-17 23:26:16,213 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.20 vs. limit=15.0
2024-09-17 23:26:49,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=322220.0, ans=0.125
2024-09-17 23:26:54,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=322220.0, ans=0.0
2024-09-17 23:27:11,822 INFO [train.py:1198] (1/2) Epoch 18, batch 3650, loss[loss=0.2639, ctc_loss=0.1519, cr_loss=0.4094, attn_decoder_loss=0.2673, over 29475.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1407, cr_loss=0.3829, attn_decoder_loss=0.2524, over 5793055.85 frames. ], batch size: 90, lr: 6.15e-03, grad_scale: 8.0
2024-09-17 23:27:13,207 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.529e+01 9.051e+01 9.513e+01 1.639e+02, threshold=1.810e+02, percent-clipped=0.0
2024-09-17 23:27:27,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=322340.0, ans=0.0
2024-09-17 23:27:27,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=322340.0, ans=0.125
2024-09-17 23:27:28,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=322340.0, ans=0.1
2024-09-17 23:27:34,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=322340.0, ans=0.1
2024-09-17 23:27:34,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=322340.0, ans=0.09899494936611666
2024-09-17 23:27:46,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=322380.0, ans=0.125
2024-09-17 23:28:01,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=322420.0, ans=0.0
2024-09-17 23:28:05,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=322420.0, ans=0.07
2024-09-17 23:28:29,087 INFO [train.py:1198] (1/2) Epoch 18, batch 3700, loss[loss=0.2679, ctc_loss=0.1524, cr_loss=0.4168, attn_decoder_loss=0.2715, over 29712.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1408, cr_loss=0.3835, attn_decoder_loss=0.2526, over 5803210.26 frames. ], batch size: 84, lr: 6.15e-03, grad_scale: 8.0
2024-09-17 23:28:44,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=322540.0, ans=0.125
2024-09-17 23:28:51,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=322540.0, ans=0.1
2024-09-17 23:29:04,343 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.25 vs. limit=15.0
2024-09-17 23:29:24,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=322620.0, ans=0.125
2024-09-17 23:29:44,674 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.60 vs. limit=15.0
2024-09-17 23:29:45,361 INFO [train.py:1198] (1/2) Epoch 18, batch 3750, loss[loss=0.2218, ctc_loss=0.1219, cr_loss=0.3412, attn_decoder_loss=0.2253, over 29373.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1412, cr_loss=0.3845, attn_decoder_loss=0.2525, over 5807066.96 frames. ], batch size: 67, lr: 6.15e-03, grad_scale: 8.0
2024-09-17 23:29:46,817 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.942e+01 8.687e+01 9.263e+01 1.001e+02 2.346e+02, threshold=1.853e+02, percent-clipped=1.0
2024-09-17 23:29:48,060 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.83 vs. limit=22.5
2024-09-17 23:29:56,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=322700.0, ans=0.125
2024-09-17 23:30:24,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=322780.0, ans=0.2
2024-09-17 23:30:31,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=322820.0, ans=0.0
2024-09-17 23:30:40,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=322820.0, ans=0.2
2024-09-17 23:30:43,186 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.76 vs. limit=10.0
2024-09-17 23:30:56,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=322860.0, ans=0.125
2024-09-17 23:30:58,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=322900.0, ans=0.015
2024-09-17 23:31:00,152 INFO [train.py:1198] (1/2) Epoch 18, batch 3800, loss[loss=0.2569, ctc_loss=0.1382, cr_loss=0.3848, attn_decoder_loss=0.2616, over 29643.00 frames. ], tot_loss[loss=0.2486, ctc_loss=0.1409, cr_loss=0.3844, attn_decoder_loss=0.252, over 5797128.56 frames. ], batch size: 86, lr: 6.15e-03, grad_scale: 8.0
2024-09-17 23:31:31,221 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.26 vs. limit=12.0
2024-09-17 23:31:40,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=322980.0, ans=0.5
2024-09-17 23:31:48,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=323020.0, ans=0.125
2024-09-17 23:31:48,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=323020.0, ans=0.05
2024-09-17 23:31:49,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=323020.0, ans=0.0
2024-09-17 23:32:06,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=323060.0, ans=0.1
2024-09-17 23:32:08,240 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. limit=6.0
2024-09-17 23:32:14,631 INFO [train.py:1198] (1/2) Epoch 18, batch 3850, loss[loss=0.2638, ctc_loss=0.1483, cr_loss=0.4157, attn_decoder_loss=0.2674, over 29182.00 frames. ], tot_loss[loss=0.2484, ctc_loss=0.1404, cr_loss=0.3833, attn_decoder_loss=0.2519, over 5811274.43 frames. ], batch size: 100, lr: 6.14e-03, grad_scale: 8.0
2024-09-17 23:32:16,117 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.747e+01 8.799e+01 9.187e+01 9.877e+01 1.493e+02, threshold=1.837e+02, percent-clipped=0.0
2024-09-17 23:32:22,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=323100.0, ans=0.0
2024-09-17 23:32:26,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=323100.0, ans=0.0
2024-09-17 23:32:26,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=323100.0, ans=0.125
2024-09-17 23:32:36,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=323140.0, ans=0.125
2024-09-17 23:32:40,685 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.58 vs. limit=15.0
2024-09-17 23:32:55,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=323180.0, ans=0.0
2024-09-17 23:33:01,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=323220.0, ans=0.125
2024-09-17 23:33:06,918 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.39 vs. limit=15.0
2024-09-17 23:33:31,048 INFO [train.py:1198] (1/2) Epoch 18, batch 3900, loss[loss=0.2622, ctc_loss=0.1493, cr_loss=0.403, attn_decoder_loss=0.2658, over 29645.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1411, cr_loss=0.3846, attn_decoder_loss=0.2525, over 5815352.59 frames. ], batch size: 86, lr: 6.14e-03, grad_scale: 8.0
2024-09-17 23:33:34,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=323300.0, ans=0.125
2024-09-17 23:34:10,424 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.34 vs. limit=22.5
2024-09-17 23:34:24,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=323420.0, ans=0.2
2024-09-17 23:34:44,931 INFO [train.py:1198] (1/2) Epoch 18, batch 3950, loss[loss=0.2584, ctc_loss=0.1486, cr_loss=0.3954, attn_decoder_loss=0.2618, over 29495.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1408, cr_loss=0.3844, attn_decoder_loss=0.2526, over 5834893.91 frames. ], batch size: 97, lr: 6.14e-03, grad_scale: 8.0
2024-09-17 23:34:46,429 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.658e+01 8.710e+01 9.175e+01 9.677e+01 1.510e+02, threshold=1.835e+02, percent-clipped=0.0
2024-09-17 23:34:54,633 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.60 vs. limit=12.0
2024-09-17 23:35:00,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=323540.0, ans=0.125
2024-09-17 23:35:01,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=323540.0, ans=0.2
2024-09-17 23:35:06,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.76 vs. limit=12.0
2024-09-17 23:35:18,453 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.36 vs. limit=15.0
2024-09-17 23:35:27,161 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.05 vs. limit=15.0
2024-09-17 23:35:40,633 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.86 vs. limit=12.0
2024-09-17 23:35:41,950 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.11 vs. limit=22.5
2024-09-17 23:35:53,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=323660.0, ans=0.125
2024-09-17 23:36:00,068 INFO [train.py:1198] (1/2) Epoch 18, batch 4000, loss[loss=0.2282, ctc_loss=0.1208, cr_loss=0.3412, attn_decoder_loss=0.2325, over 29513.00 frames. ], tot_loss[loss=0.2488, ctc_loss=0.1406, cr_loss=0.3834, attn_decoder_loss=0.2523, over 5812730.21 frames. ], batch size: 74, lr: 6.14e-03, grad_scale: 8.0
2024-09-17 23:36:00,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=323700.0, ans=10.0
2024-09-17 23:36:07,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=323700.0, ans=0.125
2024-09-17 23:36:16,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=323740.0, ans=0.035
2024-09-17 23:36:31,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=323780.0, ans=0.0
2024-09-17 23:36:37,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=323780.0, ans=0.1
2024-09-17 23:36:41,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=323780.0, ans=0.0
2024-09-17 23:36:52,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=323820.0, ans=0.0
2024-09-17 23:37:14,031 INFO [train.py:1198] (1/2) Epoch 18, batch 4050, loss[loss=0.2873, ctc_loss=0.2079, cr_loss=0.434, attn_decoder_loss=0.2865, over 20369.00 frames. ], tot_loss[loss=0.2487, ctc_loss=0.1404, cr_loss=0.3828, attn_decoder_loss=0.2522, over 5797034.08 frames. ], batch size: 210, lr: 6.14e-03, grad_scale: 8.0
2024-09-17 23:37:16,868 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.693e+01 8.741e+01 9.284e+01 9.840e+01 3.533e+02, threshold=1.857e+02, percent-clipped=2.0
2024-09-17 23:37:26,278 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.71 vs. limit=22.5
2024-09-17 23:37:28,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=323940.0, ans=0.125
2024-09-17 23:37:30,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=323940.0, ans=0.07
2024-09-17 23:37:38,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=323940.0, ans=0.125
2024-09-17 23:37:41,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=323980.0, ans=0.125
2024-09-17 23:37:44,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=323980.0, ans=0.5
2024-09-17 23:37:46,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=323980.0, ans=0.125
2024-09-17 23:37:58,771 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.32 vs. limit=15.0
2024-09-17 23:37:59,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=324020.0, ans=0.2
2024-09-17 23:38:12,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=324060.0, ans=0.125
2024-09-17 23:38:28,779 INFO [train.py:1198] (1/2) Epoch 18, batch 4100, loss[loss=0.2594, ctc_loss=0.1454, cr_loss=0.4015, attn_decoder_loss=0.2631, over 29487.00 frames. ], tot_loss[loss=0.2492, ctc_loss=0.141, cr_loss=0.3838, attn_decoder_loss=0.2527, over 5792281.30 frames. ], batch size: 90, lr: 6.13e-03, grad_scale: 8.0
2024-09-17 23:38:33,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=324100.0, ans=0.125
2024-09-17 23:38:39,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=324100.0, ans=0.0
2024-09-17 23:39:07,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=324180.0, ans=0.04949747468305833
2024-09-17 23:39:30,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=324260.0, ans=0.125
2024-09-17 23:39:36,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=324260.0, ans=0.125
2024-09-17 23:39:43,551 INFO [train.py:1198] (1/2) Epoch 18, batch 4150, loss[loss=0.2334, ctc_loss=0.1308, cr_loss=0.3468, attn_decoder_loss=0.2371, over 29514.00 frames. ], tot_loss[loss=0.2486, ctc_loss=0.1403, cr_loss=0.3831, attn_decoder_loss=0.2521, over 5798072.65 frames. ], batch size: 77, lr: 6.13e-03, grad_scale: 8.0
2024-09-17 23:39:45,386 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-17 23:39:46,526 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.404e+01 8.386e+01 9.045e+01 9.725e+01 1.428e+02, threshold=1.809e+02, percent-clipped=0.0
2024-09-17 23:40:04,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=324340.0, ans=0.125
2024-09-17 23:40:19,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=324380.0, ans=0.0
2024-09-17 23:40:19,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=324380.0, ans=0.025
2024-09-17 23:40:39,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=324420.0, ans=0.025
2024-09-17 23:40:42,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=324460.0, ans=0.125
2024-09-17 23:40:44,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=324460.0, ans=0.2
2024-09-17 23:40:44,472 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 23:40:54,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=324460.0, ans=0.1
2024-09-17 23:40:57,213 INFO [train.py:1198] (1/2) Epoch 18, batch 4200, loss[loss=0.2693, ctc_loss=0.1609, cr_loss=0.4277, attn_decoder_loss=0.2719, over 29513.00 frames. ], tot_loss[loss=0.2493, ctc_loss=0.1411, cr_loss=0.3843, attn_decoder_loss=0.2527, over 5800138.33 frames. ], batch size: 90, lr: 6.13e-03, grad_scale: 8.0
2024-09-17 23:41:14,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=324540.0, ans=0.2
2024-09-17 23:41:26,351 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.41 vs. limit=15.0
2024-09-17 23:41:31,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=324580.0, ans=0.125
2024-09-17 23:41:35,206 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.37 vs. limit=15.0
2024-09-17 23:41:56,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=324660.0, ans=0.0
2024-09-17 23:42:03,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=324660.0, ans=0.125
2024-09-17 23:42:10,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=324700.0, ans=0.125
2024-09-17 23:42:11,842 INFO [train.py:1198] (1/2) Epoch 18, batch 4250, loss[loss=0.2238, ctc_loss=0.1224, cr_loss=0.3604, attn_decoder_loss=0.2271, over 29528.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1407, cr_loss=0.384, attn_decoder_loss=0.2526, over 5804736.91 frames. ], batch size: 74, lr: 6.13e-03, grad_scale: 4.0
2024-09-17 23:42:13,484 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 23:42:16,131 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 8.827e+01 9.431e+01 1.016e+02 4.056e+02, threshold=1.886e+02, percent-clipped=2.0
2024-09-17 23:42:19,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=324700.0, ans=0.0
2024-09-17 23:42:22,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=324700.0, ans=0.1
2024-09-17 23:42:45,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=324780.0, ans=0.1
2024-09-17 23:42:52,582 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.27 vs. limit=15.0
2024-09-17 23:43:08,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=324820.0, ans=0.125
2024-09-17 23:43:11,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.02 vs. limit=22.5
2024-09-17 23:43:24,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=324860.0, ans=0.04949747468305833
2024-09-17 23:43:27,292 INFO [train.py:1198] (1/2) Epoch 18, batch 4300, loss[loss=0.2618, ctc_loss=0.146, cr_loss=0.3947, attn_decoder_loss=0.2659, over 29546.00 frames. ], tot_loss[loss=0.2494, ctc_loss=0.1409, cr_loss=0.3838, attn_decoder_loss=0.2529, over 5796093.48 frames. ], batch size: 87, lr: 6.13e-03, grad_scale: 8.0
2024-09-17 23:43:27,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=324900.0, ans=0.1
2024-09-17 23:44:17,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=325020.0, ans=10.0
2024-09-17 23:44:22,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=325020.0, ans=0.2
2024-09-17 23:44:25,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=325060.0, ans=0.125
2024-09-17 23:44:38,478 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 23:44:41,170 INFO [train.py:1198] (1/2) Epoch 18, batch 4350, loss[loss=0.2659, ctc_loss=0.1595, cr_loss=0.4163, attn_decoder_loss=0.2684, over 29461.00 frames. ], tot_loss[loss=0.2525, ctc_loss=0.1433, cr_loss=0.3889, attn_decoder_loss=0.256, over 5798464.91 frames. ], batch size: 97, lr: 6.13e-03, grad_scale: 8.0
2024-09-17 23:44:46,378 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.876e+01 8.831e+01 9.306e+01 9.822e+01 6.484e+02, threshold=1.861e+02, percent-clipped=2.0
2024-09-17 23:44:51,659 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.48 vs. limit=15.0
2024-09-17 23:45:04,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=325140.0, ans=0.0
2024-09-17 23:45:37,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=325220.0, ans=0.125
2024-09-17 23:45:38,021 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.04 vs. limit=22.5
2024-09-17 23:45:43,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=325260.0, ans=0.1
2024-09-17 23:45:54,954 INFO [train.py:1198] (1/2) Epoch 18, batch 4400, loss[loss=0.2628, ctc_loss=0.1526, cr_loss=0.3982, attn_decoder_loss=0.2661, over 27156.00 frames. ], tot_loss[loss=0.2548, ctc_loss=0.1452, cr_loss=0.3915, attn_decoder_loss=0.2583, over 5768104.73 frames. ], batch size: 124, lr: 6.12e-03, grad_scale: 16.0
2024-09-17 23:46:46,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=325420.0, ans=0.1
2024-09-17 23:46:53,334 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.36 vs. limit=22.5
2024-09-17 23:47:10,214 INFO [train.py:1198] (1/2) Epoch 18, batch 4450, loss[loss=0.2707, ctc_loss=0.1762, cr_loss=0.4152, attn_decoder_loss=0.272, over 20154.00 frames. ], tot_loss[loss=0.2574, ctc_loss=0.1493, cr_loss=0.3959, attn_decoder_loss=0.2606, over 5578451.33 frames. ], batch size: 211, lr: 6.12e-03, grad_scale: 8.0
2024-09-17 23:47:16,218 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.181e+01 9.154e+01 9.637e+01 1.052e+02 1.489e+02, threshold=1.927e+02, percent-clipped=0.0
2024-09-17 23:47:24,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=325540.0, ans=0.2
2024-09-17 23:47:42,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=325580.0, ans=0.125
2024-09-17 23:47:57,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=325620.0, ans=0.125
2024-09-17 23:48:06,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=325620.0, ans=0.125
2024-09-17 23:48:09,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=325660.0, ans=0.1
2024-09-17 23:48:14,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=325660.0, ans=0.04949747468305833
2024-09-17 23:48:17,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=325660.0, ans=0.0
2024-09-17 23:48:20,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=325660.0, ans=0.1
2024-09-17 23:48:22,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=325660.0, ans=0.2
2024-09-17 23:48:25,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=325700.0, ans=0.125
2024-09-17 23:48:26,297 INFO [train.py:1198] (1/2) Epoch 18, batch 4500, loss[loss=0.2792, ctc_loss=0.1908, cr_loss=0.4286, attn_decoder_loss=0.2795, over 19478.00 frames. ], tot_loss[loss=0.2603, ctc_loss=0.1545, cr_loss=0.3987, attn_decoder_loss=0.2632, over 5235106.79 frames. ], batch size: 209, lr: 6.12e-03, grad_scale: 8.0
2024-09-17 23:48:56,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=325780.0, ans=0.125
2024-09-17 23:48:56,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=325780.0, ans=0.2
2024-09-17 23:49:55,593 INFO [train.py:1198] (1/2) Epoch 19, batch 0, loss[loss=0.2332, ctc_loss=0.1278, cr_loss=0.374, attn_decoder_loss=0.2366, over 29596.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1278, cr_loss=0.374, attn_decoder_loss=0.2366, over 29596.00 frames. ], batch size: 73, lr: 5.95e-03, grad_scale: 16.0
2024-09-17 23:49:55,593 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-17 23:50:13,877 INFO [train.py:1230] (1/2) Epoch 19, validation: loss=0.2122, ctc_loss=0.03932, cr_loss=5e-15, attn_decoder_loss=0.2315, over 944034.00 frames.
2024-09-17 23:50:13,877 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB
2024-09-17 23:50:14,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=325800.0, ans=0.0
2024-09-17 23:50:23,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=325800.0, ans=0.0
2024-09-17 23:50:45,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=325880.0, ans=0.125
2024-09-17 23:50:48,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=325880.0, ans=0.1
2024-09-17 23:50:59,123 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.957e+01 1.057e+02 1.132e+02 1.239e+02 3.685e+02, threshold=2.265e+02, percent-clipped=3.0
2024-09-17 23:51:12,554 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.31 vs. limit=8.0
2024-09-17 23:51:17,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=325960.0, ans=0.0
2024-09-17 23:51:28,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=326000.0, ans=0.125
2024-09-17 23:51:29,515 INFO [train.py:1198] (1/2) Epoch 19, batch 50, loss[loss=0.2169, ctc_loss=0.1111, cr_loss=0.3196, attn_decoder_loss=0.2215, over 29408.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1446, cr_loss=0.3886, attn_decoder_loss=0.2544, over 1267457.46 frames. ], batch size: 70, lr: 5.95e-03, grad_scale: 8.0
2024-09-17 23:51:33,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=326000.0, ans=0.025
2024-09-17 23:51:34,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=326000.0, ans=0.125
2024-09-17 23:51:41,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=326000.0, ans=0.0
2024-09-17 23:52:01,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=326080.0, ans=0.1
2024-09-17 23:52:03,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=326080.0, ans=0.125
2024-09-17 23:52:13,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=326080.0, ans=0.125
2024-09-17 23:52:15,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=326080.0, ans=0.025
2024-09-17 23:52:49,548 INFO [train.py:1198] (1/2) Epoch 19, batch 100, loss[loss=0.2454, ctc_loss=0.1382, cr_loss=0.3723, attn_decoder_loss=0.2491, over 29549.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.145, cr_loss=0.3912, attn_decoder_loss=0.256, over 2251143.70 frames. ], batch size: 76, lr: 5.95e-03, grad_scale: 8.0
2024-09-17 23:52:51,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=326200.0, ans=0.0
2024-09-17 23:53:03,901 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.60 vs. limit=22.5
2024-09-17 23:53:13,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=326240.0, ans=0.125
2024-09-17 23:53:16,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=326240.0, ans=0.04949747468305833
2024-09-17 23:53:18,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=326280.0, ans=0.1
2024-09-17 23:53:24,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=326280.0, ans=0.125
2024-09-17 23:53:24,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=326280.0, ans=0.125
2024-09-17 23:53:27,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=326280.0, ans=0.125
2024-09-17 23:53:31,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=326280.0, ans=0.1
2024-09-17 23:53:34,404 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.770e+01 8.614e+01 9.117e+01 9.815e+01 1.763e+02, threshold=1.823e+02, percent-clipped=0.0
2024-09-17 23:53:39,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=326320.0, ans=0.125
2024-09-17 23:54:03,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=326400.0, ans=0.5
2024-09-17 23:54:04,645 INFO [train.py:1198] (1/2) Epoch 19, batch 150, loss[loss=0.2144, ctc_loss=0.1103, cr_loss=0.3241, attn_decoder_loss=0.2188, over 29435.00 frames. ], tot_loss[loss=0.2496, ctc_loss=0.1414, cr_loss=0.3844, attn_decoder_loss=0.2531, over 3045490.00 frames. ], batch size: 70, lr: 5.95e-03, grad_scale: 8.0
2024-09-17 23:54:06,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=326400.0, ans=0.0
2024-09-17 23:54:18,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=326440.0, ans=0.125
2024-09-17 23:54:24,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=326440.0, ans=0.0
2024-09-17 23:55:06,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=326560.0, ans=0.125
2024-09-17 23:55:17,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=326560.0, ans=0.0
2024-09-17 23:55:20,174 INFO [train.py:1198] (1/2) Epoch 19, batch 200, loss[loss=0.2646, ctc_loss=0.1574, cr_loss=0.4028, attn_decoder_loss=0.2675, over 27313.00 frames. ], tot_loss[loss=0.2488, ctc_loss=0.1406, cr_loss=0.3838, attn_decoder_loss=0.2523, over 3657907.19 frames. ], batch size: 124, lr: 5.95e-03, grad_scale: 8.0
2024-09-17 23:55:32,047 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-17 23:55:43,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=326640.0, ans=0.0
2024-09-17 23:55:44,614 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.80 vs. limit=6.0
2024-09-17 23:55:54,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=326680.0, ans=0.125
2024-09-17 23:56:00,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=326680.0, ans=0.125
2024-09-17 23:56:10,482 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 8.598e+01 9.185e+01 9.838e+01 1.653e+02, threshold=1.837e+02, percent-clipped=0.0
2024-09-17 23:56:14,229 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.23 vs. limit=22.5
2024-09-17 23:56:18,991 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.80 vs. limit=10.0
2024-09-17 23:56:30,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=326760.0, ans=0.025
2024-09-17 23:56:38,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=326760.0, ans=0.2
2024-09-17 23:56:40,801 INFO [train.py:1198] (1/2) Epoch 19, batch 250, loss[loss=0.2592, ctc_loss=0.1546, cr_loss=0.4087, attn_decoder_loss=0.2618, over 29272.00 frames. ], tot_loss[loss=0.2486, ctc_loss=0.1401, cr_loss=0.3837, attn_decoder_loss=0.2522, over 4139321.62 frames. ], batch size: 100, lr: 5.94e-03, grad_scale: 8.0
2024-09-17 23:56:45,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=326800.0, ans=0.2
2024-09-17 23:56:49,777 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.44 vs. limit=5.0
2024-09-17 23:57:02,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=326840.0, ans=0.025
2024-09-17 23:57:22,160 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 23:57:32,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=326920.0, ans=0.1
2024-09-17 23:57:38,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=326920.0, ans=0.1
2024-09-17 23:57:40,866 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.70 vs. limit=10.0
2024-09-17 23:57:43,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=326960.0, ans=0.1
2024-09-17 23:57:46,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=326960.0, ans=0.0
2024-09-17 23:57:56,483 INFO [train.py:1198] (1/2) Epoch 19, batch 300, loss[loss=0.2717, ctc_loss=0.1632, cr_loss=0.4242, attn_decoder_loss=0.2743, over 29546.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.1395, cr_loss=0.3836, attn_decoder_loss=0.2519, over 4508869.40 frames. ], batch size: 92, lr: 5.94e-03, grad_scale: 8.0
2024-09-17 23:58:03,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=327000.0, ans=0.1
2024-09-17 23:58:12,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=327040.0, ans=0.0
2024-09-17 23:58:19,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=327040.0, ans=0.2
2024-09-17 23:58:21,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=327040.0, ans=0.125
2024-09-17 23:58:23,303 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.53 vs. limit=15.0
2024-09-17 23:58:31,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=327080.0, ans=0.0
2024-09-17 23:58:37,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=327080.0, ans=0.0
2024-09-17 23:58:41,749 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.487e+01 9.041e+01 9.802e+01 3.671e+02, threshold=1.808e+02, percent-clipped=2.0
2024-09-17 23:58:48,923 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.33 vs. limit=15.0
2024-09-17 23:59:06,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=327160.0, ans=0.0
2024-09-17 23:59:12,436 INFO [train.py:1198] (1/2) Epoch 19, batch 350, loss[loss=0.2146, ctc_loss=0.1147, cr_loss=0.3323, attn_decoder_loss=0.2183, over 29302.00 frames. ], tot_loss[loss=0.2486, ctc_loss=0.1397, cr_loss=0.3833, attn_decoder_loss=0.2522, over 4796071.55 frames. ], batch size: 71, lr: 5.94e-03, grad_scale: 8.0
2024-09-17 23:59:18,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=327200.0, ans=0.0
2024-09-17 23:59:30,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=327240.0, ans=0.0
2024-09-17 23:59:30,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=327240.0, ans=0.1
2024-09-17 23:59:33,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=327240.0, ans=0.125
2024-09-17 23:59:47,405 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.35 vs. limit=22.5
2024-09-17 23:59:58,088 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.25 vs. limit=15.0
2024-09-18 00:00:07,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=327320.0, ans=0.2
2024-09-18 00:00:09,306 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=15.0
2024-09-18 00:00:11,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=327320.0, ans=0.0
2024-09-18 00:00:17,167 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.02 vs. limit=15.0
2024-09-18 00:00:21,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=327360.0, ans=10.0
2024-09-18 00:00:27,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=327360.0, ans=0.125
2024-09-18 00:00:30,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=327360.0, ans=0.04949747468305833
2024-09-18 00:00:32,747 INFO [train.py:1198] (1/2) Epoch 19, batch 400, loss[loss=0.2652, ctc_loss=0.1537, cr_loss=0.4121, attn_decoder_loss=0.2684, over 29719.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1393, cr_loss=0.3818, attn_decoder_loss=0.2518, over 5025017.96 frames. ], batch size: 82, lr: 5.94e-03, grad_scale: 16.0
2024-09-18 00:00:41,378 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.21 vs. limit=15.0
2024-09-18 00:01:05,641 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.39 vs. limit=15.0
2024-09-18 00:01:05,952 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.48 vs. limit=15.0
2024-09-18 00:01:12,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=327480.0, ans=0.0
2024-09-18 00:01:19,971 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.098e+01 8.676e+01 9.493e+01 1.045e+02 1.663e+02, threshold=1.899e+02, percent-clipped=0.0
2024-09-18 00:01:34,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=327560.0, ans=0.0
2024-09-18 00:01:34,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=327560.0, ans=0.125
2024-09-18 00:01:46,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=327560.0, ans=0.0
2024-09-18 00:01:48,886 INFO [train.py:1198] (1/2) Epoch 19, batch 450, loss[loss=0.2724, ctc_loss=0.1613, cr_loss=0.4146, attn_decoder_loss=0.2756, over 29699.00 frames. ], tot_loss[loss=0.2486, ctc_loss=0.14, cr_loss=0.3833, attn_decoder_loss=0.2522, over 5187986.71 frames. ], batch size: 83, lr: 5.94e-03, grad_scale: 8.0
2024-09-18 00:02:01,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=327600.0, ans=0.0
2024-09-18 00:02:10,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=327640.0, ans=22.5
2024-09-18 00:02:12,380 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.70 vs. limit=15.0
2024-09-18 00:02:42,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=327720.0, ans=0.05
2024-09-18 00:02:51,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=327760.0, ans=0.1
2024-09-18 00:03:04,353 INFO [train.py:1198] (1/2) Epoch 19, batch 500, loss[loss=0.2599, ctc_loss=0.151, cr_loss=0.4188, attn_decoder_loss=0.2627, over 29452.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.1396, cr_loss=0.3827, attn_decoder_loss=0.2515, over 5330962.93 frames. ], batch size: 94, lr: 5.94e-03, grad_scale: 8.0
2024-09-18 00:03:39,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=327880.0, ans=0.125
2024-09-18 00:03:56,186 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.669e+01 8.703e+01 9.333e+01 1.015e+02 2.225e+02, threshold=1.867e+02, percent-clipped=2.0
2024-09-18 00:04:04,875 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.34 vs. limit=22.5
2024-09-18 00:04:25,792 INFO [train.py:1198] (1/2) Epoch 19, batch 550, loss[loss=0.2636, ctc_loss=0.1521, cr_loss=0.4005, attn_decoder_loss=0.2671, over 28823.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1396, cr_loss=0.3821, attn_decoder_loss=0.2516, over 5423682.67 frames. ], batch size: 104, lr: 5.93e-03, grad_scale: 8.0
2024-09-18 00:05:10,386 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.33 vs. limit=15.0
2024-09-18 00:05:19,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=328120.0, ans=0.1
2024-09-18 00:05:21,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=328120.0, ans=0.1
2024-09-18 00:05:24,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=328160.0, ans=0.125
2024-09-18 00:05:37,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=328160.0, ans=0.1
2024-09-18 00:05:37,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=328160.0, ans=0.125
2024-09-18 00:05:37,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=328160.0, ans=0.2
2024-09-18 00:05:41,273 INFO [train.py:1198] (1/2) Epoch 19, batch 600, loss[loss=0.2645, ctc_loss=0.1523, cr_loss=0.4105, attn_decoder_loss=0.2679, over 29258.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1398, cr_loss=0.3828, attn_decoder_loss=0.2521, over 5509657.44 frames. ], batch size: 100, lr: 5.93e-03, grad_scale: 8.0
2024-09-18 00:06:14,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=328280.0, ans=0.025
2024-09-18 00:06:27,692 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.610e+01 8.945e+01 9.378e+01 9.831e+01 2.043e+02, threshold=1.876e+02, percent-clipped=1.0
2024-09-18 00:06:33,472 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.36 vs. limit=10.0
2024-09-18 00:06:35,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.30 vs. limit=15.0
2024-09-18 00:06:52,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=328360.0, ans=0.125
2024-09-18 00:06:56,864 INFO [train.py:1198] (1/2) Epoch 19, batch 650, loss[loss=0.2513, ctc_loss=0.1457, cr_loss=0.4087, attn_decoder_loss=0.2539, over 29766.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1392, cr_loss=0.3817, attn_decoder_loss=0.2513, over 5586805.26 frames. ], batch size: 81, lr: 5.93e-03, grad_scale: 8.0
2024-09-18 00:07:01,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=328400.0, ans=0.015
2024-09-18 00:07:06,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=328400.0, ans=0.125
2024-09-18 00:07:15,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=328440.0, ans=0.09899494936611666
2024-09-18 00:07:24,916 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.46 vs. limit=15.0
2024-09-18 00:07:37,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=328480.0, ans=0.125
2024-09-18 00:07:45,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=328520.0, ans=0.2
2024-09-18 00:08:10,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=328560.0, ans=0.2
2024-09-18 00:08:10,970 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.19 vs. limit=15.0
2024-09-18 00:08:17,453 INFO [train.py:1198] (1/2) Epoch 19, batch 700, loss[loss=0.2372, ctc_loss=0.1302, cr_loss=0.3796, attn_decoder_loss=0.2407, over 29535.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.1393, cr_loss=0.3824, attn_decoder_loss=0.2515, over 5637551.26 frames. ], batch size: 76, lr: 5.93e-03, grad_scale: 8.0
2024-09-18 00:08:25,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.59 vs. limit=12.0
2024-09-18 00:08:29,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=328600.0, ans=0.125
2024-09-18 00:08:31,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=328640.0, ans=0.125
2024-09-18 00:08:33,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0
2024-09-18 00:08:53,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=328680.0, ans=0.125
2024-09-18 00:09:04,108 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.262e+01 8.484e+01 8.986e+01 9.600e+01 2.397e+02, threshold=1.797e+02, percent-clipped=1.0
2024-09-18 00:09:19,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=328760.0, ans=0.025
2024-09-18 00:09:33,283 INFO [train.py:1198] (1/2) Epoch 19, batch 750, loss[loss=0.2545, ctc_loss=0.1514, cr_loss=0.4021, attn_decoder_loss=0.257, over 29706.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1386, cr_loss=0.3811, attn_decoder_loss=0.251, over 5677616.79 frames. ], batch size: 82, lr: 5.93e-03, grad_scale: 8.0
2024-09-18 00:09:41,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=328800.0, ans=0.125
2024-09-18 00:09:54,844 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0
2024-09-18 00:10:12,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=328880.0, ans=0.125
2024-09-18 00:10:20,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=328920.0, ans=0.0
2024-09-18 00:10:44,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=328960.0, ans=0.125
2024-09-18 00:10:45,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=328960.0, ans=0.1
2024-09-18 00:10:47,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=329000.0, ans=0.125
2024-09-18 00:10:48,708 INFO [train.py:1198] (1/2) Epoch 19, batch 800, loss[loss=0.2306, ctc_loss=0.1305, cr_loss=0.3633, attn_decoder_loss=0.2337, over 29644.00 frames. ], tot_loss[loss=0.2475, ctc_loss=0.139, cr_loss=0.3816, attn_decoder_loss=0.251, over 5707397.38 frames. ], batch size: 73, lr: 5.92e-03, grad_scale: 16.0
2024-09-18 00:10:49,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=329000.0, ans=0.09899494936611666
2024-09-18 00:10:50,601 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 00:10:52,967 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0
limit=15.0 2024-09-18 00:11:21,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=329080.0, ans=0.0 2024-09-18 00:11:25,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=329080.0, ans=0.125 2024-09-18 00:11:33,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=329080.0, ans=0.07 2024-09-18 00:11:33,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=329080.0, ans=0.2 2024-09-18 00:11:36,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=329120.0, ans=0.0 2024-09-18 00:11:39,696 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.487e+01 8.734e+01 9.110e+01 9.840e+01 2.381e+02, threshold=1.822e+02, percent-clipped=1.0 2024-09-18 00:12:09,147 INFO [train.py:1198] (1/2) Epoch 19, batch 850, loss[loss=0.2698, ctc_loss=0.1629, cr_loss=0.4353, attn_decoder_loss=0.272, over 29728.00 frames. ], tot_loss[loss=0.2476, ctc_loss=0.1392, cr_loss=0.3824, attn_decoder_loss=0.2512, over 5736758.15 frames. ], batch size: 89, lr: 5.92e-03, grad_scale: 8.0 2024-09-18 00:12:13,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=329200.0, ans=0.0 2024-09-18 00:12:57,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=329320.0, ans=0.95 2024-09-18 00:13:24,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=329400.0, ans=0.025 2024-09-18 00:13:25,353 INFO [train.py:1198] (1/2) Epoch 19, batch 900, loss[loss=0.2214, ctc_loss=0.1157, cr_loss=0.3189, attn_decoder_loss=0.2261, over 29602.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1399, cr_loss=0.3833, attn_decoder_loss=0.2517, over 5741101.68 frames. ], batch size: 73, lr: 5.92e-03, grad_scale: 8.0 2024-09-18 00:13:39,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=329440.0, ans=0.1 2024-09-18 00:13:45,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=329440.0, ans=0.0 2024-09-18 00:13:50,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.64 vs. limit=10.0 2024-09-18 00:14:14,238 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.000e+01 8.696e+01 9.115e+01 9.955e+01 6.704e+02, threshold=1.823e+02, percent-clipped=4.0 2024-09-18 00:14:17,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=329520.0, ans=0.125 2024-09-18 00:14:30,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=329560.0, ans=0.0 2024-09-18 00:14:41,597 INFO [train.py:1198] (1/2) Epoch 19, batch 950, loss[loss=0.228, ctc_loss=0.1148, cr_loss=0.3249, attn_decoder_loss=0.2333, over 29494.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1398, cr_loss=0.3828, attn_decoder_loss=0.2517, over 5741427.93 frames. 
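A note on the per-batch loss figures in these lines: the printed loss is consistent with a fixed weighted sum of its three components, loss = 0.1*ctc_loss + 0.02*cr_loss + 0.9*attn_decoder_loss. The weights below are inferred from the printed numbers themselves, not read out of any code; a quick check against the batch 950 totals just above:

    # Hedged check: the logged `loss` appears to be a weighted sum of its
    # components. Weights are inferred from the printed values in this log.
    ctc_loss, cr_loss, attn_decoder_loss = 0.1398, 0.3828, 0.2517  # tot_loss, batch 950
    loss = 0.1 * ctc_loss + 0.02 * cr_loss + 0.9 * attn_decoder_loss
    print(round(loss, 4))  # -> 0.2482, matching the logged tot_loss

The same identity reproduces every other tot_loss line in this section to four decimal places (e.g. batch 1000: 0.1*0.1408 + 0.02*0.3848 + 0.9*0.2526 = 0.2491).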
], batch size: 74, lr: 5.92e-03, grad_scale: 8.0 2024-09-18 00:14:52,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=329600.0, ans=0.1 2024-09-18 00:15:03,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=329640.0, ans=0.05 2024-09-18 00:15:03,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=329640.0, ans=0.0 2024-09-18 00:15:06,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=329640.0, ans=10.0 2024-09-18 00:15:10,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=329680.0, ans=0.2 2024-09-18 00:15:14,606 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0 2024-09-18 00:15:17,271 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.37 vs. limit=6.0 2024-09-18 00:15:23,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=329680.0, ans=0.125 2024-09-18 00:15:26,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=329680.0, ans=0.125 2024-09-18 00:15:38,268 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:15:59,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=329760.0, ans=0.0 2024-09-18 00:16:01,894 INFO [train.py:1198] (1/2) Epoch 19, batch 1000, loss[loss=0.242, ctc_loss=0.1318, cr_loss=0.3637, attn_decoder_loss=0.2461, over 29509.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1408, cr_loss=0.3848, attn_decoder_loss=0.2526, over 5736592.52 frames. ], batch size: 77, lr: 5.92e-03, grad_scale: 8.0 2024-09-18 00:16:08,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=329800.0, ans=0.0 2024-09-18 00:16:08,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.45 vs. 
limit=15.0 2024-09-18 00:16:50,540 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.634e+01 8.872e+01 9.584e+01 1.048e+02 1.890e+02, threshold=1.917e+02, percent-clipped=1.0 2024-09-18 00:16:52,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=329920.0, ans=0.0 2024-09-18 00:16:56,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=329920.0, ans=0.125 2024-09-18 00:17:07,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=329960.0, ans=0.0 2024-09-18 00:17:14,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=329960.0, ans=0.0 2024-09-18 00:17:17,662 INFO [train.py:1198] (1/2) Epoch 19, batch 1050, loss[loss=0.2534, ctc_loss=0.138, cr_loss=0.3977, attn_decoder_loss=0.2574, over 29692.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1398, cr_loss=0.3834, attn_decoder_loss=0.2517, over 5742447.52 frames. ], batch size: 85, lr: 5.92e-03, grad_scale: 8.0 2024-09-18 00:17:30,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=330000.0, ans=0.1 2024-09-18 00:17:32,403 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.14 vs. limit=10.0 2024-09-18 00:18:01,371 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:18:06,286 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2024-09-18 00:18:25,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=330160.0, ans=0.1 2024-09-18 00:18:27,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=330160.0, ans=0.0 2024-09-18 00:18:28,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=330160.0, ans=0.125 2024-09-18 00:18:34,357 INFO [train.py:1198] (1/2) Epoch 19, batch 1100, loss[loss=0.2498, ctc_loss=0.148, cr_loss=0.4084, attn_decoder_loss=0.252, over 29431.00 frames. ], tot_loss[loss=0.2476, ctc_loss=0.1393, cr_loss=0.3824, attn_decoder_loss=0.2512, over 5754868.47 frames. 
], batch size: 78, lr: 5.91e-03, grad_scale: 8.0 2024-09-18 00:18:46,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=330200.0, ans=0.125 2024-09-18 00:18:48,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=330240.0, ans=0.125 2024-09-18 00:18:48,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=330240.0, ans=0.125 2024-09-18 00:19:12,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=330280.0, ans=0.1 2024-09-18 00:19:20,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=330320.0, ans=0.125 2024-09-18 00:19:25,406 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 8.385e+01 8.690e+01 9.252e+01 1.167e+02, threshold=1.738e+02, percent-clipped=0.0 2024-09-18 00:19:28,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=330320.0, ans=0.125 2024-09-18 00:19:30,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=330320.0, ans=0.125 2024-09-18 00:19:36,258 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.67 vs. limit=22.5 2024-09-18 00:19:55,681 INFO [train.py:1198] (1/2) Epoch 19, batch 1150, loss[loss=0.2489, ctc_loss=0.1422, cr_loss=0.3989, attn_decoder_loss=0.2519, over 29444.00 frames. ], tot_loss[loss=0.2476, ctc_loss=0.1394, cr_loss=0.3828, attn_decoder_loss=0.2512, over 5753819.08 frames. ], batch size: 78, lr: 5.91e-03, grad_scale: 8.0 2024-09-18 00:20:09,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=330440.0, ans=0.125 2024-09-18 00:20:15,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=330440.0, ans=0.2 2024-09-18 00:20:17,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=330440.0, ans=0.0 2024-09-18 00:20:18,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=330440.0, ans=0.0 2024-09-18 00:20:32,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=330480.0, ans=0.0 2024-09-18 00:20:46,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=330520.0, ans=0.025 2024-09-18 00:20:50,302 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.76 vs. limit=15.0 2024-09-18 00:21:11,966 INFO [train.py:1198] (1/2) Epoch 19, batch 1200, loss[loss=0.2574, ctc_loss=0.1382, cr_loss=0.3696, attn_decoder_loss=0.2625, over 29707.00 frames. ], tot_loss[loss=0.2486, ctc_loss=0.1401, cr_loss=0.3839, attn_decoder_loss=0.2521, over 5747631.79 frames. 
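The recurring optim.py warnings print what is apparently a five-number summary (min, 25%, median, 75%, max) of recent gradient norms, and in every instance in this section the printed threshold equals Clipping_scale times the median, e.g. 2.0 * 8.690e+01 = 1.738e+02 in the warning just above. A minimal sketch of that bookkeeping, assuming a rolling window of recent norms (the helper, window size, and clipping rule below are illustrative, not the actual optim.py code):

    from collections import deque
    import torch

    recent_norms = deque(maxlen=1024)  # rolling window of recent grad norms (size assumed)

    def clip_by_median(params, clipping_scale=2.0):
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        recent_norms.append(norm)
        q = torch.quantile(torch.tensor(list(recent_norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2].item()  # threshold = Clipping_scale * median
        if norm > threshold:                      # such batches count toward percent-clipped
            for g in grads:
                g.mul_(threshold / norm)
        return q, threshold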
], batch size: 85, lr: 5.91e-03, grad_scale: 16.0 2024-09-18 00:21:13,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=330600.0, ans=0.0 2024-09-18 00:21:14,251 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.67 vs. limit=15.0 2024-09-18 00:21:18,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=330600.0, ans=0.0 2024-09-18 00:21:19,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=330600.0, ans=0.025 2024-09-18 00:21:24,712 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:22:02,443 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.547e+01 8.778e+01 9.349e+01 9.833e+01 1.592e+02, threshold=1.870e+02, percent-clipped=0.0 2024-09-18 00:22:02,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=330720.0, ans=0.0 2024-09-18 00:22:12,214 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.34 vs. limit=15.0 2024-09-18 00:22:24,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=330760.0, ans=0.125 2024-09-18 00:22:24,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=330760.0, ans=0.2 2024-09-18 00:22:28,392 INFO [train.py:1198] (1/2) Epoch 19, batch 1250, loss[loss=0.2642, ctc_loss=0.1479, cr_loss=0.3922, attn_decoder_loss=0.2684, over 29489.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1402, cr_loss=0.3839, attn_decoder_loss=0.2524, over 5774168.78 frames. ], batch size: 92, lr: 5.91e-03, grad_scale: 8.0 2024-09-18 00:23:00,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=330880.0, ans=0.2 2024-09-18 00:23:30,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=330920.0, ans=0.0 2024-09-18 00:23:30,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=330920.0, ans=0.125 2024-09-18 00:23:48,831 INFO [train.py:1198] (1/2) Epoch 19, batch 1300, loss[loss=0.2549, ctc_loss=0.1444, cr_loss=0.3909, attn_decoder_loss=0.2585, over 28588.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1401, cr_loss=0.3835, attn_decoder_loss=0.2521, over 5778782.67 frames. 
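The ScheduledFloat lines throughout this log record module hyperparameters (dropout probabilities, skip rates, balancer limits) whose current value `ans` is a function of `batch_count`. A plausible reading, sketched below, is a piecewise-linear schedule over batch count; the breakpoints in the example are hypothetical, chosen only so that the output matches the logged layerdrop_rate of 0.015 at these batch counts:

    # A minimal piecewise-linear schedule, a sketch of what a line like
    #   ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, ..., ans=0.015
    # could be printing. Breakpoints here are hypothetical.
    class ScheduledFloat:
        def __init__(self, *points):               # e.g. (0.0, 0.07), (20000.0, 0.015)
            self.xs = [x for x, _ in points]
            self.ys = [y for _, y in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            for (x0, y0), (x1, y1) in zip(zip(self.xs, self.ys),
                                          zip(self.xs[1:], self.ys[1:])):
                if x0 <= batch_count <= x1:
                    # linear interpolation between neighbouring breakpoints
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    layerdrop = ScheduledFloat((0.0, 0.07), (20000.0, 0.015))
    print(layerdrop.value(328400.0))  # -> 0.015, past the final breakpoint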
], batch size: 112, lr: 5.91e-03, grad_scale: 8.0 2024-09-18 00:23:52,295 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:23:53,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=331000.0, ans=0.125 2024-09-18 00:24:05,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=331040.0, ans=0.0 2024-09-18 00:24:07,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=331040.0, ans=0.125 2024-09-18 00:24:18,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=331080.0, ans=0.2 2024-09-18 00:24:39,237 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 8.628e+01 9.058e+01 9.767e+01 1.420e+02, threshold=1.812e+02, percent-clipped=0.0 2024-09-18 00:25:05,554 INFO [train.py:1198] (1/2) Epoch 19, batch 1350, loss[loss=0.2473, ctc_loss=0.1431, cr_loss=0.3742, attn_decoder_loss=0.2506, over 29774.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1395, cr_loss=0.3827, attn_decoder_loss=0.2518, over 5795682.77 frames. ], batch size: 81, lr: 5.90e-03, grad_scale: 8.0 2024-09-18 00:25:10,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=331200.0, ans=0.125 2024-09-18 00:25:15,785 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.76 vs. limit=15.0 2024-09-18 00:25:19,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=331240.0, ans=0.125 2024-09-18 00:25:19,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=331240.0, ans=0.2 2024-09-18 00:25:25,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=331240.0, ans=0.125 2024-09-18 00:25:41,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=331280.0, ans=0.09899494936611666 2024-09-18 00:25:43,566 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:25:46,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=331280.0, ans=0.025 2024-09-18 00:25:54,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=331320.0, ans=0.04949747468305833 2024-09-18 00:26:21,744 INFO [train.py:1198] (1/2) Epoch 19, batch 1400, loss[loss=0.2176, ctc_loss=0.1124, cr_loss=0.3161, attn_decoder_loss=0.2223, over 29568.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1394, cr_loss=0.3822, attn_decoder_loss=0.2518, over 5807442.11 frames. ], batch size: 69, lr: 5.90e-03, grad_scale: 8.0 2024-09-18 00:26:51,454 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.22 vs. 
limit=12.0 2024-09-18 00:26:52,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=331480.0, ans=0.125 2024-09-18 00:26:55,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=331480.0, ans=0.1 2024-09-18 00:26:58,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=331480.0, ans=0.125 2024-09-18 00:27:11,761 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.236e+01 8.640e+01 9.143e+01 9.808e+01 1.570e+02, threshold=1.829e+02, percent-clipped=0.0 2024-09-18 00:27:17,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=331520.0, ans=0.125 2024-09-18 00:27:17,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=331520.0, ans=0.0 2024-09-18 00:27:27,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=331560.0, ans=0.0 2024-09-18 00:27:33,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=331560.0, ans=0.125 2024-09-18 00:27:38,484 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.61 vs. limit=15.0 2024-09-18 00:27:42,247 INFO [train.py:1198] (1/2) Epoch 19, batch 1450, loss[loss=0.2785, ctc_loss=0.1646, cr_loss=0.4487, attn_decoder_loss=0.2812, over 29400.00 frames. ], tot_loss[loss=0.2488, ctc_loss=0.1398, cr_loss=0.383, attn_decoder_loss=0.2524, over 5803552.81 frames. ], batch size: 94, lr: 5.90e-03, grad_scale: 8.0 2024-09-18 00:27:45,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=331600.0, ans=0.125 2024-09-18 00:28:09,687 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:28:27,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=331720.0, ans=0.0 2024-09-18 00:28:29,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=331720.0, ans=0.1 2024-09-18 00:28:57,908 INFO [train.py:1198] (1/2) Epoch 19, batch 1500, loss[loss=0.2544, ctc_loss=0.1333, cr_loss=0.3767, attn_decoder_loss=0.2595, over 29618.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1397, cr_loss=0.3829, attn_decoder_loss=0.2526, over 5803813.66 frames. 
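The oddly precise bypass.skip_rate values that recur in this stretch are not arbitrary: 0.09899494936611666 agrees with 0.07*sqrt(2) to every printed digit, and 0.04949747468305833 with 0.035*sqrt(2), suggesting the schedule endpoints are scaled by sqrt(2) somewhere upstream (a guess, but the arithmetic is easy to verify):

    import math
    print(0.07 * math.sqrt(2))   # -> 0.09899494936611666, the logged skip_rate
    print(0.035 * math.sqrt(2))  # -> 0.04949747468305833, the other logged value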
], batch size: 86, lr: 5.90e-03, grad_scale: 8.0 2024-09-18 00:29:19,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=331840.0, ans=0.125 2024-09-18 00:29:21,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=331840.0, ans=0.0 2024-09-18 00:29:21,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=331840.0, ans=0.5 2024-09-18 00:29:41,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=331880.0, ans=0.0 2024-09-18 00:29:48,929 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.295e+01 8.706e+01 9.242e+01 9.878e+01 2.158e+02, threshold=1.848e+02, percent-clipped=2.0 2024-09-18 00:29:52,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=331920.0, ans=0.0 2024-09-18 00:29:54,052 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:30:15,119 INFO [train.py:1198] (1/2) Epoch 19, batch 1550, loss[loss=0.2599, ctc_loss=0.1457, cr_loss=0.4047, attn_decoder_loss=0.2636, over 29468.00 frames. ], tot_loss[loss=0.249, ctc_loss=0.1398, cr_loss=0.3826, attn_decoder_loss=0.2526, over 5780860.20 frames. ], batch size: 90, lr: 5.90e-03, grad_scale: 8.0 2024-09-18 00:30:38,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=332040.0, ans=0.0 2024-09-18 00:31:10,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=332120.0, ans=0.1 2024-09-18 00:31:10,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=332120.0, ans=0.125 2024-09-18 00:31:32,706 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.37 vs. limit=12.0 2024-09-18 00:31:35,033 INFO [train.py:1198] (1/2) Epoch 19, batch 1600, loss[loss=0.2633, ctc_loss=0.1408, cr_loss=0.3826, attn_decoder_loss=0.2684, over 29670.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1395, cr_loss=0.3805, attn_decoder_loss=0.2521, over 5764562.80 frames. ], batch size: 85, lr: 5.90e-03, grad_scale: 16.0 2024-09-18 00:31:41,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=332200.0, ans=0.125 2024-09-18 00:31:44,779 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.10 vs. 
limit=10.0 2024-09-18 00:31:56,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=332240.0, ans=0.125 2024-09-18 00:32:16,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=332280.0, ans=0.025 2024-09-18 00:32:26,528 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.753e+01 8.856e+01 9.608e+01 1.051e+02 2.791e+02, threshold=1.922e+02, percent-clipped=1.0 2024-09-18 00:32:29,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=332320.0, ans=0.125 2024-09-18 00:32:35,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=332360.0, ans=0.125 2024-09-18 00:32:50,578 INFO [train.py:1198] (1/2) Epoch 19, batch 1650, loss[loss=0.2538, ctc_loss=0.1417, cr_loss=0.3936, attn_decoder_loss=0.2575, over 29718.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1395, cr_loss=0.381, attn_decoder_loss=0.2521, over 5758218.97 frames. ], batch size: 89, lr: 5.89e-03, grad_scale: 8.0 2024-09-18 00:32:54,664 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.43 vs. limit=15.0 2024-09-18 00:33:07,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=332440.0, ans=0.0 2024-09-18 00:33:16,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=332440.0, ans=0.125 2024-09-18 00:33:18,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=332440.0, ans=0.2 2024-09-18 00:33:21,603 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.71 vs. limit=15.0 2024-09-18 00:33:44,849 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.34 vs. limit=15.0 2024-09-18 00:34:06,071 INFO [train.py:1198] (1/2) Epoch 19, batch 1700, loss[loss=0.2146, ctc_loss=0.1129, cr_loss=0.3286, attn_decoder_loss=0.2186, over 29601.00 frames. ], tot_loss[loss=0.2484, ctc_loss=0.1394, cr_loss=0.382, attn_decoder_loss=0.252, over 5779724.36 frames. 
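The Whitening lines compare a per-module statistic against a limit (metric=4.10 vs. limit=10.0 just above). One metric with the right behaviour, sketched below under the assumption that "white" means equal covariance eigenvalues, is the ratio of the mean squared eigenvalue to the squared mean eigenvalue: it equals 1.0 for perfectly white features and grows as the spectrum becomes lopsided. This is an illustration consistent with the logged numbers, not the module's actual code:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (..., num_channels); metric >= 1, == 1 iff covariance eigenvalues are equal.
        x = x.reshape(-1, x.shape[-1])
        n, d = x.shape
        assert d % num_groups == 0
        x = x.reshape(n, num_groups, d // num_groups).transpose(0, 1)  # (groups, n, d/g)
        x = x - x.mean(dim=1, keepdim=True)
        cov = torch.matmul(x.transpose(1, 2), x) / n                   # (groups, d/g, d/g)
        eigs = torch.linalg.eigvalsh(cov)                              # symmetric, real eigenvalues
        metric = (eigs ** 2).mean(dim=-1) / eigs.mean(dim=-1).clamp(min=1e-20) ** 2
        return metric.mean().item()

    # Sanity check on random (already white) features:
    print(whitening_metric(torch.randn(10000, 192)))  # close to 1.0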
], batch size: 69, lr: 5.89e-03, grad_scale: 8.0 2024-09-18 00:34:09,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=332600.0, ans=0.0 2024-09-18 00:34:17,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=332600.0, ans=0.125 2024-09-18 00:34:21,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=332640.0, ans=0.125 2024-09-18 00:34:45,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=332680.0, ans=0.125 2024-09-18 00:34:51,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=332720.0, ans=0.1 2024-09-18 00:34:59,434 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.557e+01 9.059e+01 9.709e+01 1.358e+02, threshold=1.812e+02, percent-clipped=0.0 2024-09-18 00:35:01,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=332720.0, ans=0.125 2024-09-18 00:35:04,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=332720.0, ans=0.2 2024-09-18 00:35:07,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=332760.0, ans=0.025 2024-09-18 00:35:09,621 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=15.0 2024-09-18 00:35:26,339 INFO [train.py:1198] (1/2) Epoch 19, batch 1750, loss[loss=0.2227, ctc_loss=0.1189, cr_loss=0.3438, attn_decoder_loss=0.2266, over 29296.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1391, cr_loss=0.381, attn_decoder_loss=0.2518, over 5788339.36 frames. ], batch size: 67, lr: 5.89e-03, grad_scale: 8.0 2024-09-18 00:35:53,292 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.85 vs. limit=15.0 2024-09-18 00:36:03,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=332880.0, ans=0.025 2024-09-18 00:36:12,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=332920.0, ans=0.0 2024-09-18 00:36:41,681 INFO [train.py:1198] (1/2) Epoch 19, batch 1800, loss[loss=0.2504, ctc_loss=0.138, cr_loss=0.3875, attn_decoder_loss=0.2543, over 29699.00 frames. ], tot_loss[loss=0.2479, ctc_loss=0.1388, cr_loss=0.381, attn_decoder_loss=0.2515, over 5792120.17 frames. ], batch size: 83, lr: 5.89e-03, grad_scale: 8.0 2024-09-18 00:37:31,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=333120.0, ans=0.025 2024-09-18 00:37:33,207 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.520e+01 8.534e+01 9.002e+01 9.561e+01 2.098e+02, threshold=1.800e+02, percent-clipped=1.0 2024-09-18 00:37:57,823 INFO [train.py:1198] (1/2) Epoch 19, batch 1850, loss[loss=0.26, ctc_loss=0.1424, cr_loss=0.3824, attn_decoder_loss=0.2645, over 29633.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1388, cr_loss=0.3818, attn_decoder_loss=0.2513, over 5798065.09 frames. 
], batch size: 86, lr: 5.89e-03, grad_scale: 8.0 2024-09-18 00:38:31,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=333280.0, ans=0.0 2024-09-18 00:38:36,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=333280.0, ans=0.125 2024-09-18 00:38:36,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=333280.0, ans=0.1 2024-09-18 00:38:43,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=333320.0, ans=0.125 2024-09-18 00:39:14,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=333400.0, ans=0.0 2024-09-18 00:39:15,881 INFO [train.py:1198] (1/2) Epoch 19, batch 1900, loss[loss=0.2562, ctc_loss=0.1376, cr_loss=0.3759, attn_decoder_loss=0.2611, over 29721.00 frames. ], tot_loss[loss=0.2479, ctc_loss=0.1389, cr_loss=0.3828, attn_decoder_loss=0.2515, over 5805742.20 frames. ], batch size: 89, lr: 5.89e-03, grad_scale: 8.0 2024-09-18 00:39:49,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=333480.0, ans=0.0 2024-09-18 00:39:52,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=333480.0, ans=0.0 2024-09-18 00:40:01,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=333480.0, ans=0.1 2024-09-18 00:40:06,424 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.09 vs. limit=22.5 2024-09-18 00:40:10,193 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.673e+01 8.878e+01 9.424e+01 1.001e+02 2.862e+02, threshold=1.885e+02, percent-clipped=2.0 2024-09-18 00:40:25,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=333560.0, ans=0.0 2024-09-18 00:40:34,854 INFO [train.py:1198] (1/2) Epoch 19, batch 1950, loss[loss=0.2473, ctc_loss=0.1403, cr_loss=0.4016, attn_decoder_loss=0.2502, over 29432.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1401, cr_loss=0.385, attn_decoder_loss=0.2527, over 5820214.39 frames. ], batch size: 78, lr: 5.88e-03, grad_scale: 8.0 2024-09-18 00:41:38,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=333760.0, ans=0.125 2024-09-18 00:41:50,436 INFO [train.py:1198] (1/2) Epoch 19, batch 2000, loss[loss=0.2223, ctc_loss=0.1187, cr_loss=0.3514, attn_decoder_loss=0.226, over 29383.00 frames. ], tot_loss[loss=0.2496, ctc_loss=0.1408, cr_loss=0.3855, attn_decoder_loss=0.2531, over 5796936.45 frames. 
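The grad_scale field is the AMP loss scale, and its oscillation between 8.0 and 16.0 across this section (16.0 at batch 2000, back to 8.0 by batch 2050, and similarly around batches 800, 1200, and 1600) is the usual dynamic-loss-scaling pattern: the scaler doubles the scale after a run of overflow-free steps and halves it when an inf/nan gradient appears. A minimal sketch with PyTorch's own GradScaler; the hyperparameters shown are the library defaults, not values taken from this run, and the model/criterion names are placeholders:

    import torch
    from torch.cuda.amp import GradScaler, autocast

    scaler = GradScaler(growth_factor=2.0, backoff_factor=0.5, growth_interval=2000)

    def train_step(model, batch, optimizer, criterion):
        optimizer.zero_grad(set_to_none=True)
        with autocast(dtype=torch.float16):
            loss = criterion(model(batch["inputs"]), batch["targets"])
        scaler.scale(loss).backward()   # backprop on the scaled loss
        scaler.step(optimizer)          # skips the update if inf/nan gradients were found
        scaler.update()                 # halves the scale on overflow, else grows it
        return scaler.get_scale()       # the value printed as grad_scale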
], batch size: 67, lr: 5.88e-03, grad_scale: 16.0 2024-09-18 00:41:50,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=333800.0, ans=0.125 2024-09-18 00:41:55,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=333800.0, ans=0.0 2024-09-18 00:41:58,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=333800.0, ans=0.125 2024-09-18 00:42:00,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=333800.0, ans=0.125 2024-09-18 00:42:17,197 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.20 vs. limit=15.0 2024-09-18 00:42:20,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.80 vs. limit=15.0 2024-09-18 00:42:24,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=333880.0, ans=0.125 2024-09-18 00:42:34,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=333920.0, ans=0.125 2024-09-18 00:42:41,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=333920.0, ans=0.125 2024-09-18 00:42:44,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=333920.0, ans=0.125 2024-09-18 00:42:45,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=333920.0, ans=12.0 2024-09-18 00:42:46,024 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.622e+01 8.666e+01 9.128e+01 9.687e+01 2.181e+02, threshold=1.826e+02, percent-clipped=3.0 2024-09-18 00:42:55,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=333960.0, ans=0.025 2024-09-18 00:43:08,929 INFO [train.py:1198] (1/2) Epoch 19, batch 2050, loss[loss=0.2146, ctc_loss=0.1127, cr_loss=0.3275, attn_decoder_loss=0.2187, over 29419.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1404, cr_loss=0.3847, attn_decoder_loss=0.2524, over 5788542.30 frames. ], batch size: 70, lr: 5.88e-03, grad_scale: 8.0 2024-09-18 00:43:20,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=334000.0, ans=0.025 2024-09-18 00:43:45,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=334080.0, ans=0.07 2024-09-18 00:43:55,906 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.32 vs. 
limit=22.5 2024-09-18 00:44:01,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=334120.0, ans=0.125 2024-09-18 00:44:03,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=334120.0, ans=0.125 2024-09-18 00:44:27,291 INFO [train.py:1198] (1/2) Epoch 19, batch 2100, loss[loss=0.2544, ctc_loss=0.1412, cr_loss=0.3806, attn_decoder_loss=0.2585, over 29756.00 frames. ], tot_loss[loss=0.2484, ctc_loss=0.1397, cr_loss=0.3833, attn_decoder_loss=0.252, over 5799168.52 frames. ], batch size: 81, lr: 5.88e-03, grad_scale: 8.0 2024-09-18 00:44:34,119 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.46 vs. limit=12.0 2024-09-18 00:44:49,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.99 vs. limit=10.0 2024-09-18 00:45:08,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=334280.0, ans=0.0 2024-09-18 00:45:09,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=334280.0, ans=15.0 2024-09-18 00:45:13,913 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.14 vs. limit=15.0 2024-09-18 00:45:17,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=334320.0, ans=0.0 2024-09-18 00:45:18,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.52 vs. limit=6.0 2024-09-18 00:45:20,467 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 8.379e+01 9.013e+01 9.583e+01 3.257e+02, threshold=1.803e+02, percent-clipped=1.0 2024-09-18 00:45:26,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=334360.0, ans=0.2 2024-09-18 00:45:39,730 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.46 vs. limit=15.0 2024-09-18 00:45:44,015 INFO [train.py:1198] (1/2) Epoch 19, batch 2150, loss[loss=0.2303, ctc_loss=0.1243, cr_loss=0.3633, attn_decoder_loss=0.234, over 29445.00 frames. ], tot_loss[loss=0.2478, ctc_loss=0.1389, cr_loss=0.3818, attn_decoder_loss=0.2514, over 5813896.09 frames. ], batch size: 78, lr: 5.88e-03, grad_scale: 8.0 2024-09-18 00:45:45,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=334400.0, ans=0.125 2024-09-18 00:45:58,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=334440.0, ans=0.0 2024-09-18 00:46:01,530 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.64 vs. 
limit=15.0 2024-09-18 00:46:14,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=334480.0, ans=0.2 2024-09-18 00:46:42,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=334520.0, ans=0.1 2024-09-18 00:46:53,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=334560.0, ans=0.0 2024-09-18 00:47:02,265 INFO [train.py:1198] (1/2) Epoch 19, batch 2200, loss[loss=0.2509, ctc_loss=0.1433, cr_loss=0.3772, attn_decoder_loss=0.2545, over 29604.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1389, cr_loss=0.3821, attn_decoder_loss=0.2513, over 5811439.74 frames. ], batch size: 86, lr: 5.87e-03, grad_scale: 8.0 2024-09-18 00:47:40,303 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.02 vs. limit=15.0 2024-09-18 00:47:41,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=334680.0, ans=0.0 2024-09-18 00:47:41,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.34 vs. limit=12.0 2024-09-18 00:47:57,758 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.512e+01 9.076e+01 9.778e+01 1.780e+02, threshold=1.815e+02, percent-clipped=0.0 2024-09-18 00:48:14,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=334760.0, ans=0.125 2024-09-18 00:48:15,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=334760.0, ans=10.0 2024-09-18 00:48:20,690 INFO [train.py:1198] (1/2) Epoch 19, batch 2250, loss[loss=0.2393, ctc_loss=0.1279, cr_loss=0.3539, attn_decoder_loss=0.2438, over 29705.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1385, cr_loss=0.3815, attn_decoder_loss=0.2511, over 5810661.66 frames. ], batch size: 82, lr: 5.87e-03, grad_scale: 8.0 2024-09-18 00:48:37,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=334840.0, ans=0.0 2024-09-18 00:48:43,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=334840.0, ans=0.125 2024-09-18 00:48:54,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=334880.0, ans=0.0 2024-09-18 00:49:22,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=334960.0, ans=0.2 2024-09-18 00:49:22,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=334960.0, ans=0.2 2024-09-18 00:49:28,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.95 vs. limit=15.0 2024-09-18 00:49:36,424 INFO [train.py:1198] (1/2) Epoch 19, batch 2300, loss[loss=0.2318, ctc_loss=0.1284, cr_loss=0.3744, attn_decoder_loss=0.2349, over 29335.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1379, cr_loss=0.3803, attn_decoder_loss=0.2503, over 5797957.27 frames. 
], batch size: 71, lr: 5.87e-03, grad_scale: 8.0 2024-09-18 00:49:38,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.50 vs. limit=15.0 2024-09-18 00:49:44,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=335000.0, ans=0.0 2024-09-18 00:49:50,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=335040.0, ans=0.025 2024-09-18 00:50:03,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=335040.0, ans=0.125 2024-09-18 00:50:09,380 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.31 vs. limit=15.0 2024-09-18 00:50:22,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=335120.0, ans=0.125 2024-09-18 00:50:28,533 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:50:29,724 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.590e+01 9.155e+01 9.781e+01 6.273e+02, threshold=1.831e+02, percent-clipped=2.0 2024-09-18 00:50:31,045 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.72 vs. limit=15.0 2024-09-18 00:50:37,729 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.63 vs. limit=22.5 2024-09-18 00:50:43,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=335160.0, ans=0.035 2024-09-18 00:50:50,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=335160.0, ans=0.125 2024-09-18 00:50:55,509 INFO [train.py:1198] (1/2) Epoch 19, batch 2350, loss[loss=0.2642, ctc_loss=0.1546, cr_loss=0.4255, attn_decoder_loss=0.2669, over 29698.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.1379, cr_loss=0.38, attn_decoder_loss=0.2506, over 5803381.22 frames. ], batch size: 83, lr: 5.87e-03, grad_scale: 8.0 2024-09-18 00:50:55,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=335200.0, ans=0.1 2024-09-18 00:50:58,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=335200.0, ans=0.125 2024-09-18 00:51:04,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=335200.0, ans=0.025 2024-09-18 00:52:13,763 INFO [train.py:1198] (1/2) Epoch 19, batch 2400, loss[loss=0.2429, ctc_loss=0.1437, cr_loss=0.3791, attn_decoder_loss=0.2455, over 29523.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1382, cr_loss=0.3802, attn_decoder_loss=0.2509, over 5807499.92 frames. 
], batch size: 76, lr: 5.87e-03, grad_scale: 16.0 2024-09-18 00:52:21,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=335400.0, ans=0.1 2024-09-18 00:52:41,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=335440.0, ans=0.1 2024-09-18 00:53:08,371 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.399e+01 8.603e+01 9.064e+01 9.775e+01 3.534e+02, threshold=1.813e+02, percent-clipped=1.0 2024-09-18 00:53:20,947 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:53:29,715 INFO [train.py:1198] (1/2) Epoch 19, batch 2450, loss[loss=0.2388, ctc_loss=0.1294, cr_loss=0.3713, attn_decoder_loss=0.2427, over 29714.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1389, cr_loss=0.3814, attn_decoder_loss=0.2517, over 5785628.96 frames. ], batch size: 82, lr: 5.87e-03, grad_scale: 8.0 2024-09-18 00:53:46,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=335640.0, ans=0.0 2024-09-18 00:53:55,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=335640.0, ans=0.025 2024-09-18 00:53:58,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=335680.0, ans=10.0 2024-09-18 00:54:15,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=335720.0, ans=0.1 2024-09-18 00:54:35,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=335760.0, ans=0.09899494936611666 2024-09-18 00:54:41,153 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.24 vs. limit=10.0 2024-09-18 00:54:47,583 INFO [train.py:1198] (1/2) Epoch 19, batch 2500, loss[loss=0.2587, ctc_loss=0.1435, cr_loss=0.3862, attn_decoder_loss=0.2629, over 29622.00 frames. ], tot_loss[loss=0.2478, ctc_loss=0.1387, cr_loss=0.3813, attn_decoder_loss=0.2515, over 5796025.71 frames. ], batch size: 86, lr: 5.86e-03, grad_scale: 8.0 2024-09-18 00:55:01,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=335840.0, ans=0.07 2024-09-18 00:55:25,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=335880.0, ans=0.1 2024-09-18 00:55:32,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=335880.0, ans=0.2 2024-09-18 00:55:40,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=335920.0, ans=0.1 2024-09-18 00:55:40,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=335920.0, ans=0.125 2024-09-18 00:55:44,877 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.526e+01 9.010e+01 9.846e+01 5.892e+02, threshold=1.802e+02, percent-clipped=2.0 2024-09-18 00:55:55,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.73 vs. 
limit=10.0 2024-09-18 00:56:03,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=335960.0, ans=0.0 2024-09-18 00:56:13,788 INFO [train.py:1198] (1/2) Epoch 19, batch 2550, loss[loss=0.2254, ctc_loss=0.1156, cr_loss=0.334, attn_decoder_loss=0.2302, over 29329.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.1392, cr_loss=0.382, attn_decoder_loss=0.252, over 5798069.40 frames. ], batch size: 67, lr: 5.86e-03, grad_scale: 8.0 2024-09-18 00:56:38,884 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.51 vs. limit=15.0 2024-09-18 00:56:39,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=336040.0, ans=0.125 2024-09-18 00:56:44,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=336080.0, ans=0.015 2024-09-18 00:56:45,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=336080.0, ans=0.2 2024-09-18 00:56:53,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=336080.0, ans=0.0 2024-09-18 00:57:29,676 INFO [train.py:1198] (1/2) Epoch 19, batch 2600, loss[loss=0.2308, ctc_loss=0.1171, cr_loss=0.3405, attn_decoder_loss=0.2359, over 29449.00 frames. ], tot_loss[loss=0.2488, ctc_loss=0.1394, cr_loss=0.3825, attn_decoder_loss=0.2525, over 5794928.97 frames. ], batch size: 78, lr: 5.86e-03, grad_scale: 8.0 2024-09-18 00:57:49,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=336240.0, ans=0.125 2024-09-18 00:57:53,144 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.86 vs. limit=15.0 2024-09-18 00:58:26,745 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.488e+01 8.566e+01 8.963e+01 9.636e+01 1.354e+02, threshold=1.793e+02, percent-clipped=0.0 2024-09-18 00:58:47,578 INFO [train.py:1198] (1/2) Epoch 19, batch 2650, loss[loss=0.2722, ctc_loss=0.1641, cr_loss=0.4342, attn_decoder_loss=0.2745, over 29259.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1394, cr_loss=0.3829, attn_decoder_loss=0.2526, over 5801340.32 frames. 
], batch size: 100, lr: 5.86e-03, grad_scale: 8.0 2024-09-18 00:59:10,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=336440.0, ans=0.2 2024-09-18 00:59:17,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=336440.0, ans=0.1 2024-09-18 00:59:23,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=336480.0, ans=0.125 2024-09-18 00:59:37,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=336520.0, ans=0.125 2024-09-18 00:59:46,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=336520.0, ans=0.09899494936611666 2024-09-18 00:59:49,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=336560.0, ans=0.125 2024-09-18 01:00:03,554 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.30 vs. limit=15.0 2024-09-18 01:00:05,874 INFO [train.py:1198] (1/2) Epoch 19, batch 2700, loss[loss=0.2473, ctc_loss=0.1297, cr_loss=0.3535, attn_decoder_loss=0.2526, over 29501.00 frames. ], tot_loss[loss=0.2492, ctc_loss=0.1396, cr_loss=0.3831, attn_decoder_loss=0.2529, over 5797444.72 frames. ], batch size: 87, lr: 5.86e-03, grad_scale: 8.0 2024-09-18 01:00:25,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=336640.0, ans=0.2 2024-09-18 01:00:32,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=336640.0, ans=12.0 2024-09-18 01:00:36,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=336680.0, ans=0.0 2024-09-18 01:00:50,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=336720.0, ans=12.0 2024-09-18 01:00:56,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=336720.0, ans=0.95 2024-09-18 01:01:00,336 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 8.475e+01 9.059e+01 9.583e+01 2.142e+02, threshold=1.812e+02, percent-clipped=1.0 2024-09-18 01:01:00,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=336720.0, ans=0.025 2024-09-18 01:01:22,318 INFO [train.py:1198] (1/2) Epoch 19, batch 2750, loss[loss=0.2368, ctc_loss=0.1326, cr_loss=0.3679, attn_decoder_loss=0.2402, over 29493.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1388, cr_loss=0.3815, attn_decoder_loss=0.2517, over 5794936.44 frames. 
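Each train.py line pairs the current batch's loss over its own ~30k frames with tot_loss over several million frames; the totals move slowly from batch to batch, which is consistent with a frame-weighted running aggregate that gradually forgets old batches. A sketch of such bookkeeping (illustrative only; the decay constant and the real tracker's exact forgetting rule are assumptions):

    class RunningLoss:
        """Frame-weighted running average with exponential forgetting (a sketch)."""
        def __init__(self, decay: float = 0.999):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, loss: float, num_frames: float) -> float:
            self.loss_sum = self.decay * self.loss_sum + loss * num_frames
            self.frames = self.decay * self.frames + num_frames
            return self.loss_sum / self.frames  # printed as tot_loss[... over N frames]

    tracker = RunningLoss()
    print(tracker.update(0.2368, 29493))  # batch-level figures like those logged above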
], batch size: 75, lr: 5.86e-03, grad_scale: 8.0 2024-09-18 01:01:22,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=336800.0, ans=0.1 2024-09-18 01:01:37,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=336840.0, ans=0.125 2024-09-18 01:01:45,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=336840.0, ans=0.125 2024-09-18 01:01:54,820 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.21 vs. limit=22.5 2024-09-18 01:02:04,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=336880.0, ans=0.125 2024-09-18 01:02:19,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=336920.0, ans=0.125 2024-09-18 01:02:28,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=336960.0, ans=0.0 2024-09-18 01:02:39,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=337000.0, ans=0.0 2024-09-18 01:02:40,759 INFO [train.py:1198] (1/2) Epoch 19, batch 2800, loss[loss=0.2718, ctc_loss=0.1712, cr_loss=0.3897, attn_decoder_loss=0.2744, over 20130.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1391, cr_loss=0.3817, attn_decoder_loss=0.2519, over 5775057.38 frames. ], batch size: 209, lr: 5.85e-03, grad_scale: 16.0 2024-09-18 01:02:45,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=337000.0, ans=0.0 2024-09-18 01:03:01,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=337040.0, ans=15.0 2024-09-18 01:03:21,763 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.86 vs. limit=15.0 2024-09-18 01:03:33,820 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.55 vs. limit=15.0 2024-09-18 01:03:38,940 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.828e+01 9.014e+01 9.335e+01 1.020e+02 1.618e+02, threshold=1.867e+02, percent-clipped=0.0 2024-09-18 01:03:47,544 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.24 vs. limit=22.5 2024-09-18 01:03:58,621 INFO [train.py:1198] (1/2) Epoch 19, batch 2850, loss[loss=0.2419, ctc_loss=0.1344, cr_loss=0.3889, attn_decoder_loss=0.2452, over 29513.00 frames. ], tot_loss[loss=0.2487, ctc_loss=0.1397, cr_loss=0.3826, attn_decoder_loss=0.2523, over 5761153.06 frames. 
], batch size: 77, lr: 5.85e-03, grad_scale: 8.0 2024-09-18 01:04:05,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=337200.0, ans=0.125 2024-09-18 01:04:08,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=337200.0, ans=0.0 2024-09-18 01:04:09,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=337200.0, ans=0.2 2024-09-18 01:04:28,613 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.79 vs. limit=22.5 2024-09-18 01:04:35,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.21 vs. limit=15.0 2024-09-18 01:04:44,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=337320.0, ans=0.0 2024-09-18 01:04:54,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_abs, batch_count=337320.0, ans=0.5 2024-09-18 01:04:57,778 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.95 vs. limit=15.0 2024-09-18 01:05:15,150 INFO [train.py:1198] (1/2) Epoch 19, batch 2900, loss[loss=0.2387, ctc_loss=0.1416, cr_loss=0.3771, attn_decoder_loss=0.2411, over 29417.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1407, cr_loss=0.3851, attn_decoder_loss=0.2535, over 5787167.01 frames. ], batch size: 79, lr: 5.85e-03, grad_scale: 8.0 2024-09-18 01:05:17,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.55 vs. limit=15.0 2024-09-18 01:05:24,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=337400.0, ans=0.0 2024-09-18 01:05:41,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=337440.0, ans=0.125 2024-09-18 01:06:06,428 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.97 vs. limit=22.5 2024-09-18 01:06:13,397 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.580e+01 8.650e+01 9.061e+01 9.798e+01 5.022e+02, threshold=1.812e+02, percent-clipped=2.0 2024-09-18 01:06:14,284 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.50 vs. limit=15.0 2024-09-18 01:06:33,629 INFO [train.py:1198] (1/2) Epoch 19, batch 2950, loss[loss=0.2388, ctc_loss=0.1339, cr_loss=0.3906, attn_decoder_loss=0.2418, over 29521.00 frames. ], tot_loss[loss=0.2487, ctc_loss=0.1399, cr_loss=0.3835, attn_decoder_loss=0.2523, over 5780823.44 frames. ], batch size: 75, lr: 5.85e-03, grad_scale: 8.0 2024-09-18 01:06:51,402 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.53 vs. limit=12.0 2024-09-18 01:07:09,103 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.89 vs. 
limit=22.5 2024-09-18 01:07:17,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=337680.0, ans=0.0 2024-09-18 01:07:20,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=337720.0, ans=0.125 2024-09-18 01:07:26,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=337720.0, ans=0.0 2024-09-18 01:07:52,296 INFO [train.py:1198] (1/2) Epoch 19, batch 3000, loss[loss=0.2484, ctc_loss=0.1338, cr_loss=0.3785, attn_decoder_loss=0.2527, over 29731.00 frames. ], tot_loss[loss=0.2484, ctc_loss=0.1397, cr_loss=0.3828, attn_decoder_loss=0.252, over 5782818.58 frames. ], batch size: 81, lr: 5.85e-03, grad_scale: 8.0 2024-09-18 01:07:52,297 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 01:08:10,717 INFO [train.py:1230] (1/2) Epoch 19, validation: loss=0.2115, ctc_loss=0.0393, cr_loss=5.039e-15, attn_decoder_loss=0.2306, over 944034.00 frames. 2024-09-18 01:08:10,717 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-18 01:08:14,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=337800.0, ans=0.1 2024-09-18 01:08:16,033 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 01:08:20,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=337800.0, ans=0.04949747468305833 2024-09-18 01:08:21,194 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.45 vs. limit=15.0 2024-09-18 01:08:22,756 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.81 vs. limit=15.0 2024-09-18 01:08:42,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=337880.0, ans=0.2 2024-09-18 01:09:07,057 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.545e+01 8.664e+01 9.190e+01 9.808e+01 2.398e+02, threshold=1.838e+02, percent-clipped=1.0 2024-09-18 01:09:26,832 INFO [train.py:1198] (1/2) Epoch 19, batch 3050, loss[loss=0.2215, ctc_loss=0.1115, cr_loss=0.3124, attn_decoder_loss=0.2267, over 29533.00 frames. ], tot_loss[loss=0.2493, ctc_loss=0.1406, cr_loss=0.3842, attn_decoder_loss=0.2528, over 5775729.66 frames. ], batch size: 76, lr: 5.85e-03, grad_scale: 8.0 2024-09-18 01:09:39,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=338000.0, ans=0.0 2024-09-18 01:09:39,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=338000.0, ans=0.0 2024-09-18 01:09:41,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=338040.0, ans=0.95 2024-09-18 01:09:47,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=338040.0, ans=0.2 2024-09-18 01:09:57,574 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.41 vs. 
2024-09-18 01:10:16,036 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0
2024-09-18 01:10:20,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=338120.0, ans=0.2
2024-09-18 01:10:31,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=338160.0, ans=0.0
2024-09-18 01:10:44,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=338200.0, ans=0.125
2024-09-18 01:10:45,194 INFO [train.py:1198] (1/2) Epoch 19, batch 3100, loss[loss=0.2683, ctc_loss=0.147, cr_loss=0.4035, attn_decoder_loss=0.2728, over 29222.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1399, cr_loss=0.3832, attn_decoder_loss=0.2524, over 5775825.39 frames. ], batch size: 100, lr: 5.84e-03, grad_scale: 8.0
2024-09-18 01:10:45,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=338200.0, ans=0.125
2024-09-18 01:10:55,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=338200.0, ans=0.2
2024-09-18 01:11:07,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=338240.0, ans=0.1
2024-09-18 01:11:24,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=338280.0, ans=0.1
2024-09-18 01:11:39,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=338320.0, ans=0.0
2024-09-18 01:11:42,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=338320.0, ans=0.0
2024-09-18 01:11:43,976 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 8.533e+01 9.118e+01 9.870e+01 1.992e+02, threshold=1.824e+02, percent-clipped=1.0
2024-09-18 01:12:04,279 INFO [train.py:1198] (1/2) Epoch 19, batch 3150, loss[loss=0.2636, ctc_loss=0.1458, cr_loss=0.4057, attn_decoder_loss=0.2676, over 28798.00 frames. ], tot_loss[loss=0.2488, ctc_loss=0.1397, cr_loss=0.3828, attn_decoder_loss=0.2524, over 5783642.04 frames. ], batch size: 104, lr: 5.84e-03, grad_scale: 8.0
2024-09-18 01:12:22,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=338440.0, ans=0.2
2024-09-18 01:12:50,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=338520.0, ans=0.0
2024-09-18 01:13:10,536 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.81 vs. limit=15.0
2024-09-18 01:13:20,347 INFO [train.py:1198] (1/2) Epoch 19, batch 3200, loss[loss=0.2388, ctc_loss=0.1257, cr_loss=0.3546, attn_decoder_loss=0.2435, over 29408.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.139, cr_loss=0.3815, attn_decoder_loss=0.2517, over 5794209.70 frames. ], batch size: 79, lr: 5.84e-03, grad_scale: 16.0
2024-09-18 01:13:23,584 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-18 01:13:43,096 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=22.5
2024-09-18 01:13:50,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=338640.0, ans=0.125
2024-09-18 01:13:50,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=338640.0, ans=0.2
2024-09-18 01:14:02,891 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.90 vs. limit=15.0
2024-09-18 01:14:20,071 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.616e+01 8.580e+01 9.076e+01 9.687e+01 2.351e+02, threshold=1.815e+02, percent-clipped=1.0
2024-09-18 01:14:20,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=338720.0, ans=0.1
2024-09-18 01:14:34,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=338760.0, ans=0.0
2024-09-18 01:14:35,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=338760.0, ans=0.125
2024-09-18 01:14:38,536 INFO [train.py:1198] (1/2) Epoch 19, batch 3250, loss[loss=0.2639, ctc_loss=0.1492, cr_loss=0.3951, attn_decoder_loss=0.2678, over 29716.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1394, cr_loss=0.382, attn_decoder_loss=0.2522, over 5800170.48 frames. ], batch size: 84, lr: 5.84e-03, grad_scale: 8.0
2024-09-18 01:14:56,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=338840.0, ans=0.125
2024-09-18 01:14:56,109 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 01:15:14,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=338880.0, ans=0.125
2024-09-18 01:15:19,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=338880.0, ans=0.125
2024-09-18 01:15:23,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=338880.0, ans=22.5
2024-09-18 01:15:31,207 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.27 vs. limit=15.0
2024-09-18 01:15:38,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0
2024-09-18 01:15:56,362 INFO [train.py:1198] (1/2) Epoch 19, batch 3300, loss[loss=0.2551, ctc_loss=0.1284, cr_loss=0.3701, attn_decoder_loss=0.261, over 28434.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.138, cr_loss=0.3802, attn_decoder_loss=0.2508, over 5797271.16 frames. ], batch size: 111, lr: 5.84e-03, grad_scale: 8.0
2024-09-18 01:16:12,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=339040.0, ans=0.125
2024-09-18 01:16:18,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=339040.0, ans=0.125
2024-09-18 01:16:22,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=339040.0, ans=0.2
2024-09-18 01:16:27,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=339080.0, ans=0.2
2024-09-18 01:16:27,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=339080.0, ans=0.0
2024-09-18 01:16:43,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=339120.0, ans=0.0
2024-09-18 01:16:53,878 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.435e+01 8.663e+01 9.126e+01 9.763e+01 2.623e+02, threshold=1.825e+02, percent-clipped=1.0
2024-09-18 01:16:57,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=339160.0, ans=0.0
2024-09-18 01:17:12,575 INFO [train.py:1198] (1/2) Epoch 19, batch 3350, loss[loss=0.259, ctc_loss=0.1512, cr_loss=0.4029, attn_decoder_loss=0.262, over 28818.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1386, cr_loss=0.3806, attn_decoder_loss=0.2514, over 5775446.06 frames. ], batch size: 104, lr: 5.84e-03, grad_scale: 8.0
2024-09-18 01:17:20,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=339200.0, ans=0.125
2024-09-18 01:17:53,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=339280.0, ans=0.2
2024-09-18 01:18:11,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=339320.0, ans=0.2
2024-09-18 01:18:18,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=339360.0, ans=0.125
2024-09-18 01:18:30,834 INFO [train.py:1198] (1/2) Epoch 19, batch 3400, loss[loss=0.2297, ctc_loss=0.1345, cr_loss=0.3683, attn_decoder_loss=0.2321, over 29310.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1394, cr_loss=0.382, attn_decoder_loss=0.2517, over 5768198.97 frames. ], batch size: 67, lr: 5.83e-03, grad_scale: 8.0
2024-09-18 01:18:34,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=339400.0, ans=0.09899494936611666
2024-09-18 01:18:34,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=339400.0, ans=0.0
2024-09-18 01:19:22,269 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.86 vs. limit=22.5
2024-09-18 01:19:30,514 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.395e+01 8.511e+01 9.195e+01 9.878e+01 2.681e+02, threshold=1.839e+02, percent-clipped=1.0
2024-09-18 01:19:35,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=339560.0, ans=0.125
2024-09-18 01:19:37,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=339560.0, ans=0.125
2024-09-18 01:19:42,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=339560.0, ans=0.2
2024-09-18 01:19:48,773 INFO [train.py:1198] (1/2) Epoch 19, batch 3450, loss[loss=0.2535, ctc_loss=0.1373, cr_loss=0.3825, attn_decoder_loss=0.2579, over 28342.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.1394, cr_loss=0.3827, attn_decoder_loss=0.2519, over 5775602.69 frames. ], batch size: 111, lr: 5.83e-03, grad_scale: 8.0
2024-09-18 01:19:49,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=339600.0, ans=0.125
2024-09-18 01:20:26,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=339680.0, ans=0.2
2024-09-18 01:20:30,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=339680.0, ans=0.025
2024-09-18 01:20:39,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=339720.0, ans=0.95
2024-09-18 01:20:45,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=339720.0, ans=0.1
2024-09-18 01:21:04,604 INFO [train.py:1198] (1/2) Epoch 19, batch 3500, loss[loss=0.2259, ctc_loss=0.1243, cr_loss=0.3384, attn_decoder_loss=0.2296, over 29309.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.1395, cr_loss=0.3822, attn_decoder_loss=0.2516, over 5777124.38 frames. ], batch size: 71, lr: 5.83e-03, grad_scale: 8.0
2024-09-18 01:21:34,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=339840.0, ans=0.0
2024-09-18 01:21:48,136 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.42 vs. limit=22.5
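The WARNING [optim.py:487] entries report five statistics of recent gradient norms (reading them as min, 25%, median, 75% and max) together with a clipping threshold. Throughout this log the threshold equals Clipping_scale times the median, e.g. 2.0 x 9.059e+01 = 1.812e+02 in the first warning of this excerpt, and percent-clipped is the share of recent batches whose norm exceeded it. A sketch of such a median-based rule follows; the history length and the in-place rescaling are assumptions, not the exact optim.py logic.

    # Sketch of median-based adaptive gradient clipping consistent with the
    # optim.py warnings above: threshold = clipping_scale * median of recent
    # gradient norms. Buffer size and bookkeeping are assumptions.
    from collections import deque

    import torch

    class MedianGradClipper:
        def __init__(self, clipping_scale: float = 2.0, history: int = 1000):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=history)
            self.num_clipped = 0

        def clip_(self, params) -> float:
            params = [p for p in params]
            grads = [p.grad.detach().flatten() for p in params if p.grad is not None]
            if not grads:
                return 0.0
            norm = torch.cat(grads).norm().item()
            self.norms.append(norm)
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.clipping_scale * median
            if norm > threshold:
                # Scale all gradients down so the total norm equals the threshold.
                self.num_clipped += 1
                for p in params:
                    if p.grad is not None:
                        p.grad.mul_(threshold / norm)
            return norm

A rule of this shape explains why the warnings are routine rather than alarming: with a threshold of twice the median norm, a small percent-clipped (0-3% above) is expected on the occasional outlier batch.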
2024-09-18 01:21:52,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=339920.0, ans=0.025
2024-09-18 01:21:59,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=339920.0, ans=0.125
2024-09-18 01:22:04,058 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.228e+01 8.478e+01 8.934e+01 9.584e+01 2.565e+02, threshold=1.787e+02, percent-clipped=1.0
2024-09-18 01:22:10,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=339960.0, ans=0.07
2024-09-18 01:22:13,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=339960.0, ans=0.125
2024-09-18 01:22:13,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=339960.0, ans=0.05
2024-09-18 01:22:22,258 INFO [train.py:1198] (1/2) Epoch 19, batch 3550, loss[loss=0.2492, ctc_loss=0.1269, cr_loss=0.371, attn_decoder_loss=0.2546, over 29737.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.1393, cr_loss=0.3823, attn_decoder_loss=0.2516, over 5783482.58 frames. ], batch size: 89, lr: 5.83e-03, grad_scale: 8.0
2024-09-18 01:22:27,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=340000.0, ans=0.125
2024-09-18 01:22:28,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=340000.0, ans=0.125
2024-09-18 01:22:52,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=340080.0, ans=0.0
2024-09-18 01:23:16,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=340120.0, ans=0.125
2024-09-18 01:23:16,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=340120.0, ans=0.0
2024-09-18 01:23:19,344 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-18 01:23:29,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=340160.0, ans=0.125
2024-09-18 01:23:34,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=340160.0, ans=0.125
2024-09-18 01:23:37,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=340200.0, ans=0.125
2024-09-18 01:23:38,727 INFO [train.py:1198] (1/2) Epoch 19, batch 3600, loss[loss=0.2378, ctc_loss=0.1299, cr_loss=0.378, attn_decoder_loss=0.2413, over 29525.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1393, cr_loss=0.3828, attn_decoder_loss=0.2517, over 5792308.23 frames. ], batch size: 77, lr: 5.83e-03, grad_scale: 16.0
2024-09-18 01:23:42,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=340200.0, ans=0.1
2024-09-18 01:23:59,364 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.36 vs. limit=15.0
2024-09-18 01:24:00,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=340240.0, ans=0.125
2024-09-18 01:24:21,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=340280.0, ans=0.0
2024-09-18 01:24:37,093 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.610e+01 9.225e+01 9.925e+01 8.683e+02, threshold=1.845e+02, percent-clipped=1.0
2024-09-18 01:24:53,597 INFO [train.py:1198] (1/2) Epoch 19, batch 3650, loss[loss=0.2554, ctc_loss=0.1373, cr_loss=0.3723, attn_decoder_loss=0.2602, over 29502.00 frames. ], tot_loss[loss=0.2475, ctc_loss=0.1386, cr_loss=0.3818, attn_decoder_loss=0.2511, over 5795045.14 frames. ], batch size: 90, lr: 5.83e-03, grad_scale: 8.0
2024-09-18 01:25:26,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=340480.0, ans=0.125
2024-09-18 01:25:27,581 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.33 vs. limit=15.0
2024-09-18 01:25:49,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=340520.0, ans=0.0
2024-09-18 01:25:55,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=340560.0, ans=0.025
2024-09-18 01:25:56,240 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.71 vs. limit=15.0
2024-09-18 01:25:57,534 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=16.60 vs. limit=15.0
2024-09-18 01:26:03,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=340560.0, ans=0.125
2024-09-18 01:26:03,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=340560.0, ans=10.0
2024-09-18 01:26:08,874 INFO [train.py:1198] (1/2) Epoch 19, batch 3700, loss[loss=0.2625, ctc_loss=0.1442, cr_loss=0.3835, attn_decoder_loss=0.2671, over 29687.00 frames. ], tot_loss[loss=0.2475, ctc_loss=0.1385, cr_loss=0.3822, attn_decoder_loss=0.2511, over 5804802.43 frames. ], batch size: 84, lr: 5.82e-03, grad_scale: 8.0
2024-09-18 01:27:03,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=340720.0, ans=0.125
2024-09-18 01:27:07,335 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.437e+01 8.588e+01 9.238e+01 9.671e+01 4.711e+02, threshold=1.848e+02, percent-clipped=1.0
2024-09-18 01:27:15,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=340760.0, ans=10.0
2024-09-18 01:27:24,446 INFO [train.py:1198] (1/2) Epoch 19, batch 3750, loss[loss=0.2166, ctc_loss=0.1188, cr_loss=0.3339, attn_decoder_loss=0.22, over 29319.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1382, cr_loss=0.3812, attn_decoder_loss=0.251, over 5808480.56 frames. ], batch size: 67, lr: 5.82e-03, grad_scale: 8.0
2024-09-18 01:27:38,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=340800.0, ans=0.0
2024-09-18 01:27:52,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=340840.0, ans=0.125
2024-09-18 01:27:55,997 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.66 vs. limit=22.5
2024-09-18 01:28:23,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=340920.0, ans=0.125
2024-09-18 01:28:39,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=341000.0, ans=0.0
2024-09-18 01:28:41,212 INFO [train.py:1198] (1/2) Epoch 19, batch 3800, loss[loss=0.2571, ctc_loss=0.137, cr_loss=0.3705, attn_decoder_loss=0.2622, over 29649.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1382, cr_loss=0.3807, attn_decoder_loss=0.2507, over 5798958.90 frames. ], batch size: 86, lr: 5.82e-03, grad_scale: 8.0
2024-09-18 01:28:52,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=341000.0, ans=0.0
2024-09-18 01:28:53,699 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-18 01:28:54,273 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.04 vs. limit=15.0
2024-09-18 01:29:02,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=341040.0, ans=0.2
2024-09-18 01:29:35,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=341120.0, ans=0.125
2024-09-18 01:29:36,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=341120.0, ans=0.125
2024-09-18 01:29:39,592 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 8.913e+01 9.389e+01 9.954e+01 1.370e+02, threshold=1.878e+02, percent-clipped=0.0
2024-09-18 01:29:56,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=341200.0, ans=0.0
2024-09-18 01:29:57,763 INFO [train.py:1198] (1/2) Epoch 19, batch 3850, loss[loss=0.2558, ctc_loss=0.1431, cr_loss=0.3881, attn_decoder_loss=0.2597, over 29276.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1382, cr_loss=0.3808, attn_decoder_loss=0.2507, over 5812688.75 frames. ], batch size: 100, lr: 5.82e-03, grad_scale: 8.0
2024-09-18 01:30:01,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=341200.0, ans=0.1
2024-09-18 01:30:26,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=341280.0, ans=0.0
2024-09-18 01:31:08,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=341360.0, ans=0.125
2024-09-18 01:31:12,348 INFO [train.py:1198] (1/2) Epoch 19, batch 3900, loss[loss=0.2537, ctc_loss=0.1343, cr_loss=0.3714, attn_decoder_loss=0.2587, over 29644.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.138, cr_loss=0.3808, attn_decoder_loss=0.251, over 5817151.67 frames. ], batch size: 86, lr: 5.82e-03, grad_scale: 8.0
2024-09-18 01:31:17,898 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.36 vs. limit=15.0
2024-09-18 01:31:18,901 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.08 vs. limit=12.0
2024-09-18 01:31:30,876 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.74 vs. limit=12.0
2024-09-18 01:31:31,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=341440.0, ans=0.0
2024-09-18 01:31:45,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=341480.0, ans=0.0
2024-09-18 01:31:48,901 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.65 vs. limit=22.5
2024-09-18 01:31:58,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=341520.0, ans=0.0
2024-09-18 01:32:02,004 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0
2024-09-18 01:32:04,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=341520.0, ans=0.125
2024-09-18 01:32:10,245 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.279e+01 8.574e+01 8.925e+01 9.348e+01 1.659e+02, threshold=1.785e+02, percent-clipped=0.0
2024-09-18 01:32:22,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=341560.0, ans=0.1
2024-09-18 01:32:27,242 INFO [train.py:1198] (1/2) Epoch 19, batch 3950, loss[loss=0.2771, ctc_loss=0.1696, cr_loss=0.4257, attn_decoder_loss=0.2796, over 29486.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1382, cr_loss=0.3814, attn_decoder_loss=0.2514, over 5836576.58 frames. ], batch size: 97, lr: 5.81e-03, grad_scale: 8.0
2024-09-18 01:32:42,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=341640.0, ans=0.95
2024-09-18 01:32:45,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=341640.0, ans=0.125
2024-09-18 01:33:21,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=341720.0, ans=0.0
2024-09-18 01:33:30,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=341760.0, ans=0.125
2024-09-18 01:33:32,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=341760.0, ans=0.0
2024-09-18 01:33:42,796 INFO [train.py:1198] (1/2) Epoch 19, batch 4000, loss[loss=0.2234, ctc_loss=0.1145, cr_loss=0.3299, attn_decoder_loss=0.2282, over 29523.00 frames. ], tot_loss[loss=0.2479, ctc_loss=0.1387, cr_loss=0.3814, attn_decoder_loss=0.2515, over 5814087.24 frames. ], batch size: 74, lr: 5.81e-03, grad_scale: 16.0
2024-09-18 01:33:42,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=341800.0, ans=0.125
2024-09-18 01:33:53,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=341800.0, ans=0.125
2024-09-18 01:34:08,290 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.52 vs. limit=15.0
2024-09-18 01:34:33,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=341920.0, ans=0.125
2024-09-18 01:34:37,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=341920.0, ans=0.1
2024-09-18 01:34:41,832 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.874e+01 9.386e+01 1.032e+02 2.674e+02, threshold=1.877e+02, percent-clipped=1.0
2024-09-18 01:34:57,881 INFO [train.py:1198] (1/2) Epoch 19, batch 4050, loss[loss=0.2753, ctc_loss=0.1862, cr_loss=0.3955, attn_decoder_loss=0.2764, over 19136.00 frames. ], tot_loss[loss=0.2478, ctc_loss=0.139, cr_loss=0.3815, attn_decoder_loss=0.2514, over 5795870.85 frames. ], batch size: 209, lr: 5.81e-03, grad_scale: 8.0
2024-09-18 01:34:59,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=342000.0, ans=0.035
2024-09-18 01:35:07,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=342000.0, ans=0.125
2024-09-18 01:35:37,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=342080.0, ans=0.0
2024-09-18 01:36:02,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=342160.0, ans=0.125
2024-09-18 01:36:11,637 INFO [train.py:1198] (1/2) Epoch 19, batch 4100, loss[loss=0.2695, ctc_loss=0.1603, cr_loss=0.4138, attn_decoder_loss=0.2724, over 29521.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.1393, cr_loss=0.3821, attn_decoder_loss=0.2516, over 5791968.96 frames. ], batch size: 90, lr: 5.81e-03, grad_scale: 8.0
2024-09-18 01:36:35,947 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.23 vs. limit=15.0
2024-09-18 01:36:55,128 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.87 vs. limit=10.0
2024-09-18 01:37:11,583 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.528e+01 8.625e+01 9.215e+01 9.767e+01 2.484e+02, threshold=1.843e+02, percent-clipped=3.0
2024-09-18 01:37:16,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=342360.0, ans=0.0
2024-09-18 01:37:27,168 INFO [train.py:1198] (1/2) Epoch 19, batch 4150, loss[loss=0.2478, ctc_loss=0.1504, cr_loss=0.3944, attn_decoder_loss=0.2499, over 29474.00 frames. ], tot_loss[loss=0.2478, ctc_loss=0.1392, cr_loss=0.3817, attn_decoder_loss=0.2514, over 5797309.39 frames. ], batch size: 77, lr: 5.81e-03, grad_scale: 8.0
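The Whitening entries compare a "whiteness" metric of some activation's channel covariance against a limit; the limit is only rarely exceeded (metric=16.60 vs. limit=15.0 above is one of the few cases), which is when the whitening penalty would engage. One standard metric of this kind, sketched below, equals 1.0 exactly when the covariance is a multiple of the identity and grows as the eigenvalue spectrum becomes uneven; this is an illustrative formula, not necessarily the exact one in scaling.py.

    # Sketch of a covariance "whiteness" metric of the kind the Whitening log
    # lines report (illustrative). For a covariance C with eigenvalues lam_i,
    #     num_channels * sum(lam_i^2) / (sum(lam_i))^2
    # is >= 1 by Cauchy-Schwarz, with equality iff all lam_i are equal,
    # i.e. iff C is a multiple of the identity (fully whitened features).
    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]            # (C, C) channel covariance
        num_channels = cov.shape[0]
        sum_eig_sq = (cov * cov).sum()          # trace(cov @ cov) = sum lam_i^2
        sum_eig = torch.diagonal(cov).sum()     # trace(cov)       = sum lam_i
        return (num_channels * sum_eig_sq / sum_eig**2).item()

    x = torch.randn(1000, 512)
    print(whitening_metric(x))  # near 1 for white features, up to sampling noise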
2024-09-18 01:37:34,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=342400.0, ans=0.125
2024-09-18 01:37:49,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=342440.0, ans=0.1
2024-09-18 01:38:18,089 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.95 vs. limit=15.0
2024-09-18 01:38:21,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=342520.0, ans=0.0
2024-09-18 01:38:32,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=342560.0, ans=0.125
2024-09-18 01:38:40,956 INFO [train.py:1198] (1/2) Epoch 19, batch 4200, loss[loss=0.2681, ctc_loss=0.1609, cr_loss=0.4158, attn_decoder_loss=0.2708, over 29518.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1392, cr_loss=0.3818, attn_decoder_loss=0.2518, over 5799788.25 frames. ], batch size: 90, lr: 5.81e-03, grad_scale: 8.0
2024-09-18 01:38:41,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=342600.0, ans=0.2
2024-09-18 01:39:06,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=342640.0, ans=0.125
2024-09-18 01:39:12,710 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.63 vs. limit=6.0
2024-09-18 01:39:16,778 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.10 vs. limit=15.0
2024-09-18 01:39:35,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=342720.0, ans=0.04949747468305833
2024-09-18 01:39:41,096 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.293e+01 8.502e+01 9.115e+01 9.695e+01 2.005e+02, threshold=1.823e+02, percent-clipped=1.0
2024-09-18 01:39:55,935 INFO [train.py:1198] (1/2) Epoch 19, batch 4250, loss[loss=0.2266, ctc_loss=0.1199, cr_loss=0.3393, attn_decoder_loss=0.2309, over 29513.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.1392, cr_loss=0.3815, attn_decoder_loss=0.2519, over 5805227.55 frames. ], batch size: 74, lr: 5.80e-03, grad_scale: 8.0
2024-09-18 01:39:59,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=342800.0, ans=0.0
2024-09-18 01:39:59,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=342800.0, ans=0.0
2024-09-18 01:40:08,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=342800.0, ans=0.1
2024-09-18 01:40:13,051 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0
2024-09-18 01:40:41,486 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.87 vs. limit=10.0
2024-09-18 01:40:42,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=342920.0, ans=0.125
2024-09-18 01:40:45,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=342920.0, ans=0.125
2024-09-18 01:41:02,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=342960.0, ans=0.125
2024-09-18 01:41:09,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=343000.0, ans=0.125
2024-09-18 01:41:11,123 INFO [train.py:1198] (1/2) Epoch 19, batch 4300, loss[loss=0.26, ctc_loss=0.1423, cr_loss=0.385, attn_decoder_loss=0.2645, over 29515.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.1388, cr_loss=0.3808, attn_decoder_loss=0.252, over 5794162.45 frames. ], batch size: 87, lr: 5.80e-03, grad_scale: 8.0
2024-09-18 01:41:14,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=343000.0, ans=0.2
2024-09-18 01:41:28,471 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=6.98 vs. limit=15.0
2024-09-18 01:41:32,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=343040.0, ans=0.125
2024-09-18 01:41:44,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=343080.0, ans=0.0
2024-09-18 01:41:45,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=343080.0, ans=0.2
2024-09-18 01:41:47,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=343080.0, ans=0.0
2024-09-18 01:41:59,087 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-18 01:42:02,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=343120.0, ans=0.125
2024-09-18 01:42:10,633 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.597e+01 8.890e+01 9.360e+01 1.027e+02 1.828e+02, threshold=1.872e+02, percent-clipped=1.0
2024-09-18 01:42:11,536 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.61 vs. limit=10.0
2024-09-18 01:42:15,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=343160.0, ans=0.125
2024-09-18 01:42:20,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=343160.0, ans=0.1
2024-09-18 01:42:27,033 INFO [train.py:1198] (1/2) Epoch 19, batch 4350, loss[loss=0.2622, ctc_loss=0.1494, cr_loss=0.3903, attn_decoder_loss=0.2661, over 29502.00 frames. ], tot_loss[loss=0.2518, ctc_loss=0.1417, cr_loss=0.3866, attn_decoder_loss=0.2555, over 5796318.74 frames. ], batch size: 97, lr: 5.80e-03, grad_scale: 8.0
2024-09-18 01:42:39,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=343200.0, ans=0.0
2024-09-18 01:43:09,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=343320.0, ans=0.025
2024-09-18 01:43:17,869 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.18 vs. limit=15.0
2024-09-18 01:43:26,786 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.80 vs. limit=15.0
2024-09-18 01:43:30,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=343360.0, ans=0.125
2024-09-18 01:43:41,028 INFO [train.py:1198] (1/2) Epoch 19, batch 4400, loss[loss=0.275, ctc_loss=0.1632, cr_loss=0.4437, attn_decoder_loss=0.2776, over 27302.00 frames. ], tot_loss[loss=0.2539, ctc_loss=0.1433, cr_loss=0.3895, attn_decoder_loss=0.2576, over 5765858.35 frames. ], batch size: 125, lr: 5.80e-03, grad_scale: 16.0
2024-09-18 01:43:54,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=343440.0, ans=0.125
2024-09-18 01:44:20,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=343480.0, ans=0.125
2024-09-18 01:44:21,200 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=6.71 vs. limit=12.0
2024-09-18 01:44:26,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=343520.0, ans=0.025
2024-09-18 01:44:40,313 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-18 01:44:41,226 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.011e+01 9.170e+01 9.647e+01 1.019e+02 1.899e+02, threshold=1.929e+02, percent-clipped=1.0
2024-09-18 01:44:52,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=343560.0, ans=0.125
2024-09-18 01:44:54,692 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.71 vs. limit=15.0
2024-09-18 01:44:55,327 INFO [train.py:1198] (1/2) Epoch 19, batch 4450, loss[loss=0.2793, ctc_loss=0.1866, cr_loss=0.4167, attn_decoder_loss=0.2803, over 20113.00 frames. ], tot_loss[loss=0.2566, ctc_loss=0.1476, cr_loss=0.3944, attn_decoder_loss=0.26, over 5573102.29 frames. ], batch size: 210, lr: 5.80e-03, grad_scale: 8.0
2024-09-18 01:44:55,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=343600.0, ans=0.2
2024-09-18 01:44:59,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.40 vs. limit=15.0
2024-09-18 01:45:47,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=343720.0, ans=0.0
2024-09-18 01:45:49,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=343720.0, ans=0.125
2024-09-18 01:46:07,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=343760.0, ans=0.0
2024-09-18 01:46:11,293 INFO [train.py:1198] (1/2) Epoch 19, batch 4500, loss[loss=0.2662, ctc_loss=0.1598, cr_loss=0.4003, attn_decoder_loss=0.2691, over 21170.00 frames. ], tot_loss[loss=0.2595, ctc_loss=0.1527, cr_loss=0.3973, attn_decoder_loss=0.2625, over 5234558.80 frames. ], batch size: 209, lr: 5.80e-03, grad_scale: 8.0
2024-09-18 01:46:13,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=343800.0, ans=0.125
2024-09-18 01:46:36,398 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0
2024-09-18 01:46:41,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=343880.0, ans=0.0
2024-09-18 01:47:35,799 INFO [train.py:1198] (1/2) Epoch 20, batch 0, loss[loss=0.2281, ctc_loss=0.123, cr_loss=0.3448, attn_decoder_loss=0.2321, over 29589.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.123, cr_loss=0.3448, attn_decoder_loss=0.2321, over 29589.00 frames. ], batch size: 73, lr: 5.65e-03, grad_scale: 16.0
2024-09-18 01:47:35,800 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-18 01:47:41,226 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.7596, 3.6028, 3.3981, 3.7046], device='cuda:1')
2024-09-18 01:47:53,061 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.8772, 4.5716, 4.4670, 4.3070], device='cuda:1')
2024-09-18 01:47:54,256 INFO [train.py:1230] (1/2) Epoch 20, validation: loss=0.2118, ctc_loss=0.0395, cr_loss=4.878e-15, attn_decoder_loss=0.2309, over 944034.00 frames.
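Two things are worth noting at this epoch boundary. First, the validation cr_loss (4.878e-15 here, 5.039e-15 at the Epoch 19 validation earlier) is numerically zero; presumably the consistency-regularization term is only meaningful for the augmented training copies of each batch and degenerates in evaluation mode, so only ctc_loss and attn_decoder_loss carry information at validation time. Second, since the loss curves exist only inside these train.py lines, a small parser can recover them for offline plotting; the regex below is written against the exact field layout of the tot_loss records in this log (validation lines are intentionally not matched) and is otherwise my own construction.

    # Sketch of a parser for the train.py tot_loss lines above, for plotting
    # loss curves offline. The regex targets the exact field layout seen in
    # this log; adjust it if the logging format changes.
    import re

    LINE_RE = re.compile(
        r"Epoch (?P<epoch>\d+), batch (?P<batch>\d+), .*?"
        r"tot_loss\[loss=(?P<loss>[\d.]+), ctc_loss=(?P<ctc>[\d.]+), "
        r"cr_loss=(?P<cr>[\d.e+-]+), attn_decoder_loss=(?P<attn>[\d.]+)"
    )

    def parse_tot_loss(path: str):
        points = []
        with open(path) as f:
            for line in f:
                m = LINE_RE.search(line)
                if m:
                    points.append((int(m.group("epoch")), int(m.group("batch")),
                                   float(m.group("loss")), float(m.group("ctc")),
                                   float(m.group("cr")), float(m.group("attn"))))
        return points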
2024-09-18 01:47:54,256 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB
2024-09-18 01:48:08,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=343900.0, ans=0.125
2024-09-18 01:48:23,223 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.011e+01 1.094e+02 1.165e+02 1.257e+02 3.397e+02, threshold=2.331e+02, percent-clipped=2.0
2024-09-18 01:48:31,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=343980.0, ans=0.125
2024-09-18 01:48:36,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=343980.0, ans=0.09899494936611666
2024-09-18 01:48:40,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=344020.0, ans=0.2
2024-09-18 01:48:55,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=344060.0, ans=0.0
2024-09-18 01:48:59,085 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-18 01:49:12,438 INFO [train.py:1198] (1/2) Epoch 20, batch 50, loss[loss=0.2262, ctc_loss=0.1192, cr_loss=0.3518, attn_decoder_loss=0.2302, over 29457.00 frames. ], tot_loss[loss=0.2484, ctc_loss=0.1399, cr_loss=0.3814, attn_decoder_loss=0.252, over 1268105.74 frames. ], batch size: 70, lr: 5.64e-03, grad_scale: 4.0
2024-09-18 01:49:37,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=344140.0, ans=0.0
2024-09-18 01:49:47,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=344180.0, ans=0.1
2024-09-18 01:49:57,128 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.33 vs. limit=15.0
2024-09-18 01:50:12,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=344260.0, ans=0.5
2024-09-18 01:50:17,013 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.77 vs. limit=15.0
2024-09-18 01:50:22,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=344260.0, ans=0.2
2024-09-18 01:50:23,275 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.60 vs. limit=15.0
2024-09-18 01:50:28,288 INFO [train.py:1198] (1/2) Epoch 20, batch 100, loss[loss=0.2408, ctc_loss=0.1294, cr_loss=0.343, attn_decoder_loss=0.2456, over 29533.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1405, cr_loss=0.3839, attn_decoder_loss=0.2535, over 2252628.55 frames. ], batch size: 76, lr: 5.64e-03, grad_scale: 8.0
2024-09-18 01:50:28,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=344300.0, ans=0.125
2024-09-18 01:50:46,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=344340.0, ans=0.125
2024-09-18 01:50:48,528 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.70 vs. limit=10.0
2024-09-18 01:50:51,843 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.74 vs. limit=15.0
2024-09-18 01:50:55,263 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.781e+01 9.298e+01 1.012e+02 1.493e+02, threshold=1.860e+02, percent-clipped=0.0
2024-09-18 01:51:29,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=344460.0, ans=0.2
2024-09-18 01:51:39,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=344460.0, ans=0.125
2024-09-18 01:51:45,540 INFO [train.py:1198] (1/2) Epoch 20, batch 150, loss[loss=0.2269, ctc_loss=0.1208, cr_loss=0.3633, attn_decoder_loss=0.2306, over 29459.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1381, cr_loss=0.3809, attn_decoder_loss=0.2514, over 3047922.64 frames. ], batch size: 70, lr: 5.64e-03, grad_scale: 8.0
2024-09-18 01:51:47,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=344500.0, ans=0.2
2024-09-18 01:51:52,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=344500.0, ans=0.0
2024-09-18 01:52:04,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=344540.0, ans=0.125
2024-09-18 01:52:15,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=344540.0, ans=0.2
2024-09-18 01:52:33,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=344620.0, ans=0.05
2024-09-18 01:52:36,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=344620.0, ans=0.125
2024-09-18 01:52:43,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=344620.0, ans=0.025
2024-09-18 01:53:03,322 INFO [train.py:1198] (1/2) Epoch 20, batch 200, loss[loss=0.2731, ctc_loss=0.1667, cr_loss=0.45, attn_decoder_loss=0.275, over 27516.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1378, cr_loss=0.381, attn_decoder_loss=0.2508, over 3659966.63 frames. ], batch size: 124, lr: 5.64e-03, grad_scale: 8.0
2024-09-18 01:53:18,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=344740.0, ans=0.0
2024-09-18 01:53:30,640 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.380e+01 8.894e+01 9.610e+01 1.111e+02, threshold=1.779e+02, percent-clipped=0.0
2024-09-18 01:53:40,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=344780.0, ans=0.125
2024-09-18 01:53:55,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=344820.0, ans=0.09899494936611666
2024-09-18 01:54:01,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=344820.0, ans=0.0
2024-09-18 01:54:18,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=344900.0, ans=0.125
2024-09-18 01:54:19,421 INFO [train.py:1198] (1/2) Epoch 20, batch 250, loss[loss=0.2556, ctc_loss=0.1426, cr_loss=0.3784, attn_decoder_loss=0.2597, over 29300.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1375, cr_loss=0.3811, attn_decoder_loss=0.2508, over 4142432.13 frames. ], batch size: 100, lr: 5.64e-03, grad_scale: 8.0
2024-09-18 01:54:24,869 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.74 vs. limit=22.5
2024-09-18 01:54:54,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=344980.0, ans=0.125
2024-09-18 01:54:59,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=344980.0, ans=0.125
2024-09-18 01:54:59,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=344980.0, ans=0.125
2024-09-18 01:54:59,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=344980.0, ans=10.0
2024-09-18 01:55:12,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=345020.0, ans=0.05
2024-09-18 01:55:37,647 INFO [train.py:1198] (1/2) Epoch 20, batch 300, loss[loss=0.26, ctc_loss=0.1525, cr_loss=0.4205, attn_decoder_loss=0.2626, over 29553.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1374, cr_loss=0.3811, attn_decoder_loss=0.2507, over 4510618.38 frames. ], batch size: 92, lr: 5.64e-03, grad_scale: 8.0
2024-09-18 01:55:51,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=345100.0, ans=0.125
2024-09-18 01:55:51,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=345100.0, ans=0.125
2024-09-18 01:55:52,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=345100.0, ans=0.1
2024-09-18 01:56:02,210 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.09 vs. limit=10.0
2024-09-18 01:56:07,385 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.480e+01 8.946e+01 9.469e+01 2.628e+02, threshold=1.789e+02, percent-clipped=1.0
2024-09-18 01:56:08,269 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.16 vs. limit=15.0
2024-09-18 01:56:20,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1.whitening_limit, batch_count=345180.0, ans=10.0
2024-09-18 01:56:56,027 INFO [train.py:1198] (1/2) Epoch 20, batch 350, loss[loss=0.2222, ctc_loss=0.1142, cr_loss=0.3328, attn_decoder_loss=0.2268, over 29351.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1383, cr_loss=0.382, attn_decoder_loss=0.2518, over 4796018.64 frames. ], batch size: 71, lr: 5.63e-03, grad_scale: 8.0
2024-09-18 01:56:59,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=345300.0, ans=0.125
2024-09-18 01:57:06,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=345300.0, ans=0.025
2024-09-18 01:57:29,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=345380.0, ans=0.0
2024-09-18 01:57:57,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=345460.0, ans=0.125
2024-09-18 01:58:05,904 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.77 vs. limit=15.0
2024-09-18 01:58:11,280 INFO [train.py:1198] (1/2) Epoch 20, batch 400, loss[loss=0.2555, ctc_loss=0.1436, cr_loss=0.3783, attn_decoder_loss=0.2596, over 29724.00 frames. ], tot_loss[loss=0.2476, ctc_loss=0.1376, cr_loss=0.3814, attn_decoder_loss=0.2513, over 5024753.16 frames. ], batch size: 82, lr: 5.63e-03, grad_scale: 16.0
2024-09-18 01:58:14,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=345500.0, ans=0.125
2024-09-18 01:58:31,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=345540.0, ans=0.125
2024-09-18 01:58:40,351 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 8.703e+01 9.237e+01 1.010e+02 2.283e+02, threshold=1.847e+02, percent-clipped=1.0
2024-09-18 01:58:55,310 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.38 vs. limit=15.0
2024-09-18 01:59:19,241 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.82 vs. limit=15.0
2024-09-18 01:59:29,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=345700.0, ans=0.1
2024-09-18 01:59:30,505 INFO [train.py:1198] (1/2) Epoch 20, batch 450, loss[loss=0.2447, ctc_loss=0.1298, cr_loss=0.3492, attn_decoder_loss=0.2497, over 29688.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1374, cr_loss=0.3803, attn_decoder_loss=0.2511, over 5187738.53 frames. ], batch size: 83, lr: 5.63e-03, grad_scale: 8.0
2024-09-18 01:59:35,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=345700.0, ans=0.0
2024-09-18 02:00:07,114 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.02 vs. limit=15.0
2024-09-18 02:00:11,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=345780.0, ans=0.025
2024-09-18 02:00:11,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=345780.0, ans=0.1
2024-09-18 02:00:11,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.78 vs. limit=10.0
2024-09-18 02:00:29,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=345820.0, ans=0.125
2024-09-18 02:00:35,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=345860.0, ans=0.0
2024-09-18 02:00:40,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=345860.0, ans=0.125
2024-09-18 02:00:40,919 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.10 vs. limit=15.0
2024-09-18 02:00:48,890 INFO [train.py:1198] (1/2) Epoch 20, batch 500, loss[loss=0.2567, ctc_loss=0.143, cr_loss=0.3818, attn_decoder_loss=0.2609, over 29440.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1365, cr_loss=0.3789, attn_decoder_loss=0.2498, over 5331501.60 frames. ], batch size: 94, lr: 5.63e-03, grad_scale: 8.0
2024-09-18 02:01:01,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=345900.0, ans=0.0
2024-09-18 02:01:13,736 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 02:01:17,859 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.541e+01 8.440e+01 8.932e+01 9.633e+01 1.955e+02, threshold=1.786e+02, percent-clipped=1.0
2024-09-18 02:01:21,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=345980.0, ans=0.0
2024-09-18 02:01:39,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=346020.0, ans=0.125
2024-09-18 02:01:44,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=346020.0, ans=0.0
2024-09-18 02:01:52,294 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.76 vs. limit=15.0
2024-09-18 02:02:05,126 INFO [train.py:1198] (1/2) Epoch 20, batch 550, loss[loss=0.2736, ctc_loss=0.1667, cr_loss=0.4235, attn_decoder_loss=0.276, over 28817.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1365, cr_loss=0.3784, attn_decoder_loss=0.2499, over 5424702.06 frames.
], batch size: 104, lr: 5.63e-03, grad_scale: 8.0 2024-09-18 02:02:23,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=346140.0, ans=0.125 2024-09-18 02:02:29,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=346140.0, ans=0.125 2024-09-18 02:02:35,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=346180.0, ans=10.0 2024-09-18 02:02:41,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=346180.0, ans=0.0 2024-09-18 02:02:57,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=346220.0, ans=0.125 2024-09-18 02:03:23,209 INFO [train.py:1198] (1/2) Epoch 20, batch 600, loss[loss=0.2564, ctc_loss=0.1392, cr_loss=0.3874, attn_decoder_loss=0.2609, over 29236.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.1363, cr_loss=0.378, attn_decoder_loss=0.2497, over 5512239.07 frames. ], batch size: 100, lr: 5.63e-03, grad_scale: 8.0 2024-09-18 02:03:23,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=346300.0, ans=0.125 2024-09-18 02:03:54,166 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.587e+01 8.611e+01 9.331e+01 1.005e+02 2.865e+02, threshold=1.866e+02, percent-clipped=3.0 2024-09-18 02:04:10,235 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:04:19,512 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.19 vs. limit=12.0 2024-09-18 02:04:34,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=346460.0, ans=0.125 2024-09-18 02:04:37,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=346460.0, ans=0.0 2024-09-18 02:04:41,266 INFO [train.py:1198] (1/2) Epoch 20, batch 650, loss[loss=0.2418, ctc_loss=0.1247, cr_loss=0.3625, attn_decoder_loss=0.2468, over 29775.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.1359, cr_loss=0.3779, attn_decoder_loss=0.2492, over 5589416.02 frames. 
], batch size: 81, lr: 5.63e-03, grad_scale: 8.0 2024-09-18 02:04:52,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=346500.0, ans=0.125 2024-09-18 02:04:56,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=346540.0, ans=0.0 2024-09-18 02:04:56,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=346540.0, ans=0.125 2024-09-18 02:05:10,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=346580.0, ans=15.0 2024-09-18 02:05:13,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=346580.0, ans=0.125 2024-09-18 02:05:13,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=346580.0, ans=0.125 2024-09-18 02:05:26,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=346620.0, ans=0.125 2024-09-18 02:05:54,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=346660.0, ans=0.02 2024-09-18 02:05:56,825 INFO [train.py:1198] (1/2) Epoch 20, batch 700, loss[loss=0.2311, ctc_loss=0.121, cr_loss=0.3516, attn_decoder_loss=0.2355, over 29511.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1363, cr_loss=0.3787, attn_decoder_loss=0.2499, over 5639580.97 frames. ], batch size: 76, lr: 5.62e-03, grad_scale: 8.0 2024-09-18 02:05:58,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=346700.0, ans=0.125 2024-09-18 02:06:25,602 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 8.542e+01 8.952e+01 9.567e+01 1.859e+02, threshold=1.790e+02, percent-clipped=0.0 2024-09-18 02:06:27,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=346780.0, ans=0.0 2024-09-18 02:06:54,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=346820.0, ans=0.1 2024-09-18 02:06:59,551 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.09 vs. limit=15.0 2024-09-18 02:07:14,603 INFO [train.py:1198] (1/2) Epoch 20, batch 750, loss[loss=0.2418, ctc_loss=0.1269, cr_loss=0.3652, attn_decoder_loss=0.2464, over 29716.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.136, cr_loss=0.3782, attn_decoder_loss=0.2497, over 5679340.01 frames. ], batch size: 82, lr: 5.62e-03, grad_scale: 8.0 2024-09-18 02:08:14,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=347020.0, ans=0.0 2024-09-18 02:08:29,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=347060.0, ans=0.2 2024-09-18 02:08:32,360 INFO [train.py:1198] (1/2) Epoch 20, batch 800, loss[loss=0.2325, ctc_loss=0.1291, cr_loss=0.3737, attn_decoder_loss=0.2356, over 29575.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1368, cr_loss=0.3793, attn_decoder_loss=0.2499, over 5710034.17 frames. 
], batch size: 73, lr: 5.62e-03, grad_scale: 16.0 2024-09-18 02:08:34,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=347100.0, ans=0.0 2024-09-18 02:09:01,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=347180.0, ans=0.125 2024-09-18 02:09:01,968 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.43 vs. limit=15.0 2024-09-18 02:09:02,545 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.298e+01 8.412e+01 8.904e+01 9.473e+01 1.507e+02, threshold=1.781e+02, percent-clipped=0.0 2024-09-18 02:09:08,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=347180.0, ans=0.0 2024-09-18 02:09:33,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=347260.0, ans=0.0 2024-09-18 02:09:46,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=347300.0, ans=0.025 2024-09-18 02:09:48,230 INFO [train.py:1198] (1/2) Epoch 20, batch 850, loss[loss=0.2523, ctc_loss=0.1388, cr_loss=0.3837, attn_decoder_loss=0.2564, over 29701.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.1363, cr_loss=0.3784, attn_decoder_loss=0.2497, over 5739507.61 frames. ], batch size: 89, lr: 5.62e-03, grad_scale: 8.0 2024-09-18 02:10:13,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=347340.0, ans=0.95 2024-09-18 02:10:19,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=347380.0, ans=0.125 2024-09-18 02:10:42,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=347420.0, ans=0.2 2024-09-18 02:11:03,657 INFO [train.py:1198] (1/2) Epoch 20, batch 900, loss[loss=0.2319, ctc_loss=0.1214, cr_loss=0.3683, attn_decoder_loss=0.2359, over 29613.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1364, cr_loss=0.3784, attn_decoder_loss=0.2498, over 5744653.01 frames. ], batch size: 73, lr: 5.62e-03, grad_scale: 8.0 2024-09-18 02:11:12,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=347500.0, ans=0.1 2024-09-18 02:11:16,037 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.21 vs. 
limit=15.0 2024-09-18 02:11:22,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=347540.0, ans=0.2 2024-09-18 02:11:38,370 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.746e+01 8.646e+01 9.308e+01 1.001e+02 2.040e+02, threshold=1.862e+02, percent-clipped=1.0 2024-09-18 02:11:44,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=347580.0, ans=0.0 2024-09-18 02:11:47,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=347580.0, ans=0.125 2024-09-18 02:11:49,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=347580.0, ans=0.125 2024-09-18 02:12:08,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=347660.0, ans=0.025 2024-09-18 02:12:12,356 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.33 vs. limit=15.0 2024-09-18 02:12:14,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=347660.0, ans=0.1 2024-09-18 02:12:15,059 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:12:23,495 INFO [train.py:1198] (1/2) Epoch 20, batch 950, loss[loss=0.2209, ctc_loss=0.1102, cr_loss=0.3145, attn_decoder_loss=0.2263, over 29512.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1363, cr_loss=0.3782, attn_decoder_loss=0.2498, over 5745727.72 frames. ], batch size: 74, lr: 5.62e-03, grad_scale: 8.0 2024-09-18 02:13:07,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=347820.0, ans=0.025 2024-09-18 02:13:13,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=347820.0, ans=0.1 2024-09-18 02:13:15,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=347820.0, ans=0.2 2024-09-18 02:13:18,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=347820.0, ans=0.1 2024-09-18 02:13:25,795 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:13:25,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=347860.0, ans=0.1 2024-09-18 02:13:33,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=347860.0, ans=0.125 2024-09-18 02:13:39,003 INFO [train.py:1198] (1/2) Epoch 20, batch 1000, loss[loss=0.2414, ctc_loss=0.1347, cr_loss=0.3934, attn_decoder_loss=0.2445, over 29522.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1375, cr_loss=0.3801, attn_decoder_loss=0.2507, over 5737924.71 frames. 
], batch size: 77, lr: 5.61e-03, grad_scale: 8.0 2024-09-18 02:13:51,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=347900.0, ans=0.0 2024-09-18 02:14:09,208 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.603e+01 8.704e+01 9.397e+01 1.040e+02 1.771e+02, threshold=1.879e+02, percent-clipped=0.0 2024-09-18 02:14:50,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=348060.0, ans=0.125 2024-09-18 02:14:53,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=348100.0, ans=0.2 2024-09-18 02:14:54,852 INFO [train.py:1198] (1/2) Epoch 20, batch 1050, loss[loss=0.2648, ctc_loss=0.152, cr_loss=0.4204, attn_decoder_loss=0.268, over 29673.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1368, cr_loss=0.3792, attn_decoder_loss=0.2501, over 5746085.39 frames. ], batch size: 85, lr: 5.61e-03, grad_scale: 8.0 2024-09-18 02:15:03,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=348100.0, ans=0.2 2024-09-18 02:15:09,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=348100.0, ans=0.025 2024-09-18 02:15:30,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=348180.0, ans=0.5 2024-09-18 02:15:34,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=348180.0, ans=0.125 2024-09-18 02:15:42,897 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.99 vs. limit=12.0 2024-09-18 02:15:49,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=348220.0, ans=0.2 2024-09-18 02:16:04,921 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:16:08,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=348260.0, ans=0.125 2024-09-18 02:16:15,293 INFO [train.py:1198] (1/2) Epoch 20, batch 1100, loss[loss=0.2458, ctc_loss=0.1418, cr_loss=0.3828, attn_decoder_loss=0.2488, over 29466.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1368, cr_loss=0.3786, attn_decoder_loss=0.25, over 5758569.23 frames. ], batch size: 78, lr: 5.61e-03, grad_scale: 8.0 2024-09-18 02:16:17,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=348300.0, ans=0.125 2024-09-18 02:16:22,348 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.51 vs. limit=15.0 2024-09-18 02:16:25,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.21 vs. 
limit=15.0 2024-09-18 02:16:45,740 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.537e+01 9.169e+01 9.929e+01 2.148e+02, threshold=1.834e+02, percent-clipped=1.0 2024-09-18 02:16:58,949 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0 2024-09-18 02:17:11,084 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.04 vs. limit=15.0 2024-09-18 02:17:11,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=348420.0, ans=0.125 2024-09-18 02:17:21,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=348460.0, ans=0.1 2024-09-18 02:17:21,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=348460.0, ans=0.125 2024-09-18 02:17:30,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=348500.0, ans=0.09899494936611666 2024-09-18 02:17:31,305 INFO [train.py:1198] (1/2) Epoch 20, batch 1150, loss[loss=0.2426, ctc_loss=0.1287, cr_loss=0.3816, attn_decoder_loss=0.2468, over 29452.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1366, cr_loss=0.3787, attn_decoder_loss=0.25, over 5757378.89 frames. ], batch size: 78, lr: 5.61e-03, grad_scale: 8.0 2024-09-18 02:17:33,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=348500.0, ans=0.125 2024-09-18 02:17:34,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=348500.0, ans=0.125 2024-09-18 02:17:54,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=348540.0, ans=0.125 2024-09-18 02:18:00,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=348580.0, ans=0.1 2024-09-18 02:18:20,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=348620.0, ans=0.1 2024-09-18 02:18:34,576 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=15.0 2024-09-18 02:18:45,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=348700.0, ans=0.125 2024-09-18 02:18:46,987 INFO [train.py:1198] (1/2) Epoch 20, batch 1200, loss[loss=0.262, ctc_loss=0.1525, cr_loss=0.4103, attn_decoder_loss=0.265, over 29660.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1373, cr_loss=0.3798, attn_decoder_loss=0.251, over 5749377.30 frames. 
], batch size: 85, lr: 5.61e-03, grad_scale: 16.0 2024-09-18 02:18:57,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=348700.0, ans=0.125 2024-09-18 02:19:06,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=348740.0, ans=0.125 2024-09-18 02:19:14,878 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0 2024-09-18 02:19:22,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-09-18 02:19:23,033 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.725e+01 9.303e+01 1.008e+02 1.601e+02, threshold=1.861e+02, percent-clipped=0.0 2024-09-18 02:19:37,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=348820.0, ans=0.1 2024-09-18 02:19:51,887 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2024-09-18 02:19:53,800 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=15.0 2024-09-18 02:19:55,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=348860.0, ans=0.0 2024-09-18 02:20:07,686 INFO [train.py:1198] (1/2) Epoch 20, batch 1250, loss[loss=0.2735, ctc_loss=0.1585, cr_loss=0.4385, attn_decoder_loss=0.2766, over 29522.00 frames. ], tot_loss[loss=0.2479, ctc_loss=0.1379, cr_loss=0.3822, attn_decoder_loss=0.2516, over 5776082.44 frames. ], batch size: 92, lr: 5.61e-03, grad_scale: 8.0 2024-09-18 02:20:26,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=348940.0, ans=0.0 2024-09-18 02:21:01,334 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.41 vs. limit=10.0 2024-09-18 02:21:03,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=349020.0, ans=0.0 2024-09-18 02:21:05,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=349020.0, ans=0.025 2024-09-18 02:21:23,263 INFO [train.py:1198] (1/2) Epoch 20, batch 1300, loss[loss=0.2546, ctc_loss=0.1377, cr_loss=0.3577, attn_decoder_loss=0.2596, over 28632.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1373, cr_loss=0.3806, attn_decoder_loss=0.2508, over 5781676.92 frames. ], batch size: 112, lr: 5.60e-03, grad_scale: 8.0 2024-09-18 02:21:28,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=349100.0, ans=10.0 2024-09-18 02:21:30,449 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.83 vs. 
limit=15.0 2024-09-18 02:21:55,239 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 8.542e+01 9.047e+01 9.656e+01 1.934e+02, threshold=1.809e+02, percent-clipped=1.0 2024-09-18 02:22:25,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=349260.0, ans=0.1 2024-09-18 02:22:28,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=349260.0, ans=0.125 2024-09-18 02:22:38,978 INFO [train.py:1198] (1/2) Epoch 20, batch 1350, loss[loss=0.2482, ctc_loss=0.1308, cr_loss=0.3554, attn_decoder_loss=0.2534, over 29750.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.1364, cr_loss=0.3794, attn_decoder_loss=0.2503, over 5798812.38 frames. ], batch size: 81, lr: 5.60e-03, grad_scale: 8.0 2024-09-18 02:22:39,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=349300.0, ans=0.125 2024-09-18 02:23:56,658 INFO [train.py:1198] (1/2) Epoch 20, batch 1400, loss[loss=0.2244, ctc_loss=0.1229, cr_loss=0.3357, attn_decoder_loss=0.2282, over 29562.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.1365, cr_loss=0.3792, attn_decoder_loss=0.2503, over 5809771.02 frames. ], batch size: 69, lr: 5.60e-03, grad_scale: 8.0 2024-09-18 02:23:58,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=349500.0, ans=0.125 2024-09-18 02:23:59,029 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.73 vs. limit=15.0 2024-09-18 02:24:09,705 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.92 vs. limit=12.0 2024-09-18 02:24:28,082 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.400e+01 8.906e+01 9.445e+01 1.188e+02, threshold=1.781e+02, percent-clipped=0.0 2024-09-18 02:24:46,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=349620.0, ans=0.0 2024-09-18 02:25:12,146 INFO [train.py:1198] (1/2) Epoch 20, batch 1450, loss[loss=0.2645, ctc_loss=0.1479, cr_loss=0.4225, attn_decoder_loss=0.268, over 29444.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1368, cr_loss=0.3799, attn_decoder_loss=0.2509, over 5806426.87 frames. ], batch size: 94, lr: 5.60e-03, grad_scale: 8.0 2024-09-18 02:25:15,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=349700.0, ans=0.0 2024-09-18 02:25:33,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=349740.0, ans=0.125 2024-09-18 02:25:38,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=349740.0, ans=0.0 2024-09-18 02:25:56,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=349820.0, ans=0.125 2024-09-18 02:26:27,567 INFO [train.py:1198] (1/2) Epoch 20, batch 1500, loss[loss=0.2521, ctc_loss=0.1359, cr_loss=0.3615, attn_decoder_loss=0.257, over 29646.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1368, cr_loss=0.3795, attn_decoder_loss=0.251, over 5807438.14 frames. 
], batch size: 86, lr: 5.60e-03, grad_scale: 8.0 2024-09-18 02:27:04,146 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.298e+01 8.814e+01 9.450e+01 1.000e+02 1.461e+02, threshold=1.890e+02, percent-clipped=0.0 2024-09-18 02:27:10,011 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.98 vs. limit=10.0 2024-09-18 02:27:16,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=350020.0, ans=0.0 2024-09-18 02:27:17,621 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.60 vs. limit=10.0 2024-09-18 02:27:17,788 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.66 vs. limit=12.0 2024-09-18 02:27:30,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=350020.0, ans=0.125 2024-09-18 02:27:33,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=350060.0, ans=0.0 2024-09-18 02:27:48,317 INFO [train.py:1198] (1/2) Epoch 20, batch 1550, loss[loss=0.2621, ctc_loss=0.1522, cr_loss=0.4233, attn_decoder_loss=0.2649, over 29520.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1374, cr_loss=0.3797, attn_decoder_loss=0.2511, over 5783641.47 frames. ], batch size: 90, lr: 5.60e-03, grad_scale: 8.0 2024-09-18 02:27:59,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=350100.0, ans=0.1 2024-09-18 02:28:09,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=350140.0, ans=0.0 2024-09-18 02:28:12,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=350140.0, ans=0.125 2024-09-18 02:29:03,919 INFO [train.py:1198] (1/2) Epoch 20, batch 1600, loss[loss=0.2598, ctc_loss=0.1405, cr_loss=0.3956, attn_decoder_loss=0.2642, over 29671.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1376, cr_loss=0.3802, attn_decoder_loss=0.2511, over 5765256.06 frames. ], batch size: 85, lr: 5.59e-03, grad_scale: 16.0 2024-09-18 02:29:37,442 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.469e+01 8.748e+01 9.299e+01 1.007e+02 2.517e+02, threshold=1.860e+02, percent-clipped=3.0 2024-09-18 02:29:39,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=350380.0, ans=0.125 2024-09-18 02:29:57,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=350420.0, ans=0.125 2024-09-18 02:30:19,929 INFO [train.py:1198] (1/2) Epoch 20, batch 1650, loss[loss=0.2581, ctc_loss=0.1442, cr_loss=0.3889, attn_decoder_loss=0.2621, over 29705.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1373, cr_loss=0.3798, attn_decoder_loss=0.2508, over 5759594.68 frames. ], batch size: 89, lr: 5.59e-03, grad_scale: 8.0 2024-09-18 02:30:25,157 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.92 vs. 
limit=15.0 2024-09-18 02:30:33,164 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:30:40,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=350540.0, ans=0.2 2024-09-18 02:30:57,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=350580.0, ans=0.1 2024-09-18 02:31:26,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=350660.0, ans=0.0 2024-09-18 02:31:32,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=350660.0, ans=0.125 2024-09-18 02:31:32,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=350660.0, ans=0.2 2024-09-18 02:31:39,886 INFO [train.py:1198] (1/2) Epoch 20, batch 1700, loss[loss=0.2117, ctc_loss=0.1106, cr_loss=0.3384, attn_decoder_loss=0.2154, over 29578.00 frames. ], tot_loss[loss=0.2468, ctc_loss=0.137, cr_loss=0.3791, attn_decoder_loss=0.2506, over 5781901.64 frames. ], batch size: 69, lr: 5.59e-03, grad_scale: 8.0 2024-09-18 02:31:40,162 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:31:44,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=350700.0, ans=0.125 2024-09-18 02:31:56,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=350740.0, ans=0.125 2024-09-18 02:32:04,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=350740.0, ans=0.2 2024-09-18 02:32:05,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=350740.0, ans=0.125 2024-09-18 02:32:07,961 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.50 vs. limit=15.0 2024-09-18 02:32:13,140 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 8.537e+01 9.114e+01 9.746e+01 1.208e+02, threshold=1.823e+02, percent-clipped=1.0 2024-09-18 02:32:14,102 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.10 vs. limit=15.0 2024-09-18 02:32:34,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=350820.0, ans=0.2 2024-09-18 02:32:50,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=350860.0, ans=0.0 2024-09-18 02:32:52,324 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.33 vs. limit=15.0 2024-09-18 02:32:55,906 INFO [train.py:1198] (1/2) Epoch 20, batch 1750, loss[loss=0.2239, ctc_loss=0.1345, cr_loss=0.3655, attn_decoder_loss=0.2258, over 29343.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.137, cr_loss=0.3793, attn_decoder_loss=0.2504, over 5788897.36 frames. 
], batch size: 67, lr: 5.59e-03, grad_scale: 8.0 2024-09-18 02:32:59,818 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.69 vs. limit=12.0 2024-09-18 02:33:15,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=350940.0, ans=0.1 2024-09-18 02:33:21,262 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.86 vs. limit=15.0 2024-09-18 02:33:32,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=350980.0, ans=0.125 2024-09-18 02:33:36,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=350980.0, ans=0.025 2024-09-18 02:33:44,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=351020.0, ans=0.0 2024-09-18 02:33:44,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=351020.0, ans=0.0 2024-09-18 02:33:55,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=351060.0, ans=0.125 2024-09-18 02:33:56,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=351060.0, ans=0.0 2024-09-18 02:34:11,446 INFO [train.py:1198] (1/2) Epoch 20, batch 1800, loss[loss=0.2614, ctc_loss=0.1502, cr_loss=0.4014, attn_decoder_loss=0.2648, over 29687.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.1369, cr_loss=0.3794, attn_decoder_loss=0.2507, over 5791489.57 frames. 
], batch size: 83, lr: 5.59e-03, grad_scale: 8.0 2024-09-18 02:34:20,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=351100.0, ans=0.125 2024-09-18 02:34:32,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=351140.0, ans=15.0 2024-09-18 02:34:36,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=351140.0, ans=0.125 2024-09-18 02:34:36,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=351140.0, ans=0.0 2024-09-18 02:34:41,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=351140.0, ans=0.125 2024-09-18 02:34:41,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=351140.0, ans=0.125 2024-09-18 02:34:48,914 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.074e+01 8.564e+01 9.228e+01 9.746e+01 1.564e+02, threshold=1.846e+02, percent-clipped=0.0 2024-09-18 02:35:00,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=351220.0, ans=0.125 2024-09-18 02:35:01,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=351220.0, ans=0.125 2024-09-18 02:35:04,040 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.81 vs. limit=15.0 2024-09-18 02:35:04,196 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.06 vs. limit=15.0 2024-09-18 02:35:32,099 INFO [train.py:1198] (1/2) Epoch 20, batch 1850, loss[loss=0.2563, ctc_loss=0.1394, cr_loss=0.3631, attn_decoder_loss=0.2613, over 29616.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1372, cr_loss=0.3803, attn_decoder_loss=0.2508, over 5797368.98 frames. ], batch size: 86, lr: 5.59e-03, grad_scale: 8.0 2024-09-18 02:35:32,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=351300.0, ans=0.1 2024-09-18 02:35:47,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=351340.0, ans=0.125 2024-09-18 02:35:53,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=351340.0, ans=0.1 2024-09-18 02:35:55,664 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.47 vs. 
limit=10.0 2024-09-18 02:36:14,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=351380.0, ans=0.1 2024-09-18 02:36:14,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=351380.0, ans=0.2 2024-09-18 02:36:14,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=351380.0, ans=0.125 2024-09-18 02:36:23,047 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.61 vs. limit=15.0 2024-09-18 02:36:25,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=351420.0, ans=0.125 2024-09-18 02:36:32,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=351460.0, ans=0.05 2024-09-18 02:36:47,370 INFO [train.py:1198] (1/2) Epoch 20, batch 1900, loss[loss=0.2588, ctc_loss=0.1469, cr_loss=0.4083, attn_decoder_loss=0.2622, over 29698.00 frames. ], tot_loss[loss=0.2476, ctc_loss=0.1376, cr_loss=0.3815, attn_decoder_loss=0.2514, over 5803793.64 frames. ], batch size: 89, lr: 5.59e-03, grad_scale: 8.0 2024-09-18 02:36:52,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=351500.0, ans=0.2 2024-09-18 02:36:56,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=351500.0, ans=0.0 2024-09-18 02:37:13,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=351540.0, ans=0.125 2024-09-18 02:37:20,728 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.814e+01 8.754e+01 9.062e+01 9.837e+01 1.384e+02, threshold=1.812e+02, percent-clipped=0.0 2024-09-18 02:37:22,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=351580.0, ans=0.125 2024-09-18 02:37:28,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=351580.0, ans=0.125 2024-09-18 02:37:37,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=351620.0, ans=0.07 2024-09-18 02:37:42,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=351620.0, ans=0.125 2024-09-18 02:37:46,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=351660.0, ans=0.0 2024-09-18 02:37:46,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=351660.0, ans=0.2 2024-09-18 02:37:46,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=351660.0, ans=0.125 2024-09-18 02:37:54,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=351660.0, ans=0.125 2024-09-18 02:38:03,042 INFO [train.py:1198] (1/2) Epoch 20, batch 1950, loss[loss=0.2482, ctc_loss=0.1415, cr_loss=0.3797, attn_decoder_loss=0.2516, over 
29439.00 frames. ], tot_loss[loss=0.2487, ctc_loss=0.1387, cr_loss=0.3832, attn_decoder_loss=0.2524, over 5818497.01 frames. ], batch size: 78, lr: 5.58e-03, grad_scale: 8.0 2024-09-18 02:38:19,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=351740.0, ans=0.125 2024-09-18 02:38:24,587 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.51 vs. limit=15.0 2024-09-18 02:38:44,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=351780.0, ans=0.0 2024-09-18 02:38:48,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=351780.0, ans=0.125 2024-09-18 02:39:23,347 INFO [train.py:1198] (1/2) Epoch 20, batch 2000, loss[loss=0.2137, ctc_loss=0.114, cr_loss=0.3378, attn_decoder_loss=0.2172, over 29374.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.139, cr_loss=0.3833, attn_decoder_loss=0.2526, over 5796671.71 frames. ], batch size: 67, lr: 5.58e-03, grad_scale: 16.0 2024-09-18 02:39:31,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=351900.0, ans=0.0 2024-09-18 02:39:42,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=351940.0, ans=0.125 2024-09-18 02:39:46,585 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:39:51,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=351940.0, ans=0.2 2024-09-18 02:39:56,880 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.381e+01 8.619e+01 9.159e+01 9.729e+01 7.125e+02, threshold=1.832e+02, percent-clipped=2.0 2024-09-18 02:40:00,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=351980.0, ans=0.1 2024-09-18 02:40:00,840 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.59 vs. limit=15.0 2024-09-18 02:40:17,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=352020.0, ans=0.125 2024-09-18 02:40:19,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=352020.0, ans=0.0 2024-09-18 02:40:31,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=352060.0, ans=0.125 2024-09-18 02:40:32,136 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.36 vs. limit=15.0 2024-09-18 02:40:34,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=352060.0, ans=0.125 2024-09-18 02:40:36,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=352060.0, ans=0.2 2024-09-18 02:40:46,373 INFO [train.py:1198] (1/2) Epoch 20, batch 2050, loss[loss=0.2246, ctc_loss=0.1195, cr_loss=0.3269, attn_decoder_loss=0.229, over 29467.00 frames. 
], tot_loss[loss=0.2481, ctc_loss=0.1386, cr_loss=0.3825, attn_decoder_loss=0.2518, over 5787932.39 frames. ], batch size: 70, lr: 5.58e-03, grad_scale: 8.0 2024-09-18 02:41:09,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=352140.0, ans=0.125 2024-09-18 02:41:10,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=352140.0, ans=0.125 2024-09-18 02:41:12,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=352140.0, ans=0.125 2024-09-18 02:41:12,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=352140.0, ans=0.125 2024-09-18 02:41:35,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=352220.0, ans=0.125 2024-09-18 02:41:59,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=352260.0, ans=0.0 2024-09-18 02:42:01,886 INFO [train.py:1198] (1/2) Epoch 20, batch 2100, loss[loss=0.2466, ctc_loss=0.1327, cr_loss=0.3826, attn_decoder_loss=0.2507, over 29746.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1375, cr_loss=0.3809, attn_decoder_loss=0.2509, over 5800476.55 frames. ], batch size: 81, lr: 5.58e-03, grad_scale: 8.0 2024-09-18 02:42:38,586 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.126e+01 8.424e+01 8.970e+01 9.709e+01 1.410e+02, threshold=1.794e+02, percent-clipped=0.0 2024-09-18 02:42:40,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=352380.0, ans=0.2 2024-09-18 02:42:41,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=352380.0, ans=0.125 2024-09-18 02:43:00,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=352420.0, ans=0.2 2024-09-18 02:43:21,526 INFO [train.py:1198] (1/2) Epoch 20, batch 2150, loss[loss=0.2608, ctc_loss=0.1529, cr_loss=0.3998, attn_decoder_loss=0.2639, over 29439.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1369, cr_loss=0.3797, attn_decoder_loss=0.2505, over 5815752.05 frames. ], batch size: 78, lr: 5.58e-03, grad_scale: 8.0 2024-09-18 02:43:41,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=352540.0, ans=0.0 2024-09-18 02:44:06,668 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.32 vs. 
limit=10.0 2024-09-18 02:44:07,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=352620.0, ans=0.1 2024-09-18 02:44:11,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=352620.0, ans=0.1 2024-09-18 02:44:30,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=352660.0, ans=0.1 2024-09-18 02:44:31,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=352660.0, ans=0.125 2024-09-18 02:44:37,538 INFO [train.py:1198] (1/2) Epoch 20, batch 2200, loss[loss=0.2667, ctc_loss=0.1526, cr_loss=0.4162, attn_decoder_loss=0.2701, over 29648.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.1367, cr_loss=0.3793, attn_decoder_loss=0.2503, over 5811441.60 frames. ], batch size: 86, lr: 5.58e-03, grad_scale: 8.0 2024-09-18 02:44:47,163 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:44:53,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.78 vs. limit=15.0 2024-09-18 02:44:57,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=352740.0, ans=0.0 2024-09-18 02:45:08,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=352780.0, ans=0.0 2024-09-18 02:45:12,309 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.647e+01 9.174e+01 9.915e+01 1.896e+02, threshold=1.835e+02, percent-clipped=1.0 2024-09-18 02:45:26,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=352820.0, ans=0.0 2024-09-18 02:45:32,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=352820.0, ans=0.1 2024-09-18 02:45:45,432 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2024-09-18 02:45:47,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=352860.0, ans=0.125 2024-09-18 02:45:53,590 INFO [train.py:1198] (1/2) Epoch 20, batch 2250, loss[loss=0.2435, ctc_loss=0.133, cr_loss=0.3806, attn_decoder_loss=0.2473, over 29726.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1364, cr_loss=0.3787, attn_decoder_loss=0.2499, over 5811260.55 frames. 
], batch size: 82, lr: 5.57e-03, grad_scale: 8.0 2024-09-18 02:45:56,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=352900.0, ans=0.125 2024-09-18 02:45:59,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=352900.0, ans=0.125 2024-09-18 02:46:01,420 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:46:04,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=352900.0, ans=0.125 2024-09-18 02:46:07,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=352940.0, ans=0.1 2024-09-18 02:46:34,502 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.27 vs. limit=15.0 2024-09-18 02:46:38,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=352980.0, ans=0.0 2024-09-18 02:46:42,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=353020.0, ans=0.1 2024-09-18 02:47:13,751 INFO [train.py:1198] (1/2) Epoch 20, batch 2300, loss[loss=0.2275, ctc_loss=0.1267, cr_loss=0.3484, attn_decoder_loss=0.2309, over 29729.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.136, cr_loss=0.3775, attn_decoder_loss=0.2494, over 5798177.43 frames. ], batch size: 72, lr: 5.57e-03, grad_scale: 8.0 2024-09-18 02:47:21,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=353100.0, ans=0.0 2024-09-18 02:47:48,564 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 8.594e+01 9.374e+01 1.007e+02 2.489e+02, threshold=1.875e+02, percent-clipped=2.0 2024-09-18 02:47:53,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=353180.0, ans=0.05 2024-09-18 02:48:29,442 INFO [train.py:1198] (1/2) Epoch 20, batch 2350, loss[loss=0.2481, ctc_loss=0.1373, cr_loss=0.3856, attn_decoder_loss=0.2518, over 29699.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.1363, cr_loss=0.3783, attn_decoder_loss=0.2496, over 5803956.34 frames. 
], batch size: 83, lr: 5.57e-03, grad_scale: 8.0 2024-09-18 02:48:50,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=353340.0, ans=0.1 2024-09-18 02:49:16,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=353420.0, ans=0.125 2024-09-18 02:49:19,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=353420.0, ans=0.1 2024-09-18 02:49:24,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=353420.0, ans=0.125 2024-09-18 02:49:31,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=353460.0, ans=0.125 2024-09-18 02:49:34,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=353460.0, ans=0.0 2024-09-18 02:49:45,306 INFO [train.py:1198] (1/2) Epoch 20, batch 2400, loss[loss=0.2401, ctc_loss=0.1281, cr_loss=0.365, attn_decoder_loss=0.2445, over 29517.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1366, cr_loss=0.379, attn_decoder_loss=0.2501, over 5808323.03 frames. ], batch size: 76, lr: 5.57e-03, grad_scale: 16.0 2024-09-18 02:49:59,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.87 vs. limit=22.5 2024-09-18 02:50:15,441 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.79 vs. limit=15.0 2024-09-18 02:50:23,695 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.283e+01 8.660e+01 9.243e+01 9.853e+01 2.252e+02, threshold=1.849e+02, percent-clipped=1.0 2024-09-18 02:50:50,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=353660.0, ans=0.125 2024-09-18 02:50:52,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=353660.0, ans=0.2 2024-09-18 02:50:55,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=353660.0, ans=0.125 2024-09-18 02:50:55,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=353660.0, ans=0.125 2024-09-18 02:51:01,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=353660.0, ans=0.1 2024-09-18 02:51:05,907 INFO [train.py:1198] (1/2) Epoch 20, batch 2450, loss[loss=0.2604, ctc_loss=0.1445, cr_loss=0.3996, attn_decoder_loss=0.2644, over 29708.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1376, cr_loss=0.3807, attn_decoder_loss=0.2511, over 5786727.46 frames. ], batch size: 82, lr: 5.57e-03, grad_scale: 8.0 2024-09-18 02:51:22,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=353740.0, ans=0.125 2024-09-18 02:51:29,183 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.25 vs. 
limit=15.0 2024-09-18 02:51:38,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=353780.0, ans=0.0 2024-09-18 02:52:08,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=353860.0, ans=0.2 2024-09-18 02:52:12,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=353860.0, ans=0.125 2024-09-18 02:52:21,948 INFO [train.py:1198] (1/2) Epoch 20, batch 2500, loss[loss=0.2581, ctc_loss=0.1429, cr_loss=0.3931, attn_decoder_loss=0.2622, over 29622.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1374, cr_loss=0.3807, attn_decoder_loss=0.2511, over 5797202.06 frames. ], batch size: 86, lr: 5.57e-03, grad_scale: 8.0 2024-09-18 02:52:23,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=353900.0, ans=0.1 2024-09-18 02:52:34,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=353900.0, ans=0.1 2024-09-18 02:52:43,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=353940.0, ans=0.09899494936611666 2024-09-18 02:52:43,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=353940.0, ans=0.0 2024-09-18 02:52:58,484 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.450e+01 8.592e+01 8.974e+01 9.558e+01 1.231e+02, threshold=1.795e+02, percent-clipped=0.0 2024-09-18 02:53:17,165 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:53:29,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=354060.0, ans=0.0 2024-09-18 02:53:38,060 INFO [train.py:1198] (1/2) Epoch 20, batch 2550, loss[loss=0.2224, ctc_loss=0.1169, cr_loss=0.345, attn_decoder_loss=0.2265, over 29333.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1369, cr_loss=0.3793, attn_decoder_loss=0.2508, over 5801080.95 frames. ], batch size: 67, lr: 5.57e-03, grad_scale: 8.0 2024-09-18 02:53:42,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=354100.0, ans=0.0 2024-09-18 02:53:44,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=354100.0, ans=0.125 2024-09-18 02:54:33,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=354220.0, ans=0.125 2024-09-18 02:54:58,070 INFO [train.py:1198] (1/2) Epoch 20, batch 2600, loss[loss=0.2387, ctc_loss=0.125, cr_loss=0.3632, attn_decoder_loss=0.2433, over 29474.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1372, cr_loss=0.3804, attn_decoder_loss=0.2512, over 5797035.57 frames. 
], batch size: 78, lr: 5.56e-03, grad_scale: 8.0 2024-09-18 02:54:58,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=354300.0, ans=0.2 2024-09-18 02:55:12,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.whiten.whitening_limit, batch_count=354340.0, ans=12.0 2024-09-18 02:55:13,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=354340.0, ans=0.0 2024-09-18 02:55:34,144 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.411e+01 8.665e+01 9.316e+01 9.977e+01 1.565e+02, threshold=1.863e+02, percent-clipped=0.0 2024-09-18 02:55:48,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=354420.0, ans=0.0 2024-09-18 02:56:13,762 INFO [train.py:1198] (1/2) Epoch 20, batch 2650, loss[loss=0.2748, ctc_loss=0.1576, cr_loss=0.4257, attn_decoder_loss=0.2784, over 29171.00 frames. ], tot_loss[loss=0.2478, ctc_loss=0.1375, cr_loss=0.3811, attn_decoder_loss=0.2515, over 5803077.04 frames. ], batch size: 100, lr: 5.56e-03, grad_scale: 8.0 2024-09-18 02:56:18,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=354500.0, ans=0.0 2024-09-18 02:56:33,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=354540.0, ans=0.125 2024-09-18 02:56:55,467 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.84 vs. limit=15.0 2024-09-18 02:57:02,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=354620.0, ans=0.0 2024-09-18 02:57:29,344 INFO [train.py:1198] (1/2) Epoch 20, batch 2700, loss[loss=0.259, ctc_loss=0.1438, cr_loss=0.4132, attn_decoder_loss=0.2626, over 29533.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.138, cr_loss=0.3818, attn_decoder_loss=0.2523, over 5798098.77 frames. ], batch size: 87, lr: 5.56e-03, grad_scale: 8.0 2024-09-18 02:57:33,440 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.53 vs. limit=22.5 2024-09-18 02:57:37,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=354700.0, ans=0.125 2024-09-18 02:57:52,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=354740.0, ans=0.125 2024-09-18 02:58:07,622 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.392e+01 8.506e+01 9.049e+01 9.472e+01 1.287e+02, threshold=1.810e+02, percent-clipped=0.0 2024-09-18 02:58:12,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=354780.0, ans=0.0 2024-09-18 02:58:41,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=354860.0, ans=0.125 2024-09-18 02:58:49,462 INFO [train.py:1198] (1/2) Epoch 20, batch 2750, loss[loss=0.2433, ctc_loss=0.1292, cr_loss=0.3709, attn_decoder_loss=0.2477, over 29520.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1371, cr_loss=0.3806, attn_decoder_loss=0.251, over 5797070.49 frames. 
], batch size: 75, lr: 5.56e-03, grad_scale: 8.0 2024-09-18 02:58:51,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=354900.0, ans=0.125 2024-09-18 02:58:52,112 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=12.0 2024-09-18 02:58:54,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=354900.0, ans=0.0 2024-09-18 02:58:58,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=354900.0, ans=0.1 2024-09-18 02:59:00,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=354900.0, ans=0.09899494936611666 2024-09-18 02:59:03,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=354940.0, ans=0.0 2024-09-18 02:59:10,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=354940.0, ans=0.125 2024-09-18 02:59:48,176 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0 2024-09-18 02:59:50,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=355060.0, ans=0.125 2024-09-18 02:59:58,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=355060.0, ans=0.0 2024-09-18 03:00:02,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=355060.0, ans=0.125 2024-09-18 03:00:05,925 INFO [train.py:1198] (1/2) Epoch 20, batch 2800, loss[loss=0.2794, ctc_loss=0.1934, cr_loss=0.4274, attn_decoder_loss=0.2795, over 19736.00 frames. ], tot_loss[loss=0.2476, ctc_loss=0.1375, cr_loss=0.3809, attn_decoder_loss=0.2513, over 5777227.56 frames. ], batch size: 210, lr: 5.56e-03, grad_scale: 16.0 2024-09-18 03:00:19,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=355140.0, ans=0.125 2024-09-18 03:00:27,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=355140.0, ans=0.125 2024-09-18 03:00:44,102 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.732e+01 9.172e+01 1.024e+02 2.809e+02, threshold=1.834e+02, percent-clipped=3.0 2024-09-18 03:00:53,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=355220.0, ans=0.09899494936611666 2024-09-18 03:00:59,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=355220.0, ans=0.125 2024-09-18 03:01:08,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=355260.0, ans=0.125 2024-09-18 03:01:10,706 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.92 vs. 
limit=10.0 2024-09-18 03:01:12,045 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.62 vs. limit=15.0 2024-09-18 03:01:21,675 INFO [train.py:1198] (1/2) Epoch 20, batch 2850, loss[loss=0.2412, ctc_loss=0.1302, cr_loss=0.3672, attn_decoder_loss=0.2454, over 29525.00 frames. ], tot_loss[loss=0.2479, ctc_loss=0.1378, cr_loss=0.3813, attn_decoder_loss=0.2517, over 5761810.48 frames. ], batch size: 77, lr: 5.56e-03, grad_scale: 8.0 2024-09-18 03:01:23,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=355300.0, ans=0.0 2024-09-18 03:02:09,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=355420.0, ans=0.125 2024-09-18 03:02:41,742 INFO [train.py:1198] (1/2) Epoch 20, batch 2900, loss[loss=0.2492, ctc_loss=0.1455, cr_loss=0.3983, attn_decoder_loss=0.2519, over 29415.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1385, cr_loss=0.3826, attn_decoder_loss=0.2526, over 5787797.95 frames. ], batch size: 79, lr: 5.55e-03, grad_scale: 8.0 2024-09-18 03:03:01,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=355540.0, ans=0.2 2024-09-18 03:03:12,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=355580.0, ans=0.025 2024-09-18 03:03:12,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=355580.0, ans=0.0 2024-09-18 03:03:18,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=355580.0, ans=0.0 2024-09-18 03:03:19,775 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.493e+01 9.196e+01 9.952e+01 2.490e+02, threshold=1.839e+02, percent-clipped=1.0 2024-09-18 03:03:24,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=355580.0, ans=0.0 2024-09-18 03:03:52,388 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.19 vs. limit=22.5 2024-09-18 03:03:57,573 INFO [train.py:1198] (1/2) Epoch 20, batch 2950, loss[loss=0.237, ctc_loss=0.1334, cr_loss=0.3918, attn_decoder_loss=0.2398, over 29520.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1375, cr_loss=0.3808, attn_decoder_loss=0.2514, over 5782486.98 frames. ], batch size: 75, lr: 5.55e-03, grad_scale: 8.0 2024-09-18 03:03:57,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=355700.0, ans=0.125 2024-09-18 03:04:05,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=355700.0, ans=0.025 2024-09-18 03:04:26,911 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.18 vs. 
limit=10.0 2024-09-18 03:04:31,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=355780.0, ans=0.0 2024-09-18 03:04:41,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=355820.0, ans=0.0 2024-09-18 03:04:44,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=355820.0, ans=0.125 2024-09-18 03:04:46,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=355820.0, ans=0.09899494936611666 2024-09-18 03:04:55,503 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.02 vs. limit=15.0 2024-09-18 03:05:01,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=355860.0, ans=0.025 2024-09-18 03:05:04,257 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:05:05,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=355860.0, ans=0.125 2024-09-18 03:05:08,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=355860.0, ans=0.125 2024-09-18 03:05:13,039 INFO [train.py:1198] (1/2) Epoch 20, batch 3000, loss[loss=0.2363, ctc_loss=0.1265, cr_loss=0.3564, attn_decoder_loss=0.2406, over 29736.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1368, cr_loss=0.3791, attn_decoder_loss=0.251, over 5783844.62 frames. ], batch size: 81, lr: 5.55e-03, grad_scale: 8.0 2024-09-18 03:05:13,040 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 03:05:32,383 INFO [train.py:1230] (1/2) Epoch 20, validation: loss=0.2111, ctc_loss=0.03914, cr_loss=5.228e-15, attn_decoder_loss=0.2302, over 944034.00 frames. 2024-09-18 03:05:32,383 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-18 03:05:40,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=355900.0, ans=0.125 2024-09-18 03:06:04,888 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.03 vs. 
limit=12.0 2024-09-18 03:06:10,670 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.213e+01 8.598e+01 9.158e+01 9.918e+01 2.557e+02, threshold=1.832e+02, percent-clipped=1.0 2024-09-18 03:06:12,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=355980.0, ans=0.0 2024-09-18 03:06:23,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=356020.0, ans=0.07 2024-09-18 03:06:25,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=356020.0, ans=0.05 2024-09-18 03:06:25,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=356020.0, ans=0.125 2024-09-18 03:06:35,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=356060.0, ans=0.5 2024-09-18 03:06:37,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=356060.0, ans=0.125 2024-09-18 03:06:50,689 INFO [train.py:1198] (1/2) Epoch 20, batch 3050, loss[loss=0.2342, ctc_loss=0.1272, cr_loss=0.3596, attn_decoder_loss=0.2381, over 29499.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1374, cr_loss=0.38, attn_decoder_loss=0.2515, over 5777809.96 frames. ], batch size: 76, lr: 5.55e-03, grad_scale: 8.0 2024-09-18 03:07:09,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=356140.0, ans=0.0 2024-09-18 03:07:20,667 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.36 vs. limit=10.0 2024-09-18 03:07:34,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=356220.0, ans=0.125 2024-09-18 03:07:55,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=356260.0, ans=0.1 2024-09-18 03:07:55,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=356260.0, ans=0.025 2024-09-18 03:08:05,860 INFO [train.py:1198] (1/2) Epoch 20, batch 3100, loss[loss=0.2687, ctc_loss=0.1546, cr_loss=0.4146, attn_decoder_loss=0.2721, over 29281.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1372, cr_loss=0.3791, attn_decoder_loss=0.2509, over 5777820.40 frames. 
], batch size: 100, lr: 5.55e-03, grad_scale: 8.0 2024-09-18 03:08:25,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=356340.0, ans=10.0 2024-09-18 03:08:41,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=356380.0, ans=0.0 2024-09-18 03:08:43,842 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 8.464e+01 9.160e+01 9.747e+01 2.632e+02, threshold=1.832e+02, percent-clipped=3.0 2024-09-18 03:09:15,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=356460.0, ans=0.0 2024-09-18 03:09:16,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=356460.0, ans=0.07 2024-09-18 03:09:18,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=356460.0, ans=0.2 2024-09-18 03:09:24,108 INFO [train.py:1198] (1/2) Epoch 20, batch 3150, loss[loss=0.2683, ctc_loss=0.157, cr_loss=0.413, attn_decoder_loss=0.2714, over 28913.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1372, cr_loss=0.3797, attn_decoder_loss=0.2509, over 5784629.84 frames. ], batch size: 104, lr: 5.55e-03, grad_scale: 8.0 2024-09-18 03:09:25,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=356500.0, ans=0.125 2024-09-18 03:09:34,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=356500.0, ans=0.1 2024-09-18 03:09:44,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=356540.0, ans=0.0 2024-09-18 03:09:48,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=356540.0, ans=0.1 2024-09-18 03:10:07,485 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.46 vs. limit=15.0 2024-09-18 03:10:11,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=356620.0, ans=0.0 2024-09-18 03:10:36,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=356660.0, ans=0.125 2024-09-18 03:10:42,207 INFO [train.py:1198] (1/2) Epoch 20, batch 3200, loss[loss=0.2382, ctc_loss=0.1234, cr_loss=0.3537, attn_decoder_loss=0.2431, over 29389.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1366, cr_loss=0.3788, attn_decoder_loss=0.2506, over 5794702.30 frames. 
], batch size: 79, lr: 5.54e-03, grad_scale: 16.0 2024-09-18 03:10:42,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=356700.0, ans=0.2 2024-09-18 03:10:45,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=356700.0, ans=0.125 2024-09-18 03:10:46,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=356700.0, ans=0.0 2024-09-18 03:11:02,177 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:11:06,009 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.10 vs. limit=15.0 2024-09-18 03:11:21,928 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.707e+01 8.428e+01 9.069e+01 9.579e+01 2.573e+02, threshold=1.814e+02, percent-clipped=1.0 2024-09-18 03:11:40,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=356820.0, ans=0.025 2024-09-18 03:11:47,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.41 vs. limit=15.0 2024-09-18 03:11:52,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=356860.0, ans=0.0 2024-09-18 03:11:58,536 INFO [train.py:1198] (1/2) Epoch 20, batch 3250, loss[loss=0.2635, ctc_loss=0.1435, cr_loss=0.3741, attn_decoder_loss=0.2685, over 29687.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1368, cr_loss=0.3795, attn_decoder_loss=0.2511, over 5801366.72 frames. ], batch size: 84, lr: 5.54e-03, grad_scale: 8.0 2024-09-18 03:12:02,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=356900.0, ans=0.1 2024-09-18 03:12:24,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=356940.0, ans=0.025 2024-09-18 03:12:27,980 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.57 vs. limit=12.0 2024-09-18 03:12:33,954 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.94 vs. 
limit=15.0 2024-09-18 03:12:37,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=356980.0, ans=0.125 2024-09-18 03:12:37,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=356980.0, ans=0.1 2024-09-18 03:12:52,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=357020.0, ans=0.125 2024-09-18 03:12:54,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=357020.0, ans=0.125 2024-09-18 03:12:54,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=357020.0, ans=0.125 2024-09-18 03:12:58,047 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.79 vs. limit=6.0 2024-09-18 03:13:01,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=357060.0, ans=0.2 2024-09-18 03:13:10,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=357060.0, ans=0.125 2024-09-18 03:13:11,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=357060.0, ans=0.07 2024-09-18 03:13:15,869 INFO [train.py:1198] (1/2) Epoch 20, batch 3300, loss[loss=0.2529, ctc_loss=0.133, cr_loss=0.3531, attn_decoder_loss=0.2584, over 28461.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1362, cr_loss=0.3783, attn_decoder_loss=0.2499, over 5799208.25 frames. ], batch size: 111, lr: 5.54e-03, grad_scale: 8.0 2024-09-18 03:13:26,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=357100.0, ans=0.1 2024-09-18 03:13:34,942 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.45 vs. limit=12.0 2024-09-18 03:13:46,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=357180.0, ans=0.1 2024-09-18 03:13:55,058 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.624e+01 9.196e+01 9.884e+01 4.402e+02, threshold=1.839e+02, percent-clipped=2.0 2024-09-18 03:13:59,887 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:14:15,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=357220.0, ans=0.125 2024-09-18 03:14:20,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=357260.0, ans=0.05 2024-09-18 03:14:28,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=357260.0, ans=0.125 2024-09-18 03:14:33,202 INFO [train.py:1198] (1/2) Epoch 20, batch 3350, loss[loss=0.2569, ctc_loss=0.1487, cr_loss=0.416, attn_decoder_loss=0.2596, over 28724.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1373, cr_loss=0.3801, attn_decoder_loss=0.2508, over 5775883.71 frames. 
], batch size: 104, lr: 5.54e-03, grad_scale: 8.0 2024-09-18 03:14:38,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=357300.0, ans=0.025 2024-09-18 03:14:38,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=357300.0, ans=0.0 2024-09-18 03:14:58,308 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.00 vs. limit=22.5 2024-09-18 03:15:25,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=357420.0, ans=0.0 2024-09-18 03:15:29,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=357420.0, ans=0.125 2024-09-18 03:15:41,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=357460.0, ans=0.1 2024-09-18 03:15:48,967 INFO [train.py:1198] (1/2) Epoch 20, batch 3400, loss[loss=0.2212, ctc_loss=0.1146, cr_loss=0.3336, attn_decoder_loss=0.2256, over 29390.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1375, cr_loss=0.3801, attn_decoder_loss=0.2511, over 5766928.59 frames. ], batch size: 67, lr: 5.54e-03, grad_scale: 8.0 2024-09-18 03:15:50,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=357500.0, ans=0.125 2024-09-18 03:16:18,710 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.83 vs. limit=15.0 2024-09-18 03:16:28,756 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.571e+01 8.602e+01 9.311e+01 9.873e+01 3.083e+02, threshold=1.862e+02, percent-clipped=1.0 2024-09-18 03:16:30,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=357580.0, ans=0.0 2024-09-18 03:16:50,123 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.81 vs. limit=22.5 2024-09-18 03:17:07,403 INFO [train.py:1198] (1/2) Epoch 20, batch 3450, loss[loss=0.2643, ctc_loss=0.1465, cr_loss=0.381, attn_decoder_loss=0.269, over 28244.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1378, cr_loss=0.3805, attn_decoder_loss=0.2515, over 5774525.20 frames. ], batch size: 111, lr: 5.54e-03, grad_scale: 8.0 2024-09-18 03:17:22,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=357740.0, ans=0.125 2024-09-18 03:17:22,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=357740.0, ans=0.125 2024-09-18 03:17:33,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=357740.0, ans=0.125 2024-09-18 03:17:38,570 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=15.25 vs. 
limit=15.0 2024-09-18 03:17:40,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=357780.0, ans=0.09899494936611666 2024-09-18 03:17:43,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=357780.0, ans=0.125 2024-09-18 03:18:02,570 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.35 vs. limit=15.0 2024-09-18 03:18:16,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=357860.0, ans=0.0 2024-09-18 03:18:22,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=357860.0, ans=0.125 2024-09-18 03:18:25,109 INFO [train.py:1198] (1/2) Epoch 20, batch 3500, loss[loss=0.2169, ctc_loss=0.1167, cr_loss=0.3383, attn_decoder_loss=0.2205, over 29326.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1376, cr_loss=0.3806, attn_decoder_loss=0.251, over 5775483.23 frames. ], batch size: 71, lr: 5.54e-03, grad_scale: 8.0 2024-09-18 03:18:29,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=357900.0, ans=0.125 2024-09-18 03:19:01,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=357980.0, ans=0.09899494936611666 2024-09-18 03:19:03,918 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.402e+01 8.576e+01 9.185e+01 9.795e+01 1.651e+02, threshold=1.837e+02, percent-clipped=0.0 2024-09-18 03:19:13,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=358020.0, ans=0.1 2024-09-18 03:19:28,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=358060.0, ans=0.125 2024-09-18 03:19:39,918 INFO [train.py:1198] (1/2) Epoch 20, batch 3550, loss[loss=0.2553, ctc_loss=0.1427, cr_loss=0.3878, attn_decoder_loss=0.2591, over 29734.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.137, cr_loss=0.38, attn_decoder_loss=0.2507, over 5783363.08 frames. ], batch size: 89, lr: 5.53e-03, grad_scale: 8.0 2024-09-18 03:19:48,296 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.38 vs. limit=15.0 2024-09-18 03:19:53,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=358140.0, ans=0.125 2024-09-18 03:19:54,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=358140.0, ans=0.0 2024-09-18 03:20:11,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=358180.0, ans=0.0 2024-09-18 03:20:14,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=358180.0, ans=0.025 2024-09-18 03:20:23,657 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.77 vs. 
limit=12.0 2024-09-18 03:20:26,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=358220.0, ans=0.125 2024-09-18 03:20:27,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=358220.0, ans=0.125 2024-09-18 03:20:29,693 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.81 vs. limit=15.0 2024-09-18 03:20:31,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=358220.0, ans=0.125 2024-09-18 03:20:32,000 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:20:33,837 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.17 vs. limit=15.0 2024-09-18 03:20:34,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=358220.0, ans=0.0 2024-09-18 03:20:36,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=358220.0, ans=0.125 2024-09-18 03:20:48,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=358260.0, ans=0.125 2024-09-18 03:20:50,361 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.65 vs. limit=10.0 2024-09-18 03:20:53,743 INFO [train.py:1198] (1/2) Epoch 20, batch 3600, loss[loss=0.243, ctc_loss=0.133, cr_loss=0.3635, attn_decoder_loss=0.2472, over 29497.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1371, cr_loss=0.3797, attn_decoder_loss=0.2508, over 5791796.26 frames. ], batch size: 77, lr: 5.53e-03, grad_scale: 16.0 2024-09-18 03:20:54,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=358300.0, ans=0.125 2024-09-18 03:21:01,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=358300.0, ans=0.0 2024-09-18 03:21:06,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=358300.0, ans=0.0 2024-09-18 03:21:06,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=358300.0, ans=0.1 2024-09-18 03:21:07,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=358340.0, ans=0.2 2024-09-18 03:21:08,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=358340.0, ans=0.125 2024-09-18 03:21:18,863 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.66 vs. limit=15.0 2024-09-18 03:21:32,897 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.46 vs. 
limit=15.0 2024-09-18 03:21:33,254 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.608e+01 9.165e+01 9.950e+01 3.634e+02, threshold=1.833e+02, percent-clipped=2.0 2024-09-18 03:22:01,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=358460.0, ans=0.0 2024-09-18 03:22:10,961 INFO [train.py:1198] (1/2) Epoch 20, batch 3650, loss[loss=0.2582, ctc_loss=0.1449, cr_loss=0.3983, attn_decoder_loss=0.2619, over 29513.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1365, cr_loss=0.3788, attn_decoder_loss=0.2501, over 5792458.18 frames. ], batch size: 90, lr: 5.53e-03, grad_scale: 8.0 2024-09-18 03:22:35,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=358540.0, ans=0.2 2024-09-18 03:22:57,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=358620.0, ans=0.0 2024-09-18 03:22:58,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=358620.0, ans=0.0 2024-09-18 03:23:24,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=358700.0, ans=0.2 2024-09-18 03:23:25,634 INFO [train.py:1198] (1/2) Epoch 20, batch 3700, loss[loss=0.2491, ctc_loss=0.1414, cr_loss=0.3907, attn_decoder_loss=0.2524, over 29706.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1366, cr_loss=0.3798, attn_decoder_loss=0.2505, over 5802558.80 frames. ], batch size: 84, lr: 5.53e-03, grad_scale: 8.0 2024-09-18 03:23:29,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=358700.0, ans=0.125 2024-09-18 03:23:31,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=358700.0, ans=0.125 2024-09-18 03:23:34,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=358700.0, ans=0.125 2024-09-18 03:23:54,804 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.79 vs. limit=15.0 2024-09-18 03:23:55,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=358780.0, ans=0.125 2024-09-18 03:24:05,520 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 8.568e+01 9.154e+01 9.793e+01 1.686e+02, threshold=1.831e+02, percent-clipped=0.0 2024-09-18 03:24:31,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=358860.0, ans=0.1 2024-09-18 03:24:35,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=358860.0, ans=0.125 2024-09-18 03:24:41,704 INFO [train.py:1198] (1/2) Epoch 20, batch 3750, loss[loss=0.2158, ctc_loss=0.1159, cr_loss=0.3414, attn_decoder_loss=0.2193, over 29365.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.1364, cr_loss=0.3796, attn_decoder_loss=0.2504, over 5806783.48 frames. ], batch size: 67, lr: 5.53e-03, grad_scale: 8.0 2024-09-18 03:24:54,632 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs. 
limit=15.0 2024-09-18 03:24:59,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=358940.0, ans=0.125 2024-09-18 03:25:19,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=358980.0, ans=0.125 2024-09-18 03:25:20,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=358980.0, ans=0.125 2024-09-18 03:25:26,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=359020.0, ans=0.1 2024-09-18 03:25:44,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=359060.0, ans=0.0 2024-09-18 03:25:56,008 INFO [train.py:1198] (1/2) Epoch 20, batch 3800, loss[loss=0.2692, ctc_loss=0.1537, cr_loss=0.4258, attn_decoder_loss=0.2726, over 29608.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1363, cr_loss=0.3792, attn_decoder_loss=0.25, over 5796153.51 frames. ], batch size: 86, lr: 5.53e-03, grad_scale: 8.0 2024-09-18 03:26:14,754 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.75 vs. limit=15.0 2024-09-18 03:26:36,572 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.381e+01 8.575e+01 9.018e+01 9.556e+01 1.555e+02, threshold=1.804e+02, percent-clipped=0.0 2024-09-18 03:26:54,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=359260.0, ans=0.125 2024-09-18 03:26:56,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=359260.0, ans=0.0 2024-09-18 03:27:03,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=359260.0, ans=0.025 2024-09-18 03:27:06,923 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.00 vs. limit=15.0 2024-09-18 03:27:10,604 INFO [train.py:1198] (1/2) Epoch 20, batch 3850, loss[loss=0.266, ctc_loss=0.1537, cr_loss=0.4222, attn_decoder_loss=0.2691, over 29288.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.1358, cr_loss=0.3786, attn_decoder_loss=0.2497, over 5810988.60 frames. ], batch size: 100, lr: 5.52e-03, grad_scale: 8.0 2024-09-18 03:27:21,951 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.83 vs. limit=15.0 2024-09-18 03:27:24,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=359340.0, ans=0.1 2024-09-18 03:27:32,244 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.44 vs. 
limit=10.0 2024-09-18 03:27:48,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=359380.0, ans=0.1 2024-09-18 03:27:54,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=359380.0, ans=0.125 2024-09-18 03:27:57,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=359420.0, ans=0.1 2024-09-18 03:28:01,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=359420.0, ans=0.125 2024-09-18 03:28:03,699 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.68 vs. limit=15.0 2024-09-18 03:28:04,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=359420.0, ans=0.0 2024-09-18 03:28:26,444 INFO [train.py:1198] (1/2) Epoch 20, batch 3900, loss[loss=0.2537, ctc_loss=0.1377, cr_loss=0.3959, attn_decoder_loss=0.2578, over 29636.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1361, cr_loss=0.3792, attn_decoder_loss=0.25, over 5815319.35 frames. ], batch size: 86, lr: 5.52e-03, grad_scale: 8.0 2024-09-18 03:28:31,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=359500.0, ans=0.125 2024-09-18 03:28:34,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=359500.0, ans=0.125 2024-09-18 03:28:37,156 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:29:06,326 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.302e+01 8.505e+01 9.019e+01 9.664e+01 2.565e+02, threshold=1.804e+02, percent-clipped=1.0 2024-09-18 03:29:06,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=359580.0, ans=0.0 2024-09-18 03:29:11,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=359620.0, ans=0.125 2024-09-18 03:29:11,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.18 vs. limit=15.0 2024-09-18 03:29:22,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=359620.0, ans=0.0 2024-09-18 03:29:34,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=359660.0, ans=0.025 2024-09-18 03:29:40,553 INFO [train.py:1198] (1/2) Epoch 20, batch 3950, loss[loss=0.2605, ctc_loss=0.1547, cr_loss=0.4219, attn_decoder_loss=0.2628, over 29475.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.136, cr_loss=0.3799, attn_decoder_loss=0.2501, over 5834870.71 frames. 
], batch size: 97, lr: 5.52e-03, grad_scale: 8.0 2024-09-18 03:30:15,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=359780.0, ans=15.0 2024-09-18 03:30:35,039 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.88 vs. limit=15.0 2024-09-18 03:30:35,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=359820.0, ans=0.125 2024-09-18 03:30:38,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=359820.0, ans=0.0 2024-09-18 03:30:56,092 INFO [train.py:1198] (1/2) Epoch 20, batch 4000, loss[loss=0.2369, ctc_loss=0.127, cr_loss=0.3695, attn_decoder_loss=0.2409, over 29498.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1363, cr_loss=0.3798, attn_decoder_loss=0.25, over 5811658.90 frames. ], batch size: 74, lr: 5.52e-03, grad_scale: 16.0 2024-09-18 03:30:56,437 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:31:09,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=359940.0, ans=0.2 2024-09-18 03:31:10,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.76 vs. limit=22.5 2024-09-18 03:31:28,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=359980.0, ans=0.125 2024-09-18 03:31:38,095 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.709e+01 8.706e+01 9.188e+01 9.943e+01 2.259e+02, threshold=1.838e+02, percent-clipped=3.0 2024-09-18 03:31:49,392 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.98 vs. limit=12.0 2024-09-18 03:32:09,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=360100.0, ans=0.04949747468305833 2024-09-18 03:32:10,754 INFO [train.py:1198] (1/2) Epoch 20, batch 4050, loss[loss=0.2841, ctc_loss=0.1873, cr_loss=0.4312, attn_decoder_loss=0.2853, over 20286.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1366, cr_loss=0.3799, attn_decoder_loss=0.25, over 5796573.27 frames. ], batch size: 210, lr: 5.52e-03, grad_scale: 8.0 2024-09-18 03:32:17,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.09 vs. limit=10.0 2024-09-18 03:32:35,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=360140.0, ans=0.125 2024-09-18 03:32:48,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=360180.0, ans=0.125 2024-09-18 03:32:49,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=360180.0, ans=0.125 2024-09-18 03:33:25,675 INFO [train.py:1198] (1/2) Epoch 20, batch 4100, loss[loss=0.2635, ctc_loss=0.1535, cr_loss=0.4275, attn_decoder_loss=0.2662, over 29532.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1368, cr_loss=0.3798, attn_decoder_loss=0.2503, over 5791828.86 frames. 
], batch size: 90, lr: 5.52e-03, grad_scale: 8.0 2024-09-18 03:33:34,145 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.45 vs. limit=5.0 2024-09-18 03:33:58,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=360380.0, ans=0.125 2024-09-18 03:34:06,719 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 8.554e+01 9.204e+01 1.015e+02 1.958e+02, threshold=1.841e+02, percent-clipped=1.0 2024-09-18 03:34:17,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=360420.0, ans=0.125 2024-09-18 03:34:24,863 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-09-18 03:34:33,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=360460.0, ans=0.125 2024-09-18 03:34:40,410 INFO [train.py:1198] (1/2) Epoch 20, batch 4150, loss[loss=0.2425, ctc_loss=0.1373, cr_loss=0.3778, attn_decoder_loss=0.2458, over 29509.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.1369, cr_loss=0.3804, attn_decoder_loss=0.2502, over 5798470.96 frames. ], batch size: 77, lr: 5.52e-03, grad_scale: 8.0 2024-09-18 03:34:48,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=360500.0, ans=0.125 2024-09-18 03:34:55,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=360540.0, ans=0.025 2024-09-18 03:35:10,316 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.50 vs. limit=15.0 2024-09-18 03:35:15,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=360580.0, ans=0.2 2024-09-18 03:35:32,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=360620.0, ans=0.0 2024-09-18 03:35:44,116 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0 2024-09-18 03:35:47,306 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=7.08 vs. limit=15.0 2024-09-18 03:35:53,871 INFO [train.py:1198] (1/2) Epoch 20, batch 4200, loss[loss=0.2654, ctc_loss=0.154, cr_loss=0.4039, attn_decoder_loss=0.2688, over 29507.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.1363, cr_loss=0.379, attn_decoder_loss=0.2503, over 5799802.27 frames. ], batch size: 90, lr: 5.51e-03, grad_scale: 8.0 2024-09-18 03:35:54,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=360700.0, ans=0.125 2024-09-18 03:36:25,484 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.25 vs. 
limit=22.5 2024-09-18 03:36:36,270 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.386e+01 8.598e+01 9.049e+01 1.004e+02 1.437e+02, threshold=1.810e+02, percent-clipped=0.0 2024-09-18 03:36:36,911 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.76 vs. limit=12.0 2024-09-18 03:36:41,114 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:36:41,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=360820.0, ans=0.09899494936611666 2024-09-18 03:36:53,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=360860.0, ans=0.125 2024-09-18 03:36:57,787 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.62 vs. limit=12.0 2024-09-18 03:36:58,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=360860.0, ans=0.125 2024-09-18 03:37:09,126 INFO [train.py:1198] (1/2) Epoch 20, batch 4250, loss[loss=0.2282, ctc_loss=0.1197, cr_loss=0.3443, attn_decoder_loss=0.2326, over 29493.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1362, cr_loss=0.3792, attn_decoder_loss=0.2505, over 5805985.17 frames. ], batch size: 74, lr: 5.51e-03, grad_scale: 8.0 2024-09-18 03:37:09,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=360900.0, ans=0.125 2024-09-18 03:37:29,061 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.77 vs. limit=22.5 2024-09-18 03:37:46,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=360980.0, ans=0.125 2024-09-18 03:37:58,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=361020.0, ans=0.125 2024-09-18 03:37:59,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=361020.0, ans=0.125 2024-09-18 03:38:23,852 INFO [train.py:1198] (1/2) Epoch 20, batch 4300, loss[loss=0.2531, ctc_loss=0.1371, cr_loss=0.3614, attn_decoder_loss=0.258, over 29562.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.1361, cr_loss=0.3789, attn_decoder_loss=0.2507, over 5795120.06 frames. ], batch size: 87, lr: 5.51e-03, grad_scale: 8.0 2024-09-18 03:38:24,469 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.73 vs. 
limit=12.0 2024-09-18 03:38:28,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=361100.0, ans=0.0 2024-09-18 03:38:44,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=361140.0, ans=0.2 2024-09-18 03:39:05,354 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 8.736e+01 9.238e+01 9.877e+01 2.557e+02, threshold=1.848e+02, percent-clipped=2.0 2024-09-18 03:39:10,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=361220.0, ans=0.0 2024-09-18 03:39:38,011 INFO [train.py:1198] (1/2) Epoch 20, batch 4350, loss[loss=0.2705, ctc_loss=0.1479, cr_loss=0.4012, attn_decoder_loss=0.2753, over 29522.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.139, cr_loss=0.3842, attn_decoder_loss=0.254, over 5797338.81 frames. ], batch size: 97, lr: 5.51e-03, grad_scale: 8.0 2024-09-18 03:39:44,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=361300.0, ans=0.025 2024-09-18 03:39:47,054 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.11 vs. limit=15.0 2024-09-18 03:39:58,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=361340.0, ans=0.125 2024-09-18 03:40:00,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=361340.0, ans=0.125 2024-09-18 03:40:06,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=361380.0, ans=0.125 2024-09-18 03:40:21,991 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.09 vs. limit=15.0 2024-09-18 03:40:29,205 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.63 vs. limit=15.0 2024-09-18 03:40:36,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.69 vs. limit=15.0 2024-09-18 03:40:50,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=361500.0, ans=0.0 2024-09-18 03:40:51,530 INFO [train.py:1198] (1/2) Epoch 20, batch 4400, loss[loss=0.2586, ctc_loss=0.153, cr_loss=0.3917, attn_decoder_loss=0.2616, over 27409.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.1406, cr_loss=0.387, attn_decoder_loss=0.256, over 5766900.26 frames. ], batch size: 125, lr: 5.51e-03, grad_scale: 16.0 2024-09-18 03:41:28,431 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.35 vs. limit=15.0 2024-09-18 03:41:34,934 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.117e+01 8.833e+01 9.166e+01 9.784e+01 1.631e+02, threshold=1.833e+02, percent-clipped=0.0 2024-09-18 03:41:50,813 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=10.59 vs. 
limit=12.0 2024-09-18 03:42:01,746 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.30 vs. limit=15.0 2024-09-18 03:42:02,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=361660.0, ans=0.125 2024-09-18 03:42:05,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=361700.0, ans=0.2 2024-09-18 03:42:06,988 INFO [train.py:1198] (1/2) Epoch 20, batch 4450, loss[loss=0.2761, ctc_loss=0.1758, cr_loss=0.4038, attn_decoder_loss=0.2782, over 20234.00 frames. ], tot_loss[loss=0.2551, ctc_loss=0.145, cr_loss=0.3919, attn_decoder_loss=0.2586, over 5573449.65 frames. ], batch size: 210, lr: 5.51e-03, grad_scale: 8.0 2024-09-18 03:42:16,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=361700.0, ans=0.125 2024-09-18 03:42:28,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=361740.0, ans=0.125 2024-09-18 03:42:39,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=361780.0, ans=0.125 2024-09-18 03:42:48,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=361780.0, ans=0.1 2024-09-18 03:43:11,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=361860.0, ans=0.0 2024-09-18 03:43:22,862 INFO [train.py:1198] (1/2) Epoch 20, batch 4500, loss[loss=0.2658, ctc_loss=0.1689, cr_loss=0.3874, attn_decoder_loss=0.2679, over 20263.00 frames. ], tot_loss[loss=0.2579, ctc_loss=0.1498, cr_loss=0.3948, attn_decoder_loss=0.2611, over 5234671.82 frames. ], batch size: 210, lr: 5.51e-03, grad_scale: 8.0 2024-09-18 03:43:33,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=361900.0, ans=0.1 2024-09-18 03:43:34,515 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0 2024-09-18 03:43:50,637 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.20 vs. limit=15.0 2024-09-18 03:44:52,331 INFO [train.py:1198] (1/2) Epoch 21, batch 0, loss[loss=0.2261, ctc_loss=0.1132, cr_loss=0.3427, attn_decoder_loss=0.2311, over 29620.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1132, cr_loss=0.3427, attn_decoder_loss=0.2311, over 29620.00 frames. ], batch size: 73, lr: 5.37e-03, grad_scale: 16.0 2024-09-18 03:44:52,331 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 03:45:10,775 INFO [train.py:1230] (1/2) Epoch 21, validation: loss=0.2126, ctc_loss=0.0391, cr_loss=5.275e-15, attn_decoder_loss=0.2319, over 944034.00 frames. 
2024-09-18 03:45:10,775 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-18 03:45:19,728 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.930e+01 1.076e+02 1.145e+02 1.241e+02 1.705e+02, threshold=2.291e+02, percent-clipped=0.0 2024-09-18 03:45:42,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=362080.0, ans=0.125 2024-09-18 03:46:01,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=362120.0, ans=0.1 2024-09-18 03:46:11,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=362160.0, ans=0.125 2024-09-18 03:46:19,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=362160.0, ans=0.0 2024-09-18 03:46:24,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=362160.0, ans=0.0 2024-09-18 03:46:28,338 INFO [train.py:1198] (1/2) Epoch 21, batch 50, loss[loss=0.2173, ctc_loss=0.1103, cr_loss=0.3455, attn_decoder_loss=0.2215, over 29446.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1389, cr_loss=0.3837, attn_decoder_loss=0.2518, over 1267937.69 frames. ], batch size: 70, lr: 5.37e-03, grad_scale: 8.0 2024-09-18 03:46:30,196 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:46:30,789 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.44 vs. limit=15.0 2024-09-18 03:47:10,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2024-09-18 03:47:26,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=362320.0, ans=0.0 2024-09-18 03:47:38,139 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.66 vs. limit=12.0 2024-09-18 03:47:44,304 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.06 vs. limit=10.0 2024-09-18 03:47:46,710 INFO [train.py:1198] (1/2) Epoch 21, batch 100, loss[loss=0.2458, ctc_loss=0.1333, cr_loss=0.3756, attn_decoder_loss=0.2499, over 29540.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.1401, cr_loss=0.3863, attn_decoder_loss=0.2537, over 2251903.63 frames. ], batch size: 76, lr: 5.37e-03, grad_scale: 8.0 2024-09-18 03:47:55,559 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 8.793e+01 9.358e+01 9.884e+01 2.727e+02, threshold=1.872e+02, percent-clipped=1.0 2024-09-18 03:47:56,239 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.05 vs. 
limit=15.0 2024-09-18 03:48:00,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=362440.0, ans=0.125 2024-09-18 03:48:16,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=362480.0, ans=0.0 2024-09-18 03:48:17,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=362480.0, ans=0.1 2024-09-18 03:48:31,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=362520.0, ans=0.07 2024-09-18 03:48:32,345 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.92 vs. limit=15.0 2024-09-18 03:48:48,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=362560.0, ans=0.05 2024-09-18 03:49:01,017 INFO [train.py:1198] (1/2) Epoch 21, batch 150, loss[loss=0.2139, ctc_loss=0.1204, cr_loss=0.3497, attn_decoder_loss=0.2165, over 29420.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.1363, cr_loss=0.3787, attn_decoder_loss=0.2504, over 3046567.10 frames. ], batch size: 70, lr: 5.36e-03, grad_scale: 8.0 2024-09-18 03:49:13,660 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:49:15,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=362640.0, ans=0.1 2024-09-18 03:49:24,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=362640.0, ans=0.125 2024-09-18 03:49:40,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=362680.0, ans=0.0 2024-09-18 03:49:55,409 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:50:09,105 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.40 vs. limit=15.0 2024-09-18 03:50:18,545 INFO [train.py:1198] (1/2) Epoch 21, batch 200, loss[loss=0.2672, ctc_loss=0.1573, cr_loss=0.4149, attn_decoder_loss=0.2702, over 27157.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1359, cr_loss=0.3789, attn_decoder_loss=0.2498, over 3658875.53 frames. ], batch size: 124, lr: 5.36e-03, grad_scale: 8.0 2024-09-18 03:50:26,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=362800.0, ans=0.125 2024-09-18 03:50:27,601 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.138e+01 8.461e+01 9.001e+01 9.601e+01 1.394e+02, threshold=1.800e+02, percent-clipped=0.0 2024-09-18 03:50:43,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=362840.0, ans=0.0 2024-09-18 03:50:43,879 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.06 vs. limit=22.5 2024-09-18 03:50:52,245 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.85 vs. 
limit=22.5 2024-09-18 03:50:57,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=362880.0, ans=0.125 2024-09-18 03:51:04,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=362880.0, ans=0.125 2024-09-18 03:51:05,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=362920.0, ans=0.09899494936611666 2024-09-18 03:51:07,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=362920.0, ans=0.0 2024-09-18 03:51:09,494 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.76 vs. limit=15.0 2024-09-18 03:51:37,224 INFO [train.py:1198] (1/2) Epoch 21, batch 250, loss[loss=0.2634, ctc_loss=0.1498, cr_loss=0.4251, attn_decoder_loss=0.2666, over 29165.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.1357, cr_loss=0.3793, attn_decoder_loss=0.2497, over 4140081.45 frames. ], batch size: 100, lr: 5.36e-03, grad_scale: 8.0 2024-09-18 03:51:45,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=363000.0, ans=0.2 2024-09-18 03:52:05,065 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.88 vs. limit=15.0 2024-09-18 03:52:12,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=363080.0, ans=0.035 2024-09-18 03:52:36,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=363160.0, ans=0.1 2024-09-18 03:52:53,505 INFO [train.py:1198] (1/2) Epoch 21, batch 300, loss[loss=0.2653, ctc_loss=0.159, cr_loss=0.4155, attn_decoder_loss=0.2679, over 29493.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1354, cr_loss=0.379, attn_decoder_loss=0.2496, over 4510453.95 frames. ], batch size: 92, lr: 5.36e-03, grad_scale: 8.0 2024-09-18 03:52:54,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=363200.0, ans=15.0 2024-09-18 03:52:58,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=363200.0, ans=0.125 2024-09-18 03:53:02,574 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.788e+01 8.424e+01 9.085e+01 9.553e+01 2.134e+02, threshold=1.817e+02, percent-clipped=1.0 2024-09-18 03:53:13,765 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:53:22,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=363280.0, ans=0.1 2024-09-18 03:53:22,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=363280.0, ans=0.2 2024-09-18 03:53:31,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=363280.0, ans=0.125 2024-09-18 03:53:44,309 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.37 vs. 
limit=10.0 2024-09-18 03:54:03,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=363360.0, ans=0.2 2024-09-18 03:54:11,653 INFO [train.py:1198] (1/2) Epoch 21, batch 350, loss[loss=0.23, ctc_loss=0.125, cr_loss=0.3528, attn_decoder_loss=0.2338, over 29345.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1358, cr_loss=0.3795, attn_decoder_loss=0.2499, over 4796980.01 frames. ], batch size: 71, lr: 5.36e-03, grad_scale: 8.0 2024-09-18 03:54:30,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=363440.0, ans=0.1 2024-09-18 03:54:37,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=363440.0, ans=0.125 2024-09-18 03:55:29,552 INFO [train.py:1198] (1/2) Epoch 21, batch 400, loss[loss=0.2536, ctc_loss=0.1467, cr_loss=0.3966, attn_decoder_loss=0.2567, over 29713.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.1355, cr_loss=0.3785, attn_decoder_loss=0.2497, over 5023479.03 frames. ], batch size: 82, lr: 5.36e-03, grad_scale: 16.0 2024-09-18 03:55:38,686 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.497e+01 9.045e+01 9.813e+01 2.448e+02, threshold=1.809e+02, percent-clipped=2.0 2024-09-18 03:55:45,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=363640.0, ans=0.035 2024-09-18 03:56:12,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=363680.0, ans=0.0 2024-09-18 03:56:39,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=363760.0, ans=0.0 2024-09-18 03:56:41,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=363760.0, ans=0.0 2024-09-18 03:56:45,199 INFO [train.py:1198] (1/2) Epoch 21, batch 450, loss[loss=0.2454, ctc_loss=0.1282, cr_loss=0.3506, attn_decoder_loss=0.2506, over 29687.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1358, cr_loss=0.3789, attn_decoder_loss=0.25, over 5185122.53 frames. ], batch size: 83, lr: 5.36e-03, grad_scale: 8.0 2024-09-18 03:56:54,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=363800.0, ans=0.0 2024-09-18 03:56:57,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=363800.0, ans=0.125 2024-09-18 03:57:06,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=363840.0, ans=0.1 2024-09-18 03:57:08,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=363840.0, ans=0.0 2024-09-18 03:57:22,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.30 vs. limit=15.0 2024-09-18 03:57:52,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=363960.0, ans=0.0 2024-09-18 03:57:54,827 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.55 vs. 
limit=15.0 2024-09-18 03:58:01,714 INFO [train.py:1198] (1/2) Epoch 21, batch 500, loss[loss=0.2636, ctc_loss=0.1493, cr_loss=0.408, attn_decoder_loss=0.2672, over 29404.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.135, cr_loss=0.3782, attn_decoder_loss=0.2494, over 5329033.05 frames. ], batch size: 94, lr: 5.35e-03, grad_scale: 8.0 2024-09-18 03:58:05,628 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.84 vs. limit=12.0 2024-09-18 03:58:14,709 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 8.455e+01 8.968e+01 9.588e+01 2.224e+02, threshold=1.794e+02, percent-clipped=1.0 2024-09-18 03:58:18,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=364040.0, ans=0.125 2024-09-18 03:58:31,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=364040.0, ans=0.0 2024-09-18 03:58:32,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.77 vs. limit=15.0 2024-09-18 03:58:47,732 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:59:16,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=364160.0, ans=0.125 2024-09-18 03:59:22,165 INFO [train.py:1198] (1/2) Epoch 21, batch 550, loss[loss=0.2635, ctc_loss=0.1403, cr_loss=0.3815, attn_decoder_loss=0.2687, over 28901.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1351, cr_loss=0.3777, attn_decoder_loss=0.2497, over 5420949.85 frames. ], batch size: 104, lr: 5.35e-03, grad_scale: 8.0 2024-09-18 03:59:31,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=364200.0, ans=0.125 2024-09-18 03:59:40,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=364240.0, ans=0.0 2024-09-18 03:59:45,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=364240.0, ans=6.0 2024-09-18 04:00:09,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=364320.0, ans=0.2 2024-09-18 04:00:23,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=364360.0, ans=10.0 2024-09-18 04:00:38,401 INFO [train.py:1198] (1/2) Epoch 21, batch 600, loss[loss=0.2624, ctc_loss=0.1427, cr_loss=0.4014, attn_decoder_loss=0.2668, over 29305.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1354, cr_loss=0.3785, attn_decoder_loss=0.2499, over 5507914.81 frames. ], batch size: 100, lr: 5.35e-03, grad_scale: 8.0 2024-09-18 04:00:39,378 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.51 vs. 
limit=15.0 2024-09-18 04:00:43,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=364400.0, ans=0.05 2024-09-18 04:00:48,929 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.580e+01 8.547e+01 9.115e+01 9.764e+01 2.691e+02, threshold=1.823e+02, percent-clipped=3.0 2024-09-18 04:00:56,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=364440.0, ans=0.1 2024-09-18 04:01:08,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer_ff2.min_abs, batch_count=364480.0, ans=0.1 2024-09-18 04:01:23,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=364520.0, ans=0.125 2024-09-18 04:01:42,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=364560.0, ans=0.0 2024-09-18 04:01:49,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=364560.0, ans=0.1 2024-09-18 04:01:53,841 INFO [train.py:1198] (1/2) Epoch 21, batch 650, loss[loss=0.2358, ctc_loss=0.1206, cr_loss=0.3486, attn_decoder_loss=0.2409, over 29768.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1342, cr_loss=0.3767, attn_decoder_loss=0.2488, over 5585798.05 frames. ], batch size: 81, lr: 5.35e-03, grad_scale: 8.0 2024-09-18 04:02:01,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=364600.0, ans=0.125 2024-09-18 04:02:23,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=364640.0, ans=0.0 2024-09-18 04:02:27,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=364680.0, ans=0.025 2024-09-18 04:03:14,772 INFO [train.py:1198] (1/2) Epoch 21, batch 700, loss[loss=0.2344, ctc_loss=0.1263, cr_loss=0.3554, attn_decoder_loss=0.2385, over 29540.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1347, cr_loss=0.3773, attn_decoder_loss=0.2495, over 5636347.20 frames. ], batch size: 76, lr: 5.35e-03, grad_scale: 8.0 2024-09-18 04:03:21,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=364800.0, ans=0.025 2024-09-18 04:03:24,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=364800.0, ans=0.1 2024-09-18 04:03:25,121 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.583e+01 8.553e+01 9.088e+01 9.665e+01 1.426e+02, threshold=1.818e+02, percent-clipped=0.0 2024-09-18 04:04:22,457 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.90 vs. limit=15.0 2024-09-18 04:04:30,487 INFO [train.py:1198] (1/2) Epoch 21, batch 750, loss[loss=0.2595, ctc_loss=0.1446, cr_loss=0.4154, attn_decoder_loss=0.2631, over 29710.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1349, cr_loss=0.3773, attn_decoder_loss=0.2492, over 5673251.65 frames. 
], batch size: 82, lr: 5.35e-03, grad_scale: 8.0 2024-09-18 04:04:38,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=365000.0, ans=0.0 2024-09-18 04:04:39,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=365000.0, ans=0.125 2024-09-18 04:04:45,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=365040.0, ans=0.125 2024-09-18 04:04:51,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=365040.0, ans=0.0 2024-09-18 04:04:52,221 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.80 vs. limit=15.0 2024-09-18 04:04:56,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=365040.0, ans=0.0 2024-09-18 04:05:28,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=365120.0, ans=0.125 2024-09-18 04:05:31,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=365160.0, ans=0.125 2024-09-18 04:05:45,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=365200.0, ans=0.2 2024-09-18 04:05:46,211 INFO [train.py:1198] (1/2) Epoch 21, batch 800, loss[loss=0.2238, ctc_loss=0.1122, cr_loss=0.3268, attn_decoder_loss=0.2289, over 29596.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.135, cr_loss=0.3772, attn_decoder_loss=0.2492, over 5704299.55 frames. ], batch size: 73, lr: 5.35e-03, grad_scale: 16.0 2024-09-18 04:05:46,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=365200.0, ans=0.125 2024-09-18 04:05:56,666 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 8.575e+01 9.275e+01 9.797e+01 6.839e+02, threshold=1.855e+02, percent-clipped=2.0 2024-09-18 04:06:18,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=365280.0, ans=0.125 2024-09-18 04:06:33,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=365280.0, ans=0.95 2024-09-18 04:06:41,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=365320.0, ans=0.0 2024-09-18 04:06:44,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=365320.0, ans=0.09899494936611666 2024-09-18 04:07:06,320 INFO [train.py:1198] (1/2) Epoch 21, batch 850, loss[loss=0.2463, ctc_loss=0.1361, cr_loss=0.3769, attn_decoder_loss=0.2502, over 29729.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1348, cr_loss=0.3771, attn_decoder_loss=0.249, over 5733455.53 frames. 
], batch size: 89, lr: 5.34e-03, grad_scale: 8.0 2024-09-18 04:07:15,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=365400.0, ans=0.2 2024-09-18 04:07:26,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=365440.0, ans=0.125 2024-09-18 04:07:26,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=365440.0, ans=0.0 2024-09-18 04:07:51,010 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.98 vs. limit=12.0 2024-09-18 04:08:22,627 INFO [train.py:1198] (1/2) Epoch 21, batch 900, loss[loss=0.2237, ctc_loss=0.1178, cr_loss=0.3435, attn_decoder_loss=0.2278, over 29607.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.135, cr_loss=0.3771, attn_decoder_loss=0.2494, over 5738002.31 frames. ], batch size: 73, lr: 5.34e-03, grad_scale: 8.0 2024-09-18 04:08:28,054 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.08 vs. limit=22.5 2024-09-18 04:08:32,517 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.61 vs. limit=12.0 2024-09-18 04:08:34,646 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.737e+01 8.573e+01 9.119e+01 9.639e+01 3.066e+02, threshold=1.824e+02, percent-clipped=3.0 2024-09-18 04:08:36,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=365640.0, ans=0.025 2024-09-18 04:08:55,490 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.49 vs. limit=15.0 2024-09-18 04:09:22,764 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.01 vs. limit=15.0 2024-09-18 04:09:31,379 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2024-09-18 04:09:38,095 INFO [train.py:1198] (1/2) Epoch 21, batch 950, loss[loss=0.2303, ctc_loss=0.1237, cr_loss=0.3555, attn_decoder_loss=0.2343, over 29528.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.1351, cr_loss=0.3767, attn_decoder_loss=0.2494, over 5741431.98 frames. ], batch size: 74, lr: 5.34e-03, grad_scale: 8.0 2024-09-18 04:09:38,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=365800.0, ans=0.1 2024-09-18 04:09:59,810 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.46 vs. 
limit=10.0 2024-09-18 04:10:13,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=365880.0, ans=0.125 2024-09-18 04:10:20,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=365880.0, ans=0.09899494936611666 2024-09-18 04:10:32,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=365920.0, ans=0.1 2024-09-18 04:10:41,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=365960.0, ans=0.125 2024-09-18 04:10:44,149 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.34 vs. limit=15.0 2024-09-18 04:10:48,424 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.90 vs. limit=15.0 2024-09-18 04:10:58,230 INFO [train.py:1198] (1/2) Epoch 21, batch 1000, loss[loss=0.2257, ctc_loss=0.1152, cr_loss=0.341, attn_decoder_loss=0.2304, over 29509.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1359, cr_loss=0.3783, attn_decoder_loss=0.2502, over 5736789.28 frames. ], batch size: 77, lr: 5.34e-03, grad_scale: 8.0 2024-09-18 04:11:10,224 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.656e+01 8.708e+01 9.150e+01 9.911e+01 2.107e+02, threshold=1.830e+02, percent-clipped=1.0 2024-09-18 04:11:15,667 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.07 vs. limit=22.5 2024-09-18 04:11:16,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=366040.0, ans=0.125 2024-09-18 04:11:29,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=366080.0, ans=0.2 2024-09-18 04:11:38,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.26 vs. limit=10.0 2024-09-18 04:12:03,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=366160.0, ans=0.125 2024-09-18 04:12:05,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=366160.0, ans=0.125 2024-09-18 04:12:13,877 INFO [train.py:1198] (1/2) Epoch 21, batch 1050, loss[loss=0.2465, ctc_loss=0.1279, cr_loss=0.3704, attn_decoder_loss=0.2515, over 29677.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1355, cr_loss=0.3774, attn_decoder_loss=0.2495, over 5744721.41 frames. ], batch size: 85, lr: 5.34e-03, grad_scale: 8.0 2024-09-18 04:12:20,545 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.78 vs. limit=22.5 2024-09-18 04:12:23,890 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=8.06 vs. limit=15.0 2024-09-18 04:12:39,301 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.78 vs. 
limit=6.0 2024-09-18 04:13:04,902 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.90 vs. limit=15.0 2024-09-18 04:13:07,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=366320.0, ans=0.125 2024-09-18 04:13:21,554 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.01 vs. limit=15.0 2024-09-18 04:13:30,129 INFO [train.py:1198] (1/2) Epoch 21, batch 1100, loss[loss=0.2481, ctc_loss=0.1337, cr_loss=0.3776, attn_decoder_loss=0.2524, over 29461.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1352, cr_loss=0.3768, attn_decoder_loss=0.2493, over 5756203.77 frames. ], batch size: 78, lr: 5.34e-03, grad_scale: 8.0 2024-09-18 04:13:35,639 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.83 vs. limit=22.5 2024-09-18 04:13:42,197 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.489e+01 9.148e+01 9.741e+01 7.755e+02, threshold=1.830e+02, percent-clipped=3.0 2024-09-18 04:13:50,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.26 vs. limit=15.0 2024-09-18 04:14:33,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=366560.0, ans=0.125 2024-09-18 04:14:50,298 INFO [train.py:1198] (1/2) Epoch 21, batch 1150, loss[loss=0.2444, ctc_loss=0.1399, cr_loss=0.4013, attn_decoder_loss=0.2471, over 29443.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1352, cr_loss=0.3777, attn_decoder_loss=0.2493, over 5754278.46 frames. ], batch size: 78, lr: 5.34e-03, grad_scale: 8.0 2024-09-18 04:15:05,021 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.86 vs. limit=22.5 2024-09-18 04:15:07,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=366640.0, ans=0.125 2024-09-18 04:15:20,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=366680.0, ans=0.125 2024-09-18 04:15:22,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=366680.0, ans=0.1 2024-09-18 04:15:38,191 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.60 vs. limit=15.0 2024-09-18 04:15:39,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=366720.0, ans=0.05 2024-09-18 04:16:05,930 INFO [train.py:1198] (1/2) Epoch 21, batch 1200, loss[loss=0.2482, ctc_loss=0.1244, cr_loss=0.3533, attn_decoder_loss=0.2541, over 29680.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1356, cr_loss=0.3783, attn_decoder_loss=0.25, over 5745884.24 frames. 
], batch size: 85, lr: 5.33e-03, grad_scale: 16.0 2024-09-18 04:16:18,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=366800.0, ans=0.125 2024-09-18 04:16:19,653 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.603e+01 9.203e+01 9.910e+01 1.694e+02, threshold=1.841e+02, percent-clipped=0.0 2024-09-18 04:16:27,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=366840.0, ans=0.125 2024-09-18 04:16:33,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=366840.0, ans=0.025 2024-09-18 04:16:35,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=366880.0, ans=0.0 2024-09-18 04:16:38,952 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.19 vs. limit=15.0 2024-09-18 04:16:39,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=366880.0, ans=0.125 2024-09-18 04:16:41,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=366880.0, ans=0.1 2024-09-18 04:16:41,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=366880.0, ans=0.07 2024-09-18 04:16:41,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=366880.0, ans=0.025 2024-09-18 04:16:41,770 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.01 vs. limit=22.5 2024-09-18 04:16:45,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=366880.0, ans=0.025 2024-09-18 04:17:03,096 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.73 vs. limit=15.0 2024-09-18 04:17:07,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=366960.0, ans=0.1 2024-09-18 04:17:20,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=367000.0, ans=0.0 2024-09-18 04:17:22,064 INFO [train.py:1198] (1/2) Epoch 21, batch 1250, loss[loss=0.2687, ctc_loss=0.1524, cr_loss=0.4125, attn_decoder_loss=0.2725, over 29519.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1362, cr_loss=0.3799, attn_decoder_loss=0.2509, over 5773772.44 frames. 
], batch size: 92, lr: 5.33e-03, grad_scale: 8.0 2024-09-18 04:17:27,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=367000.0, ans=0.07 2024-09-18 04:17:29,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=367000.0, ans=0.0 2024-09-18 04:18:19,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=367120.0, ans=0.2 2024-09-18 04:18:41,102 INFO [train.py:1198] (1/2) Epoch 21, batch 1300, loss[loss=0.2556, ctc_loss=0.1349, cr_loss=0.3812, attn_decoder_loss=0.2606, over 28552.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1356, cr_loss=0.3792, attn_decoder_loss=0.2501, over 5780485.29 frames. ], batch size: 112, lr: 5.33e-03, grad_scale: 8.0 2024-09-18 04:18:54,774 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.475e+01 9.131e+01 9.688e+01 1.292e+02, threshold=1.826e+02, percent-clipped=0.0 2024-09-18 04:19:10,761 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.02 vs. limit=15.0 2024-09-18 04:19:10,816 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.98 vs. limit=10.0 2024-09-18 04:19:16,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=367280.0, ans=0.025 2024-09-18 04:19:33,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=367320.0, ans=0.2 2024-09-18 04:19:40,128 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.29 vs. limit=15.0 2024-09-18 04:19:55,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=367400.0, ans=0.125 2024-09-18 04:19:57,012 INFO [train.py:1198] (1/2) Epoch 21, batch 1350, loss[loss=0.244, ctc_loss=0.1299, cr_loss=0.3742, attn_decoder_loss=0.2483, over 29794.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1351, cr_loss=0.3784, attn_decoder_loss=0.25, over 5796255.89 frames. ], batch size: 81, lr: 5.33e-03, grad_scale: 8.0 2024-09-18 04:20:09,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=367400.0, ans=0.125 2024-09-18 04:20:20,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=367440.0, ans=0.125 2024-09-18 04:20:42,498 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:21:02,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=367560.0, ans=0.0 2024-09-18 04:21:12,476 INFO [train.py:1198] (1/2) Epoch 21, batch 1400, loss[loss=0.2111, ctc_loss=0.1134, cr_loss=0.3311, attn_decoder_loss=0.2146, over 29589.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.1351, cr_loss=0.3784, attn_decoder_loss=0.2498, over 5807647.88 frames. 
], batch size: 69, lr: 5.33e-03, grad_scale: 8.0 2024-09-18 04:21:25,901 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 8.438e+01 9.001e+01 9.853e+01 2.309e+02, threshold=1.800e+02, percent-clipped=1.0 2024-09-18 04:21:32,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=367640.0, ans=0.1 2024-09-18 04:21:50,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=367680.0, ans=0.125 2024-09-18 04:22:08,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=367720.0, ans=0.0 2024-09-18 04:22:17,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=367760.0, ans=10.0 2024-09-18 04:22:22,532 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.90 vs. limit=15.0 2024-09-18 04:22:25,367 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.26 vs. limit=15.0 2024-09-18 04:22:26,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=367760.0, ans=0.125 2024-09-18 04:22:32,117 INFO [train.py:1198] (1/2) Epoch 21, batch 1450, loss[loss=0.2606, ctc_loss=0.146, cr_loss=0.3955, attn_decoder_loss=0.2646, over 29416.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1351, cr_loss=0.3788, attn_decoder_loss=0.2501, over 5804794.23 frames. ], batch size: 94, lr: 5.33e-03, grad_scale: 8.0 2024-09-18 04:22:44,451 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:22:44,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=367800.0, ans=0.05 2024-09-18 04:22:46,117 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:22:55,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=367840.0, ans=0.125 2024-09-18 04:23:19,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=367920.0, ans=0.0 2024-09-18 04:23:23,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=367920.0, ans=0.125 2024-09-18 04:23:31,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=367960.0, ans=0.0 2024-09-18 04:23:43,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=367960.0, ans=0.5 2024-09-18 04:23:55,037 INFO [train.py:1198] (1/2) Epoch 21, batch 1500, loss[loss=0.2553, ctc_loss=0.1397, cr_loss=0.3864, attn_decoder_loss=0.2596, over 29618.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1351, cr_loss=0.3792, attn_decoder_loss=0.2502, over 5807549.13 frames. 
], batch size: 86, lr: 5.33e-03, grad_scale: 8.0 2024-09-18 04:24:08,785 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.505e+01 8.602e+01 9.157e+01 9.632e+01 2.068e+02, threshold=1.831e+02, percent-clipped=2.0 2024-09-18 04:24:36,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=368080.0, ans=0.125 2024-09-18 04:24:44,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=368120.0, ans=0.04949747468305833 2024-09-18 04:24:46,667 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.06 vs. limit=22.5 2024-09-18 04:24:47,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=368120.0, ans=0.025 2024-09-18 04:24:53,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=368120.0, ans=0.0 2024-09-18 04:24:53,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=368120.0, ans=0.2 2024-09-18 04:25:11,429 INFO [train.py:1198] (1/2) Epoch 21, batch 1550, loss[loss=0.2699, ctc_loss=0.1599, cr_loss=0.4197, attn_decoder_loss=0.2728, over 29518.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1357, cr_loss=0.3793, attn_decoder_loss=0.2503, over 5782133.84 frames. ], batch size: 90, lr: 5.32e-03, grad_scale: 8.0 2024-09-18 04:25:20,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=368200.0, ans=0.0 2024-09-18 04:25:41,018 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.65 vs. limit=6.0 2024-09-18 04:25:54,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=368280.0, ans=0.125 2024-09-18 04:26:09,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=368320.0, ans=0.0 2024-09-18 04:26:13,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=368320.0, ans=0.0 2024-09-18 04:26:23,521 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.49 vs. limit=15.0 2024-09-18 04:26:31,487 INFO [train.py:1198] (1/2) Epoch 21, batch 1600, loss[loss=0.2572, ctc_loss=0.1339, cr_loss=0.3699, attn_decoder_loss=0.2627, over 29691.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1356, cr_loss=0.3783, attn_decoder_loss=0.2502, over 5764362.24 frames. ], batch size: 85, lr: 5.32e-03, grad_scale: 16.0 2024-09-18 04:26:39,745 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.61 vs. limit=15.0 2024-09-18 04:26:46,819 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.333e+01 8.519e+01 9.030e+01 9.960e+01 2.636e+02, threshold=1.806e+02, percent-clipped=1.0 2024-09-18 04:26:58,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.28 vs. 
limit=22.5 2024-09-18 04:27:14,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=368480.0, ans=0.125 2024-09-18 04:27:32,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=368560.0, ans=0.125 2024-09-18 04:27:41,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=368560.0, ans=0.025 2024-09-18 04:27:47,311 INFO [train.py:1198] (1/2) Epoch 21, batch 1650, loss[loss=0.254, ctc_loss=0.1403, cr_loss=0.3952, attn_decoder_loss=0.2579, over 29696.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1354, cr_loss=0.3781, attn_decoder_loss=0.25, over 5758006.91 frames. ], batch size: 89, lr: 5.32e-03, grad_scale: 8.0 2024-09-18 04:28:02,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=368640.0, ans=0.125 2024-09-18 04:28:06,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=368640.0, ans=0.125 2024-09-18 04:28:06,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=368640.0, ans=0.125 2024-09-18 04:28:22,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=368680.0, ans=0.125 2024-09-18 04:28:48,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=368760.0, ans=0.125 2024-09-18 04:29:03,551 INFO [train.py:1198] (1/2) Epoch 21, batch 1700, loss[loss=0.2158, ctc_loss=0.1141, cr_loss=0.3262, attn_decoder_loss=0.2199, over 29594.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1351, cr_loss=0.3777, attn_decoder_loss=0.2499, over 5780203.61 frames. ], batch size: 69, lr: 5.32e-03, grad_scale: 8.0 2024-09-18 04:29:05,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=368800.0, ans=0.0 2024-09-18 04:29:07,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=368800.0, ans=0.2 2024-09-18 04:29:18,919 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.057e+01 8.456e+01 9.072e+01 9.555e+01 1.411e+02, threshold=1.814e+02, percent-clipped=0.0 2024-09-18 04:29:20,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=368840.0, ans=0.0 2024-09-18 04:29:48,798 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.63 vs. limit=22.5 2024-09-18 04:29:53,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=368920.0, ans=0.0 2024-09-18 04:29:57,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=368920.0, ans=0.125 2024-09-18 04:30:14,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=368960.0, ans=0.0 2024-09-18 04:30:23,625 INFO [train.py:1198] (1/2) Epoch 21, batch 1750, loss[loss=0.2226, ctc_loss=0.119, cr_loss=0.342, attn_decoder_loss=0.2265, over 29372.00 frames. 
2024-09-18 04:30:46,879 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=15.0
2024-09-18 04:30:52,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=369080.0, ans=0.125
2024-09-18 04:31:04,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=369080.0, ans=0.025
2024-09-18 04:31:19,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=369120.0, ans=0.0
2024-09-18 04:31:30,682 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0
2024-09-18 04:31:35,043 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.97 vs. limit=15.0
2024-09-18 04:31:38,789 INFO [train.py:1198] (1/2) Epoch 21, batch 1800, loss[loss=0.257, ctc_loss=0.1434, cr_loss=0.3991, attn_decoder_loss=0.2608, over 29702.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.135, cr_loss=0.3783, attn_decoder_loss=0.2499, over 5789593.01 frames. ], batch size: 83, lr: 5.32e-03, grad_scale: 8.0
2024-09-18 04:31:54,056 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.817e+01 8.570e+01 9.201e+01 9.986e+01 1.467e+02, threshold=1.840e+02, percent-clipped=0.0
2024-09-18 04:31:55,022 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.56 vs. limit=12.0
2024-09-18 04:32:18,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=369280.0, ans=0.2
2024-09-18 04:32:27,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=369320.0, ans=0.2
2024-09-18 04:32:54,990 INFO [train.py:1198] (1/2) Epoch 21, batch 1850, loss[loss=0.2594, ctc_loss=0.1423, cr_loss=0.3806, attn_decoder_loss=0.264, over 29607.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1348, cr_loss=0.3778, attn_decoder_loss=0.2496, over 5797149.22 frames. ], batch size: 86, lr: 5.32e-03, grad_scale: 8.0
2024-09-18 04:33:11,086 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.69 vs. limit=15.0
2024-09-18 04:33:33,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=369480.0, ans=0.125
2024-09-18 04:33:48,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=369520.0, ans=0.125
2024-09-18 04:33:57,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=369560.0, ans=0.2
2024-09-18 04:34:11,218 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.47 vs. limit=15.0
2024-09-18 04:34:15,224 INFO [train.py:1198] (1/2) Epoch 21, batch 1900, loss[loss=0.259, ctc_loss=0.149, cr_loss=0.4261, attn_decoder_loss=0.2618, over 29711.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1352, cr_loss=0.3788, attn_decoder_loss=0.2502, over 5803873.49 frames. ], batch size: 89, lr: 5.31e-03, grad_scale: 8.0
2024-09-18 04:34:30,312 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.618e+01 9.006e+01 9.728e+01 3.211e+02, threshold=1.801e+02, percent-clipped=2.0
2024-09-18 04:34:44,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=369680.0, ans=0.1
2024-09-18 04:34:52,699 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.72 vs. limit=15.0
2024-09-18 04:35:13,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=369720.0, ans=0.125
2024-09-18 04:35:16,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.67 vs. limit=22.5
2024-09-18 04:35:31,127 INFO [train.py:1198] (1/2) Epoch 21, batch 1950, loss[loss=0.2463, ctc_loss=0.1342, cr_loss=0.4065, attn_decoder_loss=0.2497, over 29456.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1356, cr_loss=0.38, attn_decoder_loss=0.251, over 5818220.92 frames. ], batch size: 78, lr: 5.31e-03, grad_scale: 8.0
2024-09-18 04:35:55,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=369840.0, ans=0.125
2024-09-18 04:35:55,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=369840.0, ans=0.025
2024-09-18 04:35:57,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=369840.0, ans=10.0
2024-09-18 04:36:10,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=369880.0, ans=0.125
2024-09-18 04:36:31,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=369960.0, ans=0.025
2024-09-18 04:36:46,468 INFO [train.py:1198] (1/2) Epoch 21, batch 2000, loss[loss=0.2211, ctc_loss=0.1226, cr_loss=0.3533, attn_decoder_loss=0.2242, over 29337.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1361, cr_loss=0.3805, attn_decoder_loss=0.2513, over 5796869.98 frames. ], batch size: 67, lr: 5.31e-03, grad_scale: 16.0
2024-09-18 04:37:01,560 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.001e+01 8.831e+01 9.227e+01 9.765e+01 5.439e+02, threshold=1.845e+02, percent-clipped=1.0
2024-09-18 04:37:04,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=370040.0, ans=0.125
2024-09-18 04:37:10,269 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.38 vs. limit=8.0
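[Editor's note] The ScheduledFloat records tie a named hyper-parameter to the current batch_count and report its scheduled value as ans. A stand-in sketch of a piecewise-linear schedule of that shape; the class and the example breakpoints are illustrative, not the scaling.py implementation:

from bisect import bisect_right

class PiecewiseLinear:
    """Interpolate linearly between (batch_count, value) breakpoints,
    holding the first/last value constant outside the range."""

    def __init__(self, *points: tuple[float, float]) -> None:
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect_right(self.xs, batch_count)
        x0, x1, y0, y1 = self.xs[i - 1], self.xs[i], self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a skip-rate that decays to zero early in training and stays there:
conv_skip_rate = PiecewiseLinear((0.0, 0.1), (20000.0, 0.0))
print(conv_skip_rate(369680.0))   # -> 0.0, matching the ans=0.0 records above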
2024-09-18 04:37:16,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=370080.0, ans=0.2
2024-09-18 04:37:24,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=370080.0, ans=0.125
2024-09-18 04:37:38,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=370120.0, ans=0.07
2024-09-18 04:37:41,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=370120.0, ans=0.125
2024-09-18 04:37:59,449 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.53 vs. limit=12.0
2024-09-18 04:38:05,837 INFO [train.py:1198] (1/2) Epoch 21, batch 2050, loss[loss=0.2218, ctc_loss=0.1198, cr_loss=0.3641, attn_decoder_loss=0.2251, over 29421.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1355, cr_loss=0.379, attn_decoder_loss=0.2503, over 5787488.15 frames. ], batch size: 70, lr: 5.31e-03, grad_scale: 8.0
2024-09-18 04:38:11,520 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.71 vs. limit=12.0
2024-09-18 04:38:15,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=370200.0, ans=0.125
2024-09-18 04:39:11,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=370360.0, ans=0.125
2024-09-18 04:39:12,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=370360.0, ans=0.0
2024-09-18 04:39:12,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=370360.0, ans=0.125
2024-09-18 04:39:21,716 INFO [train.py:1198] (1/2) Epoch 21, batch 2100, loss[loss=0.2311, ctc_loss=0.1207, cr_loss=0.3444, attn_decoder_loss=0.2357, over 29750.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.1344, cr_loss=0.3767, attn_decoder_loss=0.2495, over 5798917.51 frames. ], batch size: 81, lr: 5.31e-03, grad_scale: 8.0
2024-09-18 04:39:21,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=370400.0, ans=0.1
2024-09-18 04:39:38,260 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 8.297e+01 8.835e+01 9.326e+01 1.551e+02, threshold=1.767e+02, percent-clipped=0.0
2024-09-18 04:40:07,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=370520.0, ans=0.2
2024-09-18 04:40:24,173 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0
2024-09-18 04:40:26,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=370560.0, ans=0.0
2024-09-18 04:40:32,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=370560.0, ans=0.125
2024-09-18 04:40:37,250 INFO [train.py:1198] (1/2) Epoch 21, batch 2150, loss[loss=0.24, ctc_loss=0.1388, cr_loss=0.3791, attn_decoder_loss=0.2429, over 29439.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1336, cr_loss=0.3755, attn_decoder_loss=0.2487, over 5814391.34 frames. ], batch size: 78, lr: 5.31e-03, grad_scale: 8.0
2024-09-18 04:40:39,498 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.52 vs. limit=15.0
2024-09-18 04:40:45,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=370600.0, ans=0.0
2024-09-18 04:40:57,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=370640.0, ans=0.2
2024-09-18 04:41:00,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=370640.0, ans=15.0
2024-09-18 04:41:04,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=370640.0, ans=0.125
2024-09-18 04:41:25,362 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.16 vs. limit=22.5
2024-09-18 04:41:33,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=370720.0, ans=0.125
2024-09-18 04:41:36,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=370720.0, ans=0.0
2024-09-18 04:41:54,697 INFO [train.py:1198] (1/2) Epoch 21, batch 2200, loss[loss=0.255, ctc_loss=0.1403, cr_loss=0.3886, attn_decoder_loss=0.2591, over 29643.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1341, cr_loss=0.3766, attn_decoder_loss=0.2489, over 5811591.28 frames. ], batch size: 86, lr: 5.31e-03, grad_scale: 8.0
2024-09-18 04:42:06,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=370800.0, ans=0.1
2024-09-18 04:42:13,472 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.585e+01 9.031e+01 9.683e+01 2.928e+02, threshold=1.806e+02, percent-clipped=3.0
2024-09-18 04:42:30,765 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.00 vs. limit=22.5
2024-09-18 04:43:12,511 INFO [train.py:1198] (1/2) Epoch 21, batch 2250, loss[loss=0.2469, ctc_loss=0.1351, cr_loss=0.3857, attn_decoder_loss=0.2507, over 29695.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1341, cr_loss=0.3765, attn_decoder_loss=0.2488, over 5811091.91 frames. ], batch size: 82, lr: 5.30e-03, grad_scale: 8.0
2024-09-18 04:43:24,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=371000.0, ans=0.2
2024-09-18 04:43:56,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=371120.0, ans=0.1
2024-09-18 04:44:11,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=371160.0, ans=0.0
2024-09-18 04:44:28,717 INFO [train.py:1198] (1/2) Epoch 21, batch 2300, loss[loss=0.2257, ctc_loss=0.1152, cr_loss=0.339, attn_decoder_loss=0.2305, over 29331.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1336, cr_loss=0.3755, attn_decoder_loss=0.248, over 5798789.98 frames. ], batch size: 71, lr: 5.30e-03, grad_scale: 8.0
2024-09-18 04:44:45,171 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.457e+01 8.450e+01 8.935e+01 9.776e+01 2.210e+02, threshold=1.787e+02, percent-clipped=1.0
2024-09-18 04:45:21,511 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.42 vs. limit=12.0
2024-09-18 04:45:33,735 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.66 vs. limit=15.0
2024-09-18 04:45:46,558 INFO [train.py:1198] (1/2) Epoch 21, batch 2350, loss[loss=0.2476, ctc_loss=0.1376, cr_loss=0.3853, attn_decoder_loss=0.2513, over 29682.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1336, cr_loss=0.3755, attn_decoder_loss=0.2481, over 5805019.18 frames. ], batch size: 83, lr: 5.30e-03, grad_scale: 8.0
2024-09-18 04:45:49,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=371400.0, ans=0.1
2024-09-18 04:46:34,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=371520.0, ans=0.125
2024-09-18 04:46:43,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=371520.0, ans=0.125
2024-09-18 04:46:54,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=371560.0, ans=0.07
2024-09-18 04:46:57,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=371560.0, ans=0.0
2024-09-18 04:47:00,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=371560.0, ans=0.125
2024-09-18 04:47:04,808 INFO [train.py:1198] (1/2) Epoch 21, batch 2400, loss[loss=0.2448, ctc_loss=0.1437, cr_loss=0.3792, attn_decoder_loss=0.2476, over 29510.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1343, cr_loss=0.3768, attn_decoder_loss=0.2488, over 5808497.48 frames. ], batch size: 76, lr: 5.30e-03, grad_scale: 16.0
2024-09-18 04:47:11,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=371600.0, ans=0.125
2024-09-18 04:47:12,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=371600.0, ans=0.0
2024-09-18 04:47:21,501 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.073e+01 8.473e+01 9.186e+01 9.665e+01 3.026e+02, threshold=1.837e+02, percent-clipped=1.0
2024-09-18 04:47:27,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=371640.0, ans=0.025
2024-09-18 04:47:35,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=371680.0, ans=0.125
2024-09-18 04:47:44,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=371680.0, ans=0.025
2024-09-18 04:48:04,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=371760.0, ans=0.125
2024-09-18 04:48:13,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=371760.0, ans=0.125
2024-09-18 04:48:16,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=371760.0, ans=0.0
2024-09-18 04:48:20,894 INFO [train.py:1198] (1/2) Epoch 21, batch 2450, loss[loss=0.2495, ctc_loss=0.1388, cr_loss=0.3901, attn_decoder_loss=0.2532, over 29711.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.1344, cr_loss=0.377, attn_decoder_loss=0.2495, over 5784591.76 frames. ], batch size: 82, lr: 5.30e-03, grad_scale: 8.0
2024-09-18 04:48:25,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=371800.0, ans=0.2
2024-09-18 04:48:40,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=371840.0, ans=0.125
2024-09-18 04:49:15,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=371920.0, ans=0.1
2024-09-18 04:49:20,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=371920.0, ans=0.1
2024-09-18 04:49:29,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=371960.0, ans=0.125
2024-09-18 04:49:30,096 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.60 vs. limit=15.0
2024-09-18 04:49:34,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=371960.0, ans=0.125
2024-09-18 04:49:38,820 INFO [train.py:1198] (1/2) Epoch 21, batch 2500, loss[loss=0.2571, ctc_loss=0.1363, cr_loss=0.387, attn_decoder_loss=0.262, over 29621.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1348, cr_loss=0.3782, attn_decoder_loss=0.2496, over 5795598.05 frames. ], batch size: 86, lr: 5.30e-03, grad_scale: 8.0
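[Editor's note] The grad_scale field in the batch records sits at 8.0 and briefly reads 16.0 at batches 1600, 2000, 2400 and so on before returning to 8.0, the signature of dynamic loss scaling under fp16 mixed precision: the scale is doubled periodically and halved again when gradients overflow. A generic sketch using PyTorch's standard GradScaler API, not the project's training loop; the model, init_scale and growth_interval here are illustrative:

import torch
from torch.cuda.amp import GradScaler, autocast

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.045)
scaler = GradScaler(init_scale=8.0, growth_interval=2000)

def train_step(features: torch.Tensor, targets: torch.Tensor) -> None:
    optimizer.zero_grad()
    with autocast(dtype=torch.float16):
        loss = torch.nn.functional.cross_entropy(model(features), targets)
    scaler.scale(loss).backward()   # scale the loss so fp16 grads stay finite
    scaler.step(optimizer)          # unscales grads; skips the step on overflow
    scaler.update()                 # grows the scale periodically, halves on overflow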
2024-09-18 04:49:59,137 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.495e+01 9.101e+01 9.738e+01 1.875e+02, threshold=1.820e+02, percent-clipped=1.0
2024-09-18 04:50:26,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=372120.0, ans=0.07
2024-09-18 04:50:35,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=372120.0, ans=0.0
2024-09-18 04:50:57,260 INFO [train.py:1198] (1/2) Epoch 21, batch 2550, loss[loss=0.2201, ctc_loss=0.1151, cr_loss=0.349, attn_decoder_loss=0.224, over 29349.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1347, cr_loss=0.3785, attn_decoder_loss=0.2495, over 5798809.55 frames. ], batch size: 67, lr: 5.30e-03, grad_scale: 8.0
2024-09-18 04:50:59,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=372200.0, ans=0.125
2024-09-18 04:51:14,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=372240.0, ans=0.0
2024-09-18 04:51:30,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=372280.0, ans=0.125
2024-09-18 04:51:35,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=372280.0, ans=0.025
2024-09-18 04:51:39,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=372280.0, ans=0.2
2024-09-18 04:51:42,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=372320.0, ans=0.1
2024-09-18 04:51:52,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=372320.0, ans=0.125
2024-09-18 04:52:02,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=372360.0, ans=0.125
2024-09-18 04:52:08,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=372360.0, ans=0.2
2024-09-18 04:52:13,129 INFO [train.py:1198] (1/2) Epoch 21, batch 2600, loss[loss=0.239, ctc_loss=0.1336, cr_loss=0.3751, attn_decoder_loss=0.2424, over 29431.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1352, cr_loss=0.3791, attn_decoder_loss=0.2499, over 5794218.82 frames. ], batch size: 78, lr: 5.29e-03, grad_scale: 8.0
2024-09-18 04:52:19,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=372400.0, ans=0.1
2024-09-18 04:52:31,048 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.711e+01 8.657e+01 9.187e+01 9.794e+01 2.069e+02, threshold=1.837e+02, percent-clipped=1.0
2024-09-18 04:52:40,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=372440.0, ans=0.125
2024-09-18 04:52:51,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=372480.0, ans=0.1
2024-09-18 04:53:00,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=372520.0, ans=0.0
2024-09-18 04:53:14,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=372560.0, ans=0.125
2024-09-18 04:53:17,841 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.15 vs. limit=10.0
2024-09-18 04:53:17,943 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.37 vs. limit=15.0
2024-09-18 04:53:18,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.53 vs. limit=10.0
2024-09-18 04:53:30,475 INFO [train.py:1198] (1/2) Epoch 21, batch 2650, loss[loss=0.2641, ctc_loss=0.1451, cr_loss=0.3923, attn_decoder_loss=0.2686, over 29270.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1352, cr_loss=0.3795, attn_decoder_loss=0.2503, over 5800535.41 frames. ], batch size: 100, lr: 5.29e-03, grad_scale: 8.0
2024-09-18 04:53:33,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=372600.0, ans=0.0
2024-09-18 04:53:49,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=372640.0, ans=0.125
2024-09-18 04:53:58,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=372640.0, ans=0.125
2024-09-18 04:54:22,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=372720.0, ans=0.0
2024-09-18 04:54:48,381 INFO [train.py:1198] (1/2) Epoch 21, batch 2700, loss[loss=0.2506, ctc_loss=0.1389, cr_loss=0.3822, attn_decoder_loss=0.2545, over 29531.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.1358, cr_loss=0.3803, attn_decoder_loss=0.2508, over 5797580.06 frames. ], batch size: 87, lr: 5.29e-03, grad_scale: 8.0
2024-09-18 04:54:56,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=372800.0, ans=0.1
2024-09-18 04:55:06,489 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.257e+01 8.585e+01 9.069e+01 9.661e+01 1.375e+02, threshold=1.814e+02, percent-clipped=0.0
2024-09-18 04:55:32,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=372920.0, ans=0.04949747468305833
2024-09-18 04:55:57,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=372960.0, ans=0.125
2024-09-18 04:56:04,538 INFO [train.py:1198] (1/2) Epoch 21, batch 2750, loss[loss=0.2281, ctc_loss=0.1293, cr_loss=0.3789, attn_decoder_loss=0.2307, over 29513.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.1352, cr_loss=0.3794, attn_decoder_loss=0.2498, over 5795317.94 frames. ], batch size: 75, lr: 5.29e-03, grad_scale: 8.0
2024-09-18 04:56:06,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=373000.0, ans=0.1
2024-09-18 04:56:15,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=373000.0, ans=0.1
2024-09-18 04:56:35,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=373080.0, ans=0.125
2024-09-18 04:56:35,913 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0
2024-09-18 04:56:46,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=373080.0, ans=0.125
2024-09-18 04:56:55,652 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.73 vs. limit=15.0
2024-09-18 04:57:21,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=373200.0, ans=0.125
2024-09-18 04:57:21,587 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.06 vs. limit=15.0
2024-09-18 04:57:22,200 INFO [train.py:1198] (1/2) Epoch 21, batch 2800, loss[loss=0.2641, ctc_loss=0.164, cr_loss=0.3937, attn_decoder_loss=0.2665, over 20587.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1355, cr_loss=0.3797, attn_decoder_loss=0.2502, over 5777742.02 frames. ], batch size: 209, lr: 5.29e-03, grad_scale: 16.0
2024-09-18 04:57:24,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=373200.0, ans=0.07
2024-09-18 04:57:30,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=373200.0, ans=0.0
2024-09-18 04:57:43,972 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.572e+01 8.642e+01 9.187e+01 1.013e+02 2.371e+02, threshold=1.837e+02, percent-clipped=3.0
2024-09-18 04:57:47,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=373240.0, ans=0.125
2024-09-18 04:57:59,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=373280.0, ans=0.125
2024-09-18 04:58:01,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=373280.0, ans=0.125
2024-09-18 04:58:04,856 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.48 vs. limit=15.0
2024-09-18 04:58:20,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=373320.0, ans=0.125
2024-09-18 04:58:35,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=373360.0, ans=0.125
2024-09-18 04:58:40,011 INFO [train.py:1198] (1/2) Epoch 21, batch 2850, loss[loss=0.2415, ctc_loss=0.1321, cr_loss=0.3922, attn_decoder_loss=0.245, over 29518.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.136, cr_loss=0.3802, attn_decoder_loss=0.2506, over 5762212.64 frames. ], batch size: 77, lr: 5.29e-03, grad_scale: 8.0
2024-09-18 04:58:41,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=373400.0, ans=0.125
2024-09-18 04:58:53,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=373440.0, ans=0.1
2024-09-18 04:58:56,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=373440.0, ans=0.0
2024-09-18 04:59:09,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=373480.0, ans=0.125
2024-09-18 04:59:10,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=373480.0, ans=0.125
2024-09-18 04:59:34,883 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-18 04:59:37,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=373520.0, ans=0.125
2024-09-18 04:59:39,923 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.11 vs. limit=15.0
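[Editor's note] The Whitening records compare a per-module statistic (metric) against a limit, computed over grouped feature covariances (num_groups, num_channels). One plausible whiteness measure, used here purely for illustration and not necessarily the scaling.py formula, is the ratio of the mean squared eigenvalue of the feature covariance to its squared mean eigenvalue: it equals 1 exactly when the covariance is a multiple of the identity and grows as the spectrum spreads. A sketch under that assumption:

import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels). Returns >= 1, with 1 meaning the
    feature covariance is a multiple of the identity ('white' features)."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    dim = cov.shape[0]
    # mean(eig^2) / mean(eig)^2 without an eigendecomposition:
    # sum(eig^2) = ||cov||_F^2 and sum(eig) = trace(cov).
    return float((cov * cov).sum() * dim / cov.trace() ** 2)

white = torch.randn(10000, 256)
print(whitening_metric(white))                                  # close to 1
print(whitening_metric(white * torch.linspace(0.1, 2.0, 256)))  # noticeably larger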
2024-09-18 04:59:42,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=373560.0, ans=0.07
2024-09-18 04:59:42,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=373560.0, ans=0.07
2024-09-18 04:59:47,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=373560.0, ans=0.125
2024-09-18 04:59:54,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=373600.0, ans=0.125
2024-09-18 04:59:56,350 INFO [train.py:1198] (1/2) Epoch 21, batch 2900, loss[loss=0.2288, ctc_loss=0.1155, cr_loss=0.3531, attn_decoder_loss=0.2336, over 29427.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1363, cr_loss=0.3817, attn_decoder_loss=0.2516, over 5787421.58 frames. ], batch size: 79, lr: 5.29e-03, grad_scale: 8.0
2024-09-18 05:00:15,829 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.274e+01 8.588e+01 9.125e+01 9.672e+01 3.101e+02, threshold=1.825e+02, percent-clipped=2.0
2024-09-18 05:00:17,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=373640.0, ans=0.125
2024-09-18 05:01:12,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=373800.0, ans=0.125
2024-09-18 05:01:13,983 INFO [train.py:1198] (1/2) Epoch 21, batch 2950, loss[loss=0.2345, ctc_loss=0.1239, cr_loss=0.361, attn_decoder_loss=0.2388, over 29525.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1351, cr_loss=0.3787, attn_decoder_loss=0.2501, over 5782514.73 frames. ], batch size: 75, lr: 5.28e-03, grad_scale: 8.0
2024-09-18 05:01:18,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=373800.0, ans=0.125
2024-09-18 05:01:21,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=373800.0, ans=0.125
2024-09-18 05:01:21,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=373800.0, ans=0.09899494936611666
2024-09-18 05:01:42,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=373840.0, ans=0.125
2024-09-18 05:01:43,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=373840.0, ans=0.0
2024-09-18 05:01:49,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=373880.0, ans=0.125
2024-09-18 05:02:10,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=373920.0, ans=0.0
2024-09-18 05:02:15,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=373960.0, ans=0.125
2024-09-18 05:02:18,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=373960.0, ans=0.125
2024-09-18 05:02:21,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=373960.0, ans=0.2
2024-09-18 05:02:27,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=373960.0, ans=0.0
2024-09-18 05:02:31,642 INFO [train.py:1198] (1/2) Epoch 21, batch 3000, loss[loss=0.2508, ctc_loss=0.143, cr_loss=0.3898, attn_decoder_loss=0.2542, over 29777.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1347, cr_loss=0.3777, attn_decoder_loss=0.2497, over 5783262.87 frames. ], batch size: 81, lr: 5.28e-03, grad_scale: 8.0
2024-09-18 05:02:31,642 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-18 05:02:39,437 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.3577, 3.0348, 2.4504, 2.7416, 2.2709, 2.8968, 2.8559, 3.0956], device='cuda:1')
2024-09-18 05:02:50,162 INFO [train.py:1230] (1/2) Epoch 21, validation: loss=0.2116, ctc_loss=0.03952, cr_loss=5.001e-15, attn_decoder_loss=0.2307, over 944034.00 frames.
2024-09-18 05:02:50,162 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB
2024-09-18 05:02:53,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=374000.0, ans=10.0
2024-09-18 05:02:57,412 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.81 vs. limit=15.0
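[Editor's note] The validation block above prints one attn_weights_entropy value per attention head, and the validation loss itself fits the same weighted-sum pattern noted earlier (0.1 x 0.03952 + 0.9 x 0.2307 is approximately 0.2116, with the cr term effectively zero). A hedged sketch of how per-head entropies can be computed from an attention weight tensor; the (heads, queries, keys) layout is an assumption:

import torch

def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    """attn: (num_heads, num_queries, num_keys); each row along the last
    dim is a softmax distribution. Returns the mean entropy per head, in nats."""
    ent = -(attn * (attn + eps).log()).sum(dim=-1)   # (num_heads, num_queries)
    return ent.mean(dim=-1)                          # upper bound: log(num_keys)

weights = torch.softmax(torch.randn(8, 50, 50), dim=-1)
print(attn_weights_entropy(weights))   # 8 per-head values, like the tensor above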
2024-09-18 05:03:10,258 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.360e+01 8.620e+01 9.343e+01 9.937e+01 2.049e+02, threshold=1.869e+02, percent-clipped=2.0
2024-09-18 05:03:12,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer_ff2.min_abs, batch_count=374040.0, ans=0.1
2024-09-18 05:03:33,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=374080.0, ans=0.2
2024-09-18 05:03:44,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=374120.0, ans=0.2
2024-09-18 05:04:06,324 INFO [train.py:1198] (1/2) Epoch 21, batch 3050, loss[loss=0.2256, ctc_loss=0.1272, cr_loss=0.359, attn_decoder_loss=0.2286, over 29546.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.1355, cr_loss=0.3792, attn_decoder_loss=0.2504, over 5776934.68 frames. ], batch size: 76, lr: 5.28e-03, grad_scale: 8.0
2024-09-18 05:04:18,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=374200.0, ans=22.5
2024-09-18 05:04:19,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=374200.0, ans=0.125
2024-09-18 05:04:37,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=374280.0, ans=0.125
2024-09-18 05:04:46,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=374280.0, ans=0.1
2024-09-18 05:04:50,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=374280.0, ans=0.125
2024-09-18 05:04:57,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=374320.0, ans=0.1
2024-09-18 05:05:01,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=374320.0, ans=0.125
2024-09-18 05:05:12,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=374360.0, ans=0.125
2024-09-18 05:05:26,565 INFO [train.py:1198] (1/2) Epoch 21, batch 3100, loss[loss=0.2618, ctc_loss=0.153, cr_loss=0.4146, attn_decoder_loss=0.2646, over 29232.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1353, cr_loss=0.3792, attn_decoder_loss=0.2502, over 5777603.04 frames. ], batch size: 100, lr: 5.28e-03, grad_scale: 8.0
2024-09-18 05:05:26,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=374400.0, ans=0.2
2024-09-18 05:05:31,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=374400.0, ans=0.0
2024-09-18 05:05:35,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=374400.0, ans=0.0
2024-09-18 05:05:44,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=374440.0, ans=0.125
2024-09-18 05:05:45,911 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.504e+01 9.125e+01 9.577e+01 2.431e+02, threshold=1.825e+02, percent-clipped=1.0
2024-09-18 05:05:52,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=374440.0, ans=0.0
2024-09-18 05:05:58,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=374480.0, ans=0.125
2024-09-18 05:06:28,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=374560.0, ans=0.125
2024-09-18 05:06:41,991 INFO [train.py:1198] (1/2) Epoch 21, batch 3150, loss[loss=0.2601, ctc_loss=0.1497, cr_loss=0.4078, attn_decoder_loss=0.2633, over 28738.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1353, cr_loss=0.3792, attn_decoder_loss=0.2501, over 5783110.60 frames. ], batch size: 104, lr: 5.28e-03, grad_scale: 8.0
2024-09-18 05:06:55,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=374640.0, ans=0.025
2024-09-18 05:06:59,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=374640.0, ans=0.2
2024-09-18 05:07:05,523 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.74 vs. limit=22.5
2024-09-18 05:07:17,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=374680.0, ans=0.0
2024-09-18 05:07:17,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=374680.0, ans=0.0
2024-09-18 05:07:30,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=374720.0, ans=0.1
2024-09-18 05:07:30,834 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 05:07:47,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=374760.0, ans=0.025
2024-09-18 05:07:57,648 INFO [train.py:1198] (1/2) Epoch 21, batch 3200, loss[loss=0.2447, ctc_loss=0.1257, cr_loss=0.3761, attn_decoder_loss=0.2495, over 29407.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.135, cr_loss=0.3784, attn_decoder_loss=0.2497, over 5792974.03 frames. ], batch size: 79, lr: 5.28e-03, grad_scale: 16.0
2024-09-18 05:08:03,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=374800.0, ans=0.5
2024-09-18 05:08:20,843 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.806e+01 8.671e+01 9.297e+01 1.015e+02 2.448e+02, threshold=1.859e+02, percent-clipped=1.0
2024-09-18 05:08:30,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=374880.0, ans=0.0
2024-09-18 05:08:48,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=374920.0, ans=0.125
2024-09-18 05:08:56,020 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.80 vs. limit=15.0
2024-09-18 05:09:15,148 INFO [train.py:1198] (1/2) Epoch 21, batch 3250, loss[loss=0.2542, ctc_loss=0.1348, cr_loss=0.3867, attn_decoder_loss=0.2589, over 29695.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.1348, cr_loss=0.3787, attn_decoder_loss=0.2498, over 5799822.87 frames. ], batch size: 84, lr: 5.28e-03, grad_scale: 8.0
2024-09-18 05:09:15,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=375000.0, ans=0.125
2024-09-18 05:10:10,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=375120.0, ans=0.125
2024-09-18 05:10:13,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=375120.0, ans=0.1
2024-09-18 05:10:18,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=375160.0, ans=0.125
2024-09-18 05:10:28,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375160.0, ans=0.1
2024-09-18 05:10:33,523 INFO [train.py:1198] (1/2) Epoch 21, batch 3300, loss[loss=0.2559, ctc_loss=0.1387, cr_loss=0.3716, attn_decoder_loss=0.2607, over 28272.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1345, cr_loss=0.3776, attn_decoder_loss=0.2488, over 5797975.68 frames. ], batch size: 111, lr: 5.27e-03, grad_scale: 8.0
2024-09-18 05:10:43,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=375200.0, ans=0.1
2024-09-18 05:10:54,884 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 8.586e+01 9.172e+01 9.727e+01 2.274e+02, threshold=1.834e+02, percent-clipped=1.0
2024-09-18 05:11:07,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=375280.0, ans=0.0
2024-09-18 05:11:32,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=375360.0, ans=0.125
2024-09-18 05:11:34,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=375360.0, ans=0.125
2024-09-18 05:11:48,828 INFO [train.py:1198] (1/2) Epoch 21, batch 3350, loss[loss=0.2603, ctc_loss=0.1508, cr_loss=0.426, attn_decoder_loss=0.263, over 28745.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1353, cr_loss=0.379, attn_decoder_loss=0.2497, over 5775029.86 frames. ], batch size: 104, lr: 5.27e-03, grad_scale: 8.0
2024-09-18 05:11:54,077 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.96 vs. limit=15.0
2024-09-18 05:12:44,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=375520.0, ans=10.0
2024-09-18 05:13:06,608 INFO [train.py:1198] (1/2) Epoch 21, batch 3400, loss[loss=0.2261, ctc_loss=0.1228, cr_loss=0.3826, attn_decoder_loss=0.2291, over 29360.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1352, cr_loss=0.3785, attn_decoder_loss=0.2495, over 5769023.20 frames. ], batch size: 67, lr: 5.27e-03, grad_scale: 8.0
2024-09-18 05:13:29,671 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.372e+01 8.485e+01 9.062e+01 9.587e+01 1.561e+02, threshold=1.812e+02, percent-clipped=0.0
2024-09-18 05:13:36,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=375640.0, ans=0.0
2024-09-18 05:13:42,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=375680.0, ans=0.0
2024-09-18 05:14:01,100 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.95 vs. limit=15.0
2024-09-18 05:14:02,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=375720.0, ans=0.125
2024-09-18 05:14:03,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=375720.0, ans=0.125
2024-09-18 05:14:05,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=375720.0, ans=0.1
2024-09-18 05:14:24,502 INFO [train.py:1198] (1/2) Epoch 21, batch 3450, loss[loss=0.262, ctc_loss=0.1499, cr_loss=0.3882, attn_decoder_loss=0.2658, over 28445.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1358, cr_loss=0.38, attn_decoder_loss=0.2502, over 5778211.85 frames. ], batch size: 112, lr: 5.27e-03, grad_scale: 8.0
2024-09-18 05:14:51,108 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.13 vs. limit=22.5
2024-09-18 05:15:40,552 INFO [train.py:1198] (1/2) Epoch 21, batch 3500, loss[loss=0.2316, ctc_loss=0.1305, cr_loss=0.4024, attn_decoder_loss=0.2339, over 29303.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1359, cr_loss=0.3798, attn_decoder_loss=0.2498, over 5779782.10 frames. ], batch size: 71, lr: 5.27e-03, grad_scale: 8.0
2024-09-18 05:15:40,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=376000.0, ans=0.0
2024-09-18 05:15:41,015 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-18 05:16:02,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=376040.0, ans=0.0
2024-09-18 05:16:03,719 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.357e+01 8.729e+01 9.303e+01 9.808e+01 4.681e+02, threshold=1.861e+02, percent-clipped=2.0
2024-09-18 05:16:11,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=376080.0, ans=0.125
2024-09-18 05:16:16,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=376080.0, ans=0.125
2024-09-18 05:16:38,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=376120.0, ans=0.125
2024-09-18 05:16:57,298 INFO [train.py:1198] (1/2) Epoch 21, batch 3550, loss[loss=0.2571, ctc_loss=0.137, cr_loss=0.378, attn_decoder_loss=0.2621, over 29713.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.1353, cr_loss=0.3792, attn_decoder_loss=0.2497, over 5785999.16 frames. ], batch size: 89, lr: 5.27e-03, grad_scale: 8.0
2024-09-18 05:17:05,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=376200.0, ans=0.125
2024-09-18 05:17:33,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=376280.0, ans=0.0
2024-09-18 05:17:59,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=376360.0, ans=0.125
2024-09-18 05:18:13,900 INFO [train.py:1198] (1/2) Epoch 21, batch 3600, loss[loss=0.2406, ctc_loss=0.132, cr_loss=0.3715, attn_decoder_loss=0.2444, over 29502.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1353, cr_loss=0.3792, attn_decoder_loss=0.2497, over 5794715.92 frames. ], batch size: 77, lr: 5.27e-03, grad_scale: 16.0
2024-09-18 05:18:17,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=376400.0, ans=0.125
2024-09-18 05:18:23,600 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.07 vs. limit=10.0
2024-09-18 05:18:34,851 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.502e+01 8.337e+01 8.787e+01 9.364e+01 1.302e+02, threshold=1.757e+02, percent-clipped=0.0
2024-09-18 05:19:18,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=376560.0, ans=0.2
2024-09-18 05:19:25,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=376560.0, ans=0.0
2024-09-18 05:19:28,189 INFO [train.py:1198] (1/2) Epoch 21, batch 3650, loss[loss=0.2661, ctc_loss=0.151, cr_loss=0.3925, attn_decoder_loss=0.2702, over 29477.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.1349, cr_loss=0.3782, attn_decoder_loss=0.2494, over 5796536.31 frames. ], batch size: 90, lr: 5.26e-03, grad_scale: 8.0
2024-09-18 05:19:32,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=376600.0, ans=0.125
2024-09-18 05:19:57,520 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.50 vs. limit=22.5
2024-09-18 05:20:08,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=376680.0, ans=0.1
2024-09-18 05:20:17,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=376720.0, ans=0.5
2024-09-18 05:20:20,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=376720.0, ans=0.125
2024-09-18 05:20:23,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=376720.0, ans=0.0
2024-09-18 05:20:43,490 INFO [train.py:1198] (1/2) Epoch 21, batch 3700, loss[loss=0.2518, ctc_loss=0.1338, cr_loss=0.3753, attn_decoder_loss=0.2565, over 29699.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1348, cr_loss=0.3778, attn_decoder_loss=0.2495, over 5805551.79 frames. ], batch size: 84, lr: 5.26e-03, grad_scale: 8.0
2024-09-18 05:20:54,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=376800.0, ans=0.04949747468305833
2024-09-18 05:21:04,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=376840.0, ans=0.09899494936611666
2024-09-18 05:21:05,912 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.170e+01 8.517e+01 9.022e+01 9.849e+01 1.949e+02, threshold=1.804e+02, percent-clipped=2.0
2024-09-18 05:21:12,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=376880.0, ans=0.2
2024-09-18 05:21:25,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=376880.0, ans=0.0
2024-09-18 05:21:29,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=376920.0, ans=0.125
2024-09-18 05:21:44,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=376960.0, ans=0.2
2024-09-18 05:21:46,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=376960.0, ans=0.125
2024-09-18 05:21:49,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=376960.0, ans=0.025
2024-09-18 05:21:57,756 INFO [train.py:1198] (1/2) Epoch 21, batch 3750, loss[loss=0.2317, ctc_loss=0.1338, cr_loss=0.3786, attn_decoder_loss=0.2342, over 29334.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.1347, cr_loss=0.3777, attn_decoder_loss=0.2491, over 5808225.74 frames. ], batch size: 67, lr: 5.26e-03, grad_scale: 8.0
2024-09-18 05:22:04,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=377000.0, ans=0.1
2024-09-18 05:22:07,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=377000.0, ans=15.0
2024-09-18 05:22:34,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=377080.0, ans=0.0
2024-09-18 05:22:38,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=377080.0, ans=0.1
2024-09-18 05:22:43,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=377120.0, ans=6.0
2024-09-18 05:22:48,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=377120.0, ans=0.1
2024-09-18 05:22:52,179 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-18 05:22:59,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=377160.0, ans=0.125
2024-09-18 05:23:14,085 INFO [train.py:1198] (1/2) Epoch 21, batch 3800, loss[loss=0.2556, ctc_loss=0.1406, cr_loss=0.3897, attn_decoder_loss=0.2597, over 29615.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1341, cr_loss=0.3764, attn_decoder_loss=0.2487, over 5798592.88 frames. ], batch size: 86, lr: 5.26e-03, grad_scale: 8.0
2024-09-18 05:23:36,537 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.558e+01 9.240e+01 9.922e+01 2.766e+02, threshold=1.848e+02, percent-clipped=2.0
2024-09-18 05:23:38,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=377240.0, ans=0.2
2024-09-18 05:24:30,308 INFO [train.py:1198] (1/2) Epoch 21, batch 3850, loss[loss=0.2617, ctc_loss=0.1492, cr_loss=0.4039, attn_decoder_loss=0.2652, over 29242.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1333, cr_loss=0.3754, attn_decoder_loss=0.2482, over 5812698.78 frames. ], batch size: 100, lr: 5.26e-03, grad_scale: 8.0
2024-09-18 05:24:40,239 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.85 vs. limit=15.0
2024-09-18 05:25:32,305 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.10 vs. limit=22.5
2024-09-18 05:25:39,623 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=7.17 vs. limit=15.0
2024-09-18 05:25:45,167 INFO [train.py:1198] (1/2) Epoch 21, batch 3900, loss[loss=0.259, ctc_loss=0.1375, cr_loss=0.3946, attn_decoder_loss=0.2637, over 29641.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1337, cr_loss=0.3763, attn_decoder_loss=0.2486, over 5817052.13 frames. ], batch size: 86, lr: 5.26e-03, grad_scale: 8.0
], batch size: 86, lr: 5.26e-03, grad_scale: 8.0 2024-09-18 05:25:57,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=377600.0, ans=0.125 2024-09-18 05:26:00,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=377640.0, ans=0.0 2024-09-18 05:26:02,167 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.42 vs. limit=15.0 2024-09-18 05:26:07,257 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 8.671e+01 9.111e+01 9.603e+01 1.300e+02, threshold=1.822e+02, percent-clipped=0.0 2024-09-18 05:26:19,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=377680.0, ans=0.1 2024-09-18 05:26:43,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=377760.0, ans=0.0 2024-09-18 05:26:48,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=377760.0, ans=0.0 2024-09-18 05:26:59,564 INFO [train.py:1198] (1/2) Epoch 21, batch 3950, loss[loss=0.2579, ctc_loss=0.1448, cr_loss=0.4011, attn_decoder_loss=0.2616, over 29500.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1337, cr_loss=0.3767, attn_decoder_loss=0.2487, over 5836167.40 frames. ], batch size: 97, lr: 5.26e-03, grad_scale: 8.0 2024-09-18 05:27:15,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=377840.0, ans=0.04949747468305833 2024-09-18 05:27:35,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=377880.0, ans=0.0 2024-09-18 05:27:47,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=377920.0, ans=0.0 2024-09-18 05:28:07,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=377960.0, ans=0.125 2024-09-18 05:28:09,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=377960.0, ans=0.1 2024-09-18 05:28:14,620 INFO [train.py:1198] (1/2) Epoch 21, batch 4000, loss[loss=0.2382, ctc_loss=0.131, cr_loss=0.3774, attn_decoder_loss=0.2417, over 29501.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1344, cr_loss=0.378, attn_decoder_loss=0.2492, over 5811938.34 frames. ], batch size: 74, lr: 5.26e-03, grad_scale: 16.0 2024-09-18 05:28:15,312 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.29 vs. 
limit=22.5 2024-09-18 05:28:35,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=378040.0, ans=0.125 2024-09-18 05:28:38,253 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.613e+01 8.637e+01 9.105e+01 9.736e+01 3.809e+02, threshold=1.821e+02, percent-clipped=2.0 2024-09-18 05:28:38,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=378040.0, ans=0.125 2024-09-18 05:28:41,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=378040.0, ans=0.025 2024-09-18 05:28:43,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.80 vs. limit=6.0 2024-09-18 05:28:47,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=378080.0, ans=0.025 2024-09-18 05:28:50,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=378080.0, ans=0.125 2024-09-18 05:29:00,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=378120.0, ans=0.125 2024-09-18 05:29:06,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=378120.0, ans=0.1 2024-09-18 05:29:09,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=378120.0, ans=0.125 2024-09-18 05:29:17,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=378160.0, ans=0.125 2024-09-18 05:29:22,091 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.94 vs. limit=15.0 2024-09-18 05:29:30,129 INFO [train.py:1198] (1/2) Epoch 21, batch 4050, loss[loss=0.2785, ctc_loss=0.1873, cr_loss=0.4407, attn_decoder_loss=0.2789, over 20379.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.135, cr_loss=0.379, attn_decoder_loss=0.2494, over 5796041.75 frames. 
], batch size: 210, lr: 5.25e-03, grad_scale: 8.0 2024-09-18 05:29:39,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=378200.0, ans=0.0 2024-09-18 05:29:46,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=378240.0, ans=0.1 2024-09-18 05:29:49,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=378240.0, ans=0.0 2024-09-18 05:30:24,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=378320.0, ans=0.0 2024-09-18 05:30:27,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=378360.0, ans=0.1 2024-09-18 05:30:28,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=378360.0, ans=0.025 2024-09-18 05:30:39,969 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.47 vs. limit=6.0 2024-09-18 05:30:44,002 INFO [train.py:1198] (1/2) Epoch 21, batch 4100, loss[loss=0.2666, ctc_loss=0.156, cr_loss=0.4157, attn_decoder_loss=0.2697, over 29490.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1346, cr_loss=0.3784, attn_decoder_loss=0.2493, over 5791785.49 frames. ], batch size: 90, lr: 5.25e-03, grad_scale: 8.0 2024-09-18 05:31:07,489 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.357e+01 8.642e+01 9.337e+01 1.033e+02 5.468e+02, threshold=1.867e+02, percent-clipped=3.0 2024-09-18 05:31:08,510 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.67 vs. limit=15.0 2024-09-18 05:31:19,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=378480.0, ans=0.025 2024-09-18 05:31:22,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=378480.0, ans=0.125 2024-09-18 05:31:27,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=378520.0, ans=10.0 2024-09-18 05:31:55,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.74 vs. limit=15.0 2024-09-18 05:31:57,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=378600.0, ans=0.125 2024-09-18 05:31:58,984 INFO [train.py:1198] (1/2) Epoch 21, batch 4150, loss[loss=0.2277, ctc_loss=0.1145, cr_loss=0.3318, attn_decoder_loss=0.2329, over 29478.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1343, cr_loss=0.3779, attn_decoder_loss=0.2491, over 5797866.69 frames. ], batch size: 77, lr: 5.25e-03, grad_scale: 8.0 2024-09-18 05:32:05,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=378600.0, ans=0.125 2024-09-18 05:32:11,555 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.36 vs. 
limit=12.0 2024-09-18 05:32:24,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=378640.0, ans=0.1 2024-09-18 05:32:44,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=378720.0, ans=0.0 2024-09-18 05:32:52,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=378720.0, ans=0.0 2024-09-18 05:33:12,767 INFO [train.py:1198] (1/2) Epoch 21, batch 4200, loss[loss=0.2561, ctc_loss=0.1412, cr_loss=0.4022, attn_decoder_loss=0.2599, over 29498.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1343, cr_loss=0.3781, attn_decoder_loss=0.2494, over 5799717.69 frames. ], batch size: 90, lr: 5.25e-03, grad_scale: 8.0 2024-09-18 05:33:20,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=378800.0, ans=0.2 2024-09-18 05:33:37,367 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.551e+01 8.402e+01 9.063e+01 9.513e+01 1.420e+02, threshold=1.813e+02, percent-clipped=0.0 2024-09-18 05:33:49,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=378880.0, ans=0.1 2024-09-18 05:34:17,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=378960.0, ans=0.2 2024-09-18 05:34:24,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=378960.0, ans=0.025 2024-09-18 05:34:25,323 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.34 vs. limit=15.0 2024-09-18 05:34:26,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=379000.0, ans=0.1 2024-09-18 05:34:27,305 INFO [train.py:1198] (1/2) Epoch 21, batch 4250, loss[loss=0.2266, ctc_loss=0.1214, cr_loss=0.3467, attn_decoder_loss=0.2306, over 29525.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.134, cr_loss=0.3771, attn_decoder_loss=0.2493, over 5806606.33 frames. ], batch size: 74, lr: 5.25e-03, grad_scale: 8.0 2024-09-18 05:34:39,738 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.43 vs. limit=15.0 2024-09-18 05:34:45,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=379040.0, ans=0.125 2024-09-18 05:34:48,199 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:34:56,201 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.50 vs. limit=15.0 2024-09-18 05:35:10,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=379120.0, ans=0.125 2024-09-18 05:35:16,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=379120.0, ans=0.1 2024-09-18 05:35:34,148 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.09 vs. 
limit=15.0 2024-09-18 05:35:42,512 INFO [train.py:1198] (1/2) Epoch 21, batch 4300, loss[loss=0.252, ctc_loss=0.1308, cr_loss=0.3732, attn_decoder_loss=0.2572, over 29520.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.134, cr_loss=0.3768, attn_decoder_loss=0.2496, over 5795400.99 frames. ], batch size: 87, lr: 5.25e-03, grad_scale: 8.0 2024-09-18 05:35:44,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=379200.0, ans=0.0 2024-09-18 05:36:06,490 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.491e+01 8.631e+01 9.482e+01 1.010e+02 4.284e+02, threshold=1.896e+02, percent-clipped=4.0 2024-09-18 05:36:27,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=379320.0, ans=0.125 2024-09-18 05:36:36,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=379320.0, ans=0.125 2024-09-18 05:36:45,903 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.39 vs. limit=6.0 2024-09-18 05:36:46,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=379360.0, ans=0.125 2024-09-18 05:36:57,602 INFO [train.py:1198] (1/2) Epoch 21, batch 4350, loss[loss=0.2699, ctc_loss=0.1566, cr_loss=0.4203, attn_decoder_loss=0.2731, over 29532.00 frames. ], tot_loss[loss=0.2492, ctc_loss=0.1371, cr_loss=0.3829, attn_decoder_loss=0.2531, over 5796844.75 frames. ], batch size: 97, lr: 5.25e-03, grad_scale: 8.0 2024-09-18 05:37:27,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=379480.0, ans=0.125 2024-09-18 05:37:42,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=379520.0, ans=0.125 2024-09-18 05:37:46,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=379520.0, ans=0.025 2024-09-18 05:38:10,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=379600.0, ans=0.125 2024-09-18 05:38:11,715 INFO [train.py:1198] (1/2) Epoch 21, batch 4400, loss[loss=0.2558, ctc_loss=0.1521, cr_loss=0.4056, attn_decoder_loss=0.2583, over 27010.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.1385, cr_loss=0.3855, attn_decoder_loss=0.2553, over 5767148.35 frames. 
], batch size: 124, lr: 5.24e-03, grad_scale: 16.0 2024-09-18 05:38:12,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=379600.0, ans=0.0 2024-09-18 05:38:23,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=379600.0, ans=0.0 2024-09-18 05:38:25,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=379640.0, ans=0.2 2024-09-18 05:38:26,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=379640.0, ans=0.0 2024-09-18 05:38:34,951 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.049e+01 8.987e+01 9.326e+01 1.008e+02 3.021e+02, threshold=1.865e+02, percent-clipped=1.0 2024-09-18 05:38:37,073 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.23 vs. limit=22.5 2024-09-18 05:38:43,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=379680.0, ans=0.1 2024-09-18 05:38:49,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=379680.0, ans=0.0 2024-09-18 05:39:08,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=379760.0, ans=0.0 2024-09-18 05:39:15,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=379760.0, ans=0.125 2024-09-18 05:39:25,852 INFO [train.py:1198] (1/2) Epoch 21, batch 4450, loss[loss=0.2643, ctc_loss=0.1582, cr_loss=0.3813, attn_decoder_loss=0.2676, over 20038.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1426, cr_loss=0.3894, attn_decoder_loss=0.2578, over 5572942.05 frames. ], batch size: 209, lr: 5.24e-03, grad_scale: 8.0 2024-09-18 05:39:48,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=379840.0, ans=0.2 2024-09-18 05:39:56,707 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:40:01,977 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=15.0 2024-09-18 05:40:30,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=379960.0, ans=0.0 2024-09-18 05:40:33,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=379960.0, ans=0.0 2024-09-18 05:40:41,893 INFO [train.py:1198] (1/2) Epoch 21, batch 4500, loss[loss=0.2682, ctc_loss=0.1653, cr_loss=0.4087, attn_decoder_loss=0.2705, over 20179.00 frames. ], tot_loss[loss=0.2568, ctc_loss=0.1473, cr_loss=0.3921, attn_decoder_loss=0.2602, over 5233102.91 frames. ], batch size: 209, lr: 5.24e-03, grad_scale: 8.0 2024-09-18 05:40:56,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=380040.0, ans=0.125 2024-09-18 05:40:57,422 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.66 vs. 
limit=15.0 2024-09-18 05:41:04,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=380040.0, ans=0.0 2024-09-18 05:41:07,264 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.842e+01 1.023e+02 1.116e+02 1.184e+02 1.723e+02, threshold=2.233e+02, percent-clipped=0.0 2024-09-18 05:42:06,204 INFO [train.py:1198] (1/2) Epoch 22, batch 0, loss[loss=0.2341, ctc_loss=0.1262, cr_loss=0.3582, attn_decoder_loss=0.2382, over 29582.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1262, cr_loss=0.3582, attn_decoder_loss=0.2382, over 29582.00 frames. ], batch size: 73, lr: 5.12e-03, grad_scale: 16.0 2024-09-18 05:42:06,205 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 05:42:13,737 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.0576, 3.5965, 3.9289, 3.5301], device='cuda:1') 2024-09-18 05:42:24,647 INFO [train.py:1230] (1/2) Epoch 22, validation: loss=0.212, ctc_loss=0.0382, cr_loss=5.087e-15, attn_decoder_loss=0.2313, over 944034.00 frames. 2024-09-18 05:42:24,647 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-18 05:42:26,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=380100.0, ans=0.0 2024-09-18 05:42:46,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=380140.0, ans=0.05 2024-09-18 05:42:57,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=380180.0, ans=0.0 2024-09-18 05:43:09,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=380180.0, ans=0.125 2024-09-18 05:43:19,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=380220.0, ans=0.125 2024-09-18 05:43:23,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=380220.0, ans=0.025 2024-09-18 05:43:42,237 INFO [train.py:1198] (1/2) Epoch 22, batch 50, loss[loss=0.219, ctc_loss=0.1199, cr_loss=0.3504, attn_decoder_loss=0.2222, over 29429.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1362, cr_loss=0.3786, attn_decoder_loss=0.2501, over 1269808.36 frames. ], batch size: 70, lr: 5.12e-03, grad_scale: 8.0 2024-09-18 05:43:48,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=380300.0, ans=0.125 2024-09-18 05:44:14,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=380380.0, ans=0.07 2024-09-18 05:44:16,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=380380.0, ans=0.07 2024-09-18 05:44:17,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=380380.0, ans=0.1 2024-09-18 05:44:25,807 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.25 vs. 
limit=15.0 2024-09-18 05:44:29,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=380420.0, ans=0.125 2024-09-18 05:44:44,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=380460.0, ans=0.1 2024-09-18 05:44:44,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=380460.0, ans=0.125 2024-09-18 05:44:47,550 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.759e+01 9.355e+01 1.030e+02 2.527e+02, threshold=1.871e+02, percent-clipped=1.0 2024-09-18 05:44:50,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=380460.0, ans=15.0 2024-09-18 05:44:57,986 INFO [train.py:1198] (1/2) Epoch 22, batch 100, loss[loss=0.231, ctc_loss=0.1194, cr_loss=0.3596, attn_decoder_loss=0.2354, over 29553.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1373, cr_loss=0.3811, attn_decoder_loss=0.2521, over 2253464.09 frames. ], batch size: 76, lr: 5.12e-03, grad_scale: 8.0 2024-09-18 05:45:29,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=380580.0, ans=0.125 2024-09-18 05:45:40,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=380580.0, ans=0.0 2024-09-18 05:45:43,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=380620.0, ans=0.2 2024-09-18 05:45:53,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.77 vs. limit=15.0 2024-09-18 05:46:06,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=380660.0, ans=0.1 2024-09-18 05:46:16,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=380700.0, ans=0.125 2024-09-18 05:46:17,497 INFO [train.py:1198] (1/2) Epoch 22, batch 150, loss[loss=0.2241, ctc_loss=0.1176, cr_loss=0.3439, attn_decoder_loss=0.2283, over 29426.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1351, cr_loss=0.3775, attn_decoder_loss=0.2502, over 3048147.53 frames. ], batch size: 70, lr: 5.11e-03, grad_scale: 8.0 2024-09-18 05:46:26,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=380700.0, ans=0.125 2024-09-18 05:46:36,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=380740.0, ans=0.04949747468305833 2024-09-18 05:46:42,638 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.16 vs. 
limit=12.0 2024-09-18 05:46:47,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=380780.0, ans=0.0 2024-09-18 05:46:53,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=380780.0, ans=0.125 2024-09-18 05:47:03,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=380820.0, ans=0.0 2024-09-18 05:47:22,575 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.751e+01 8.602e+01 9.163e+01 9.915e+01 1.341e+02, threshold=1.833e+02, percent-clipped=0.0 2024-09-18 05:47:31,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=380900.0, ans=0.0 2024-09-18 05:47:33,201 INFO [train.py:1198] (1/2) Epoch 22, batch 200, loss[loss=0.2632, ctc_loss=0.1565, cr_loss=0.4289, attn_decoder_loss=0.2655, over 27468.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1347, cr_loss=0.3773, attn_decoder_loss=0.2493, over 3660283.60 frames. ], batch size: 124, lr: 5.11e-03, grad_scale: 8.0 2024-09-18 05:47:34,161 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.84 vs. limit=10.0 2024-09-18 05:47:41,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=380900.0, ans=0.0 2024-09-18 05:47:48,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=380940.0, ans=0.2 2024-09-18 05:47:55,113 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.91 vs. limit=15.0 2024-09-18 05:48:01,264 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.01 vs. limit=15.0 2024-09-18 05:48:08,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=380980.0, ans=0.125 2024-09-18 05:48:12,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=380980.0, ans=0.1 2024-09-18 05:48:14,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=380980.0, ans=0.0 2024-09-18 05:48:23,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=381020.0, ans=0.015 2024-09-18 05:48:27,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=381020.0, ans=0.0 2024-09-18 05:48:27,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=381020.0, ans=0.125 2024-09-18 05:48:35,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=381060.0, ans=0.125 2024-09-18 05:48:41,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=381060.0, ans=0.035 2024-09-18 05:48:48,673 INFO [train.py:1198] (1/2) Epoch 22, batch 250, loss[loss=0.2627, ctc_loss=0.144, cr_loss=0.3888, attn_decoder_loss=0.2673, over 29177.00 frames. 
], tot_loss[loss=0.2452, ctc_loss=0.1339, cr_loss=0.3776, attn_decoder_loss=0.2492, over 4143018.46 frames. ], batch size: 100, lr: 5.11e-03, grad_scale: 8.0 2024-09-18 05:48:56,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=381100.0, ans=0.125 2024-09-18 05:49:02,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=381140.0, ans=0.125 2024-09-18 05:49:10,932 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.07 vs. limit=15.0 2024-09-18 05:49:17,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=381180.0, ans=0.2 2024-09-18 05:49:19,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=381180.0, ans=0.0 2024-09-18 05:49:32,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=381220.0, ans=0.125 2024-09-18 05:49:50,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=381260.0, ans=0.04949747468305833 2024-09-18 05:49:51,366 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.92 vs. limit=15.0 2024-09-18 05:49:53,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=381260.0, ans=0.0 2024-09-18 05:49:56,365 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.584e+01 8.533e+01 8.896e+01 9.505e+01 2.232e+02, threshold=1.779e+02, percent-clipped=1.0 2024-09-18 05:50:06,901 INFO [train.py:1198] (1/2) Epoch 22, batch 300, loss[loss=0.2621, ctc_loss=0.1486, cr_loss=0.4126, attn_decoder_loss=0.2656, over 29535.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.134, cr_loss=0.378, attn_decoder_loss=0.2489, over 4510180.26 frames. ], batch size: 92, lr: 5.11e-03, grad_scale: 8.0 2024-09-18 05:50:15,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=381300.0, ans=0.2 2024-09-18 05:50:16,315 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.89 vs. limit=15.0 2024-09-18 05:50:53,746 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=7.19 vs. limit=15.0 2024-09-18 05:50:56,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=381420.0, ans=0.125 2024-09-18 05:51:24,709 INFO [train.py:1198] (1/2) Epoch 22, batch 350, loss[loss=0.2209, ctc_loss=0.1085, cr_loss=0.3431, attn_decoder_loss=0.2258, over 29329.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1339, cr_loss=0.3776, attn_decoder_loss=0.2491, over 4794950.00 frames. 
], batch size: 71, lr: 5.11e-03, grad_scale: 8.0 2024-09-18 05:51:26,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=381500.0, ans=0.125 2024-09-18 05:51:50,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=381540.0, ans=0.1 2024-09-18 05:51:56,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=381580.0, ans=0.125 2024-09-18 05:52:29,883 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.396e+01 8.841e+01 9.371e+01 8.849e+02, threshold=1.768e+02, percent-clipped=1.0 2024-09-18 05:52:40,320 INFO [train.py:1198] (1/2) Epoch 22, batch 400, loss[loss=0.2437, ctc_loss=0.1279, cr_loss=0.3447, attn_decoder_loss=0.2489, over 29716.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1335, cr_loss=0.3765, attn_decoder_loss=0.2488, over 5025049.85 frames. ], batch size: 82, lr: 5.11e-03, grad_scale: 16.0 2024-09-18 05:53:09,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=381780.0, ans=0.125 2024-09-18 05:53:14,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=381780.0, ans=0.07 2024-09-18 05:53:14,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=381780.0, ans=0.0 2024-09-18 05:53:17,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=381780.0, ans=0.1 2024-09-18 05:53:19,280 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-09-18 05:53:23,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=381780.0, ans=0.125 2024-09-18 05:53:28,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=381820.0, ans=0.0 2024-09-18 05:53:59,078 INFO [train.py:1198] (1/2) Epoch 22, batch 450, loss[loss=0.2509, ctc_loss=0.1374, cr_loss=0.3882, attn_decoder_loss=0.2549, over 29699.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1332, cr_loss=0.3755, attn_decoder_loss=0.2489, over 5186773.74 frames. ], batch size: 83, lr: 5.11e-03, grad_scale: 8.0 2024-09-18 05:54:28,384 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.32 vs. limit=10.0 2024-09-18 05:54:30,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=381980.0, ans=0.125 2024-09-18 05:54:33,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.max_abs, batch_count=381980.0, ans=10.0 2024-09-18 05:55:08,440 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.444e+01 8.472e+01 8.899e+01 9.397e+01 1.729e+02, threshold=1.780e+02, percent-clipped=0.0 2024-09-18 05:55:17,496 INFO [train.py:1198] (1/2) Epoch 22, batch 500, loss[loss=0.2545, ctc_loss=0.1399, cr_loss=0.4054, attn_decoder_loss=0.2582, over 29426.00 frames. 
], tot_loss[loss=0.2442, ctc_loss=0.1328, cr_loss=0.3749, attn_decoder_loss=0.2483, over 5330349.01 frames. ], batch size: 94, lr: 5.10e-03, grad_scale: 8.0 2024-09-18 05:55:33,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=382140.0, ans=0.0 2024-09-18 05:55:42,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=382140.0, ans=0.025 2024-09-18 05:55:54,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=382180.0, ans=0.1 2024-09-18 05:55:55,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=382180.0, ans=0.125 2024-09-18 05:56:18,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=382260.0, ans=0.0 2024-09-18 05:56:33,384 INFO [train.py:1198] (1/2) Epoch 22, batch 550, loss[loss=0.2598, ctc_loss=0.1432, cr_loss=0.399, attn_decoder_loss=0.2639, over 28810.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1327, cr_loss=0.3746, attn_decoder_loss=0.2482, over 5421082.22 frames. ], batch size: 104, lr: 5.10e-03, grad_scale: 8.0 2024-09-18 05:56:38,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=382300.0, ans=0.125 2024-09-18 05:57:01,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=382340.0, ans=0.2 2024-09-18 05:57:02,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=382380.0, ans=0.125 2024-09-18 05:57:15,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=382380.0, ans=0.1 2024-09-18 05:57:28,368 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.81 vs. limit=6.0 2024-09-18 05:57:40,963 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 8.705e+01 9.082e+01 9.823e+01 4.645e+02, threshold=1.816e+02, percent-clipped=1.0 2024-09-18 05:57:47,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=382460.0, ans=0.125 2024-09-18 05:57:52,557 INFO [train.py:1198] (1/2) Epoch 22, batch 600, loss[loss=0.2579, ctc_loss=0.1473, cr_loss=0.4079, attn_decoder_loss=0.2611, over 29248.00 frames. ], tot_loss[loss=0.2445, ctc_loss=0.1329, cr_loss=0.3757, attn_decoder_loss=0.2486, over 5508382.85 frames. ], batch size: 100, lr: 5.10e-03, grad_scale: 8.0 2024-09-18 05:58:30,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=382580.0, ans=0.0 2024-09-18 05:58:44,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=382620.0, ans=0.0 2024-09-18 05:58:50,948 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.49 vs. 
limit=10.0 2024-09-18 05:58:53,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=382660.0, ans=0.05 2024-09-18 05:59:08,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=382700.0, ans=0.125 2024-09-18 05:59:09,507 INFO [train.py:1198] (1/2) Epoch 22, batch 650, loss[loss=0.2427, ctc_loss=0.1285, cr_loss=0.3613, attn_decoder_loss=0.2473, over 29755.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1321, cr_loss=0.374, attn_decoder_loss=0.2479, over 5585368.09 frames. ], batch size: 81, lr: 5.10e-03, grad_scale: 8.0 2024-09-18 05:59:09,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=382700.0, ans=0.0 2024-09-18 05:59:17,631 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.30 vs. limit=12.0 2024-09-18 05:59:20,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=382700.0, ans=0.0 2024-09-18 05:59:31,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=382740.0, ans=0.0 2024-09-18 06:00:01,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=382820.0, ans=0.125 2024-09-18 06:00:04,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=382820.0, ans=0.1 2024-09-18 06:00:06,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=382820.0, ans=0.125 2024-09-18 06:00:14,902 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.96 vs. limit=22.5 2024-09-18 06:00:15,692 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.721e+01 8.434e+01 8.895e+01 9.353e+01 1.142e+02, threshold=1.779e+02, percent-clipped=0.0 2024-09-18 06:00:24,703 INFO [train.py:1198] (1/2) Epoch 22, batch 700, loss[loss=0.2386, ctc_loss=0.1261, cr_loss=0.3698, attn_decoder_loss=0.2429, over 29524.00 frames. ], tot_loss[loss=0.2445, ctc_loss=0.1326, cr_loss=0.3752, attn_decoder_loss=0.2486, over 5636653.22 frames. ], batch size: 76, lr: 5.10e-03, grad_scale: 8.0 2024-09-18 06:00:27,120 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.14 vs. limit=15.0 2024-09-18 06:00:31,847 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.30 vs. limit=15.0 2024-09-18 06:00:37,707 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.72 vs. limit=10.0 2024-09-18 06:00:38,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=382940.0, ans=0.125 2024-09-18 06:00:52,495 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.41 vs. 
limit=15.0 2024-09-18 06:00:52,931 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.68 vs. limit=15.0 2024-09-18 06:00:55,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=382980.0, ans=0.05 2024-09-18 06:01:02,084 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=12.0 2024-09-18 06:01:02,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=382980.0, ans=0.0 2024-09-18 06:01:21,780 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.70 vs. limit=22.5 2024-09-18 06:01:24,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=383060.0, ans=0.1 2024-09-18 06:01:40,822 INFO [train.py:1198] (1/2) Epoch 22, batch 750, loss[loss=0.2486, ctc_loss=0.1336, cr_loss=0.3816, attn_decoder_loss=0.2529, over 29700.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1324, cr_loss=0.3754, attn_decoder_loss=0.2483, over 5675627.05 frames. ], batch size: 82, lr: 5.10e-03, grad_scale: 8.0 2024-09-18 06:01:49,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=383100.0, ans=0.0 2024-09-18 06:02:04,361 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.68 vs. limit=15.0 2024-09-18 06:02:06,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=383140.0, ans=0.2 2024-09-18 06:02:20,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=383180.0, ans=0.0 2024-09-18 06:02:25,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=383180.0, ans=0.125 2024-09-18 06:02:29,218 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.69 vs. limit=15.0 2024-09-18 06:02:43,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=383220.0, ans=0.125 2024-09-18 06:02:52,140 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.347e+01 8.642e+01 9.168e+01 9.743e+01 1.816e+02, threshold=1.834e+02, percent-clipped=1.0 2024-09-18 06:02:52,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=383260.0, ans=0.0 2024-09-18 06:02:59,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=383300.0, ans=0.0 2024-09-18 06:03:01,135 INFO [train.py:1198] (1/2) Epoch 22, batch 800, loss[loss=0.218, ctc_loss=0.1048, cr_loss=0.3133, attn_decoder_loss=0.2236, over 29598.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1329, cr_loss=0.376, attn_decoder_loss=0.2484, over 5705249.37 frames. 
], batch size: 73, lr: 5.10e-03, grad_scale: 16.0 2024-09-18 06:03:22,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=383340.0, ans=0.0 2024-09-18 06:03:24,586 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=15.0 2024-09-18 06:03:38,481 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=7.84 vs. limit=22.5 2024-09-18 06:03:51,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=383420.0, ans=0.0 2024-09-18 06:04:16,265 INFO [train.py:1198] (1/2) Epoch 22, batch 850, loss[loss=0.2586, ctc_loss=0.1367, cr_loss=0.3847, attn_decoder_loss=0.2636, over 29694.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1326, cr_loss=0.3755, attn_decoder_loss=0.2481, over 5732620.67 frames. ], batch size: 89, lr: 5.10e-03, grad_scale: 8.0 2024-09-18 06:04:16,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=383500.0, ans=0.09899494936611666 2024-09-18 06:04:37,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=383540.0, ans=0.125 2024-09-18 06:04:40,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=383540.0, ans=0.125 2024-09-18 06:05:20,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=383660.0, ans=0.025 2024-09-18 06:05:24,337 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.172e+01 8.714e+01 9.138e+01 9.767e+01 2.023e+02, threshold=1.828e+02, percent-clipped=1.0 2024-09-18 06:05:32,026 INFO [train.py:1198] (1/2) Epoch 22, batch 900, loss[loss=0.2295, ctc_loss=0.125, cr_loss=0.3625, attn_decoder_loss=0.233, over 29606.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1328, cr_loss=0.3751, attn_decoder_loss=0.2482, over 5739075.17 frames. ], batch size: 73, lr: 5.09e-03, grad_scale: 8.0 2024-09-18 06:05:34,470 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=15.0 2024-09-18 06:06:00,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=383740.0, ans=0.125 2024-09-18 06:06:04,200 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.91 vs. limit=15.0 2024-09-18 06:06:28,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=383820.0, ans=0.025 2024-09-18 06:06:52,156 INFO [train.py:1198] (1/2) Epoch 22, batch 950, loss[loss=0.2269, ctc_loss=0.1181, cr_loss=0.35, attn_decoder_loss=0.2313, over 29504.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1329, cr_loss=0.375, attn_decoder_loss=0.2485, over 5741211.07 frames. 
], batch size: 74, lr: 5.09e-03, grad_scale: 8.0 2024-09-18 06:07:12,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=383940.0, ans=0.07 2024-09-18 06:07:58,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=384060.0, ans=0.125 2024-09-18 06:07:59,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=384060.0, ans=0.025 2024-09-18 06:08:06,681 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.991e+01 9.492e+01 1.022e+02 3.198e+02, threshold=1.898e+02, percent-clipped=2.0 2024-09-18 06:08:06,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=384060.0, ans=0.025 2024-09-18 06:08:11,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=384060.0, ans=0.125 2024-09-18 06:08:13,998 INFO [train.py:1198] (1/2) Epoch 22, batch 1000, loss[loss=0.2262, ctc_loss=0.1183, cr_loss=0.3519, attn_decoder_loss=0.2303, over 29524.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1337, cr_loss=0.3764, attn_decoder_loss=0.2491, over 5735806.93 frames. ], batch size: 77, lr: 5.09e-03, grad_scale: 8.0 2024-09-18 06:08:17,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=384100.0, ans=0.0 2024-09-18 06:08:35,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=384140.0, ans=0.2 2024-09-18 06:08:58,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=384220.0, ans=0.0 2024-09-18 06:09:29,786 INFO [train.py:1198] (1/2) Epoch 22, batch 1050, loss[loss=0.2588, ctc_loss=0.1432, cr_loss=0.3881, attn_decoder_loss=0.2631, over 29700.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1331, cr_loss=0.3753, attn_decoder_loss=0.2484, over 5742554.22 frames. ], batch size: 85, lr: 5.09e-03, grad_scale: 8.0 2024-09-18 06:09:44,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=384300.0, ans=0.125 2024-09-18 06:09:44,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=384300.0, ans=0.2 2024-09-18 06:09:56,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=384340.0, ans=0.125 2024-09-18 06:10:19,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=384420.0, ans=0.2 2024-09-18 06:10:39,318 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.94 vs. limit=10.0 2024-09-18 06:10:42,978 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 8.419e+01 8.971e+01 9.530e+01 1.277e+02, threshold=1.794e+02, percent-clipped=0.0 2024-09-18 06:10:50,698 INFO [train.py:1198] (1/2) Epoch 22, batch 1100, loss[loss=0.2348, ctc_loss=0.1144, cr_loss=0.3477, attn_decoder_loss=0.2405, over 29435.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1327, cr_loss=0.3746, attn_decoder_loss=0.2483, over 5755583.13 frames. 
], batch size: 78, lr: 5.09e-03, grad_scale: 8.0 2024-09-18 06:11:09,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=384540.0, ans=10.0 2024-09-18 06:11:10,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=384540.0, ans=0.125 2024-09-18 06:11:54,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=384660.0, ans=0.125 2024-09-18 06:12:02,788 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.14 vs. limit=15.0 2024-09-18 06:12:06,669 INFO [train.py:1198] (1/2) Epoch 22, batch 1150, loss[loss=0.2481, ctc_loss=0.1432, cr_loss=0.3945, attn_decoder_loss=0.2509, over 29424.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1329, cr_loss=0.375, attn_decoder_loss=0.2483, over 5753584.50 frames. ], batch size: 78, lr: 5.09e-03, grad_scale: 8.0 2024-09-18 06:12:11,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=384700.0, ans=0.2 2024-09-18 06:12:42,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=384780.0, ans=0.025 2024-09-18 06:12:43,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=384780.0, ans=0.0 2024-09-18 06:12:47,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=384780.0, ans=10.0 2024-09-18 06:12:58,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.37 vs. limit=15.0 2024-09-18 06:13:01,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=384820.0, ans=0.125 2024-09-18 06:13:11,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=384860.0, ans=0.025 2024-09-18 06:13:15,269 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.386e+01 8.605e+01 9.127e+01 9.575e+01 1.863e+02, threshold=1.825e+02, percent-clipped=1.0 2024-09-18 06:13:22,816 INFO [train.py:1198] (1/2) Epoch 22, batch 1200, loss[loss=0.2719, ctc_loss=0.1578, cr_loss=0.4245, attn_decoder_loss=0.2752, over 29684.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.1334, cr_loss=0.3756, attn_decoder_loss=0.2493, over 5745990.05 frames. ], batch size: 85, lr: 5.09e-03, grad_scale: 16.0 2024-09-18 06:13:37,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=384900.0, ans=0.0 2024-09-18 06:14:42,674 INFO [train.py:1198] (1/2) Epoch 22, batch 1250, loss[loss=0.2569, ctc_loss=0.1397, cr_loss=0.3967, attn_decoder_loss=0.2611, over 29547.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1333, cr_loss=0.3764, attn_decoder_loss=0.2494, over 5772847.97 frames. 
], batch size: 92, lr: 5.08e-03, grad_scale: 8.0 2024-09-18 06:14:47,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=385100.0, ans=0.2 2024-09-18 06:14:50,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=385100.0, ans=0.025 2024-09-18 06:14:52,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=385100.0, ans=0.125 2024-09-18 06:15:10,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=385140.0, ans=0.04949747468305833 2024-09-18 06:15:15,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=385180.0, ans=0.1 2024-09-18 06:15:17,309 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.32 vs. limit=22.5 2024-09-18 06:15:39,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=385220.0, ans=0.0 2024-09-18 06:15:43,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=385260.0, ans=0.1 2024-09-18 06:15:50,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=385260.0, ans=0.125 2024-09-18 06:15:52,592 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.078e+01 8.234e+01 8.912e+01 9.418e+01 2.045e+02, threshold=1.782e+02, percent-clipped=1.0 2024-09-18 06:15:55,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=385260.0, ans=0.025 2024-09-18 06:15:58,560 INFO [train.py:1198] (1/2) Epoch 22, batch 1300, loss[loss=0.2644, ctc_loss=0.1561, cr_loss=0.4036, attn_decoder_loss=0.2675, over 28299.00 frames. ], tot_loss[loss=0.2445, ctc_loss=0.1326, cr_loss=0.3748, attn_decoder_loss=0.2486, over 5778958.67 frames. ], batch size: 111, lr: 5.08e-03, grad_scale: 8.0 2024-09-18 06:16:09,908 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.96 vs. limit=15.0 2024-09-18 06:16:23,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=385340.0, ans=0.125 2024-09-18 06:16:44,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=385420.0, ans=0.1 2024-09-18 06:16:51,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=385420.0, ans=0.1 2024-09-18 06:17:08,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=385460.0, ans=0.0 2024-09-18 06:17:14,123 INFO [train.py:1198] (1/2) Epoch 22, batch 1350, loss[loss=0.2479, ctc_loss=0.1301, cr_loss=0.3752, attn_decoder_loss=0.2527, over 29763.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1327, cr_loss=0.3749, attn_decoder_loss=0.2483, over 5795678.58 frames. 
], batch size: 81, lr: 5.08e-03, grad_scale: 8.0 2024-09-18 06:17:43,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=385540.0, ans=0.2 2024-09-18 06:17:46,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=385580.0, ans=0.125 2024-09-18 06:17:46,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=385580.0, ans=0.2 2024-09-18 06:18:15,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=385620.0, ans=0.2 2024-09-18 06:18:27,476 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.549e+01 8.459e+01 9.043e+01 9.728e+01 1.319e+02, threshold=1.809e+02, percent-clipped=0.0 2024-09-18 06:18:33,609 INFO [train.py:1198] (1/2) Epoch 22, batch 1400, loss[loss=0.2153, ctc_loss=0.1015, cr_loss=0.3039, attn_decoder_loss=0.2212, over 29582.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1324, cr_loss=0.3744, attn_decoder_loss=0.2481, over 5806506.85 frames. ], batch size: 69, lr: 5.08e-03, grad_scale: 8.0 2024-09-18 06:18:35,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=385700.0, ans=0.0 2024-09-18 06:18:46,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=385700.0, ans=0.0 2024-09-18 06:19:10,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=385780.0, ans=0.125 2024-09-18 06:19:14,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=385780.0, ans=0.125 2024-09-18 06:19:17,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=385820.0, ans=0.2 2024-09-18 06:19:32,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=385860.0, ans=0.2 2024-09-18 06:19:36,413 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.33 vs. limit=15.0 2024-09-18 06:19:43,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=385860.0, ans=0.1 2024-09-18 06:19:49,087 INFO [train.py:1198] (1/2) Epoch 22, batch 1450, loss[loss=0.269, ctc_loss=0.1552, cr_loss=0.4297, attn_decoder_loss=0.2721, over 29503.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1332, cr_loss=0.3765, attn_decoder_loss=0.249, over 5803270.17 frames. ], batch size: 94, lr: 5.08e-03, grad_scale: 8.0 2024-09-18 06:19:58,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=385900.0, ans=0.125 2024-09-18 06:20:17,232 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.06 vs. 
limit=10.0 2024-09-18 06:20:38,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=386020.0, ans=0.2 2024-09-18 06:20:55,601 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=15.0 2024-09-18 06:20:58,985 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.606e+01 8.476e+01 9.077e+01 9.872e+01 2.572e+02, threshold=1.815e+02, percent-clipped=2.0 2024-09-18 06:21:05,105 INFO [train.py:1198] (1/2) Epoch 22, batch 1500, loss[loss=0.2578, ctc_loss=0.1418, cr_loss=0.4098, attn_decoder_loss=0.2615, over 29623.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.1333, cr_loss=0.3765, attn_decoder_loss=0.2493, over 5804309.84 frames. ], batch size: 86, lr: 5.08e-03, grad_scale: 8.0 2024-09-18 06:21:14,895 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.65 vs. limit=15.0 2024-09-18 06:21:26,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=386140.0, ans=0.0 2024-09-18 06:21:42,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=386180.0, ans=0.125 2024-09-18 06:21:42,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=386180.0, ans=0.125 2024-09-18 06:21:46,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=386180.0, ans=10.0 2024-09-18 06:21:54,753 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2024-09-18 06:22:25,767 INFO [train.py:1198] (1/2) Epoch 22, batch 1550, loss[loss=0.2713, ctc_loss=0.1632, cr_loss=0.441, attn_decoder_loss=0.2735, over 29507.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1336, cr_loss=0.3765, attn_decoder_loss=0.2491, over 5781161.26 frames. ], batch size: 90, lr: 5.08e-03, grad_scale: 8.0 2024-09-18 06:22:43,294 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.11 vs. limit=10.0 2024-09-18 06:23:01,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=386380.0, ans=0.1 2024-09-18 06:23:06,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=386380.0, ans=0.125 2024-09-18 06:23:10,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=386420.0, ans=0.025 2024-09-18 06:23:11,319 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.24 vs. limit=15.0 2024-09-18 06:23:12,646 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.63 vs. 
limit=12.0
2024-09-18 06:23:13,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=386420.0, ans=0.0
2024-09-18 06:23:31,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=386460.0, ans=0.125
2024-09-18 06:23:34,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=386460.0, ans=0.1
2024-09-18 06:23:35,902 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 8.667e+01 9.294e+01 9.875e+01 4.781e+02, threshold=1.859e+02, percent-clipped=2.0
2024-09-18 06:23:39,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=386460.0, ans=0.1
2024-09-18 06:23:41,604 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.95 vs. limit=15.0
2024-09-18 06:23:41,956 INFO [train.py:1198] (1/2) Epoch 22, batch 1600, loss[loss=0.2414, ctc_loss=0.1178, cr_loss=0.3325, attn_decoder_loss=0.2478, over 29679.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.1337, cr_loss=0.3766, attn_decoder_loss=0.249, over 5763986.17 frames. ], batch size: 85, lr: 5.08e-03, grad_scale: 16.0
2024-09-18 06:23:49,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=386500.0, ans=0.125
2024-09-18 06:24:15,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=386580.0, ans=0.0
2024-09-18 06:24:20,763 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.45 vs. limit=22.5
2024-09-18 06:24:23,232 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 06:24:38,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=386620.0, ans=0.0
2024-09-18 06:24:39,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=386620.0, ans=0.0
2024-09-18 06:24:41,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=386660.0, ans=0.125
2024-09-18 06:24:50,695 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 06:24:57,609 INFO [train.py:1198] (1/2) Epoch 22, batch 1650, loss[loss=0.2606, ctc_loss=0.1443, cr_loss=0.3934, attn_decoder_loss=0.2647, over 29690.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.1336, cr_loss=0.376, attn_decoder_loss=0.249, over 5758124.26 frames. ], batch size: 89, lr: 5.07e-03, grad_scale: 8.0
2024-09-18 06:24:59,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=386700.0, ans=0.125
2024-09-18 06:25:13,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=386740.0, ans=0.125
2024-09-18 06:25:55,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.79 vs. limit=15.0
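Each Whitening entry (scaling.py:1024) compares a measured metric against a scheduled limit, as in the record just above (metric=7.79 vs. limit=15.0). A plausible sketch, assuming the metric captures eigenvalue dispersion of the activation covariance, 1.0 for perfectly "white" features and larger when a few directions dominate; this is an illustration of the idea, not the recipe's actual formula:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """Eigenvalue-dispersion reading of the logged 'metric' (sketch only).

    x: activations of shape (..., num_channels). Channels are split into
    num_groups groups; for each group we form the covariance across all
    frames and measure mean(eig^2) / mean(eig)^2, which is 1.0 when the
    covariance is isotropic and grows as it becomes lopsided.
    """
    x = x.reshape(-1, x.shape[-1])                       # (frames, channels)
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)                  # zero-mean per channel
    cov = torch.einsum("ngc,ngd->gcd", x, x) / num_frames
    eigs = torch.linalg.eigvalsh(cov)                    # (groups, chans/group)
    metric = (eigs ** 2).mean(dim=-1) / eigs.mean(dim=-1).clamp(min=1e-20) ** 2
    return metric.mean()
```

On that reading, an entry such as metric=20.45 vs. limit=22.5 is close to tripping its limit, while metric=3.54 vs. limit=15.0 has ample headroom.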
2024-09-18 06:25:55,874 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.53 vs. limit=15.0
2024-09-18 06:25:56,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=386820.0, ans=0.125
2024-09-18 06:26:12,695 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.391e+01 8.418e+01 9.168e+01 9.653e+01 1.530e+02, threshold=1.834e+02, percent-clipped=0.0
2024-09-18 06:26:14,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=386860.0, ans=0.125
2024-09-18 06:26:17,113 INFO [train.py:1198] (1/2) Epoch 22, batch 1700, loss[loss=0.216, ctc_loss=0.1076, cr_loss=0.3385, attn_decoder_loss=0.2206, over 29568.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1332, cr_loss=0.3758, attn_decoder_loss=0.2489, over 5780844.96 frames. ], batch size: 69, lr: 5.07e-03, grad_scale: 8.0
2024-09-18 06:26:45,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=386980.0, ans=0.1
2024-09-18 06:26:52,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=386980.0, ans=0.2
2024-09-18 06:27:09,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.80 vs. limit=15.0
2024-09-18 06:27:21,178 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.68 vs. limit=22.5
2024-09-18 06:27:32,815 INFO [train.py:1198] (1/2) Epoch 22, batch 1750, loss[loss=0.2209, ctc_loss=0.1192, cr_loss=0.3545, attn_decoder_loss=0.2243, over 29306.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1327, cr_loss=0.3753, attn_decoder_loss=0.2484, over 5788969.78 frames. ], batch size: 67, lr: 5.07e-03, grad_scale: 8.0
2024-09-18 06:27:40,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=387100.0, ans=0.125
2024-09-18 06:27:49,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=387140.0, ans=0.2
2024-09-18 06:28:24,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=387220.0, ans=0.025
2024-09-18 06:28:38,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=387260.0, ans=0.2
2024-09-18 06:28:39,448 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.54 vs. limit=15.0
2024-09-18 06:28:44,542 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.422e+01 8.338e+01 8.803e+01 9.481e+01 6.567e+02, threshold=1.761e+02, percent-clipped=1.0
2024-09-18 06:28:44,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=387260.0, ans=0.0
2024-09-18 06:28:49,111 INFO [train.py:1198] (1/2) Epoch 22, batch 1800, loss[loss=0.2573, ctc_loss=0.1491, cr_loss=0.4204, attn_decoder_loss=0.26, over 29681.00 frames.
], tot_loss[loss=0.2443, ctc_loss=0.1328, cr_loss=0.3754, attn_decoder_loss=0.2483, over 5791110.01 frames. ], batch size: 83, lr: 5.07e-03, grad_scale: 8.0 2024-09-18 06:28:54,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=387300.0, ans=0.125 2024-09-18 06:29:05,872 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.56 vs. limit=15.0 2024-09-18 06:29:11,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=387340.0, ans=0.1 2024-09-18 06:29:48,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=387420.0, ans=0.125 2024-09-18 06:29:59,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=387460.0, ans=0.125 2024-09-18 06:29:59,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=387460.0, ans=0.125 2024-09-18 06:30:09,423 INFO [train.py:1198] (1/2) Epoch 22, batch 1850, loss[loss=0.2608, ctc_loss=0.1432, cr_loss=0.3911, attn_decoder_loss=0.2652, over 29615.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1324, cr_loss=0.3753, attn_decoder_loss=0.248, over 5797576.76 frames. ], batch size: 86, lr: 5.07e-03, grad_scale: 8.0 2024-09-18 06:30:14,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=387500.0, ans=0.1 2024-09-18 06:30:17,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=387500.0, ans=0.0 2024-09-18 06:30:17,920 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.16 vs. limit=12.0 2024-09-18 06:30:23,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=387540.0, ans=0.125 2024-09-18 06:30:30,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=387540.0, ans=0.025 2024-09-18 06:30:55,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=387620.0, ans=0.125 2024-09-18 06:31:20,352 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.630e+01 9.053e+01 9.518e+01 1.576e+02, threshold=1.811e+02, percent-clipped=0.0 2024-09-18 06:31:24,767 INFO [train.py:1198] (1/2) Epoch 22, batch 1900, loss[loss=0.2595, ctc_loss=0.139, cr_loss=0.3997, attn_decoder_loss=0.264, over 29717.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1326, cr_loss=0.3755, attn_decoder_loss=0.2484, over 5805200.15 frames. ], batch size: 89, lr: 5.07e-03, grad_scale: 8.0 2024-09-18 06:31:40,515 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.58 vs. 
limit=15.0 2024-09-18 06:31:55,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.min_positive, batch_count=387780.0, ans=0.05 2024-09-18 06:32:00,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=387780.0, ans=0.125 2024-09-18 06:32:15,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=387820.0, ans=0.0 2024-09-18 06:32:32,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=387860.0, ans=0.125 2024-09-18 06:32:32,512 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.03 vs. limit=22.5 2024-09-18 06:32:40,048 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.53 vs. limit=10.0 2024-09-18 06:32:40,935 INFO [train.py:1198] (1/2) Epoch 22, batch 1950, loss[loss=0.2332, ctc_loss=0.1267, cr_loss=0.3614, attn_decoder_loss=0.237, over 29422.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1333, cr_loss=0.3773, attn_decoder_loss=0.2497, over 5819548.96 frames. ], batch size: 78, lr: 5.07e-03, grad_scale: 8.0 2024-09-18 06:32:57,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=387940.0, ans=0.125 2024-09-18 06:33:03,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=387940.0, ans=0.0 2024-09-18 06:33:13,733 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:33:21,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=387980.0, ans=0.2 2024-09-18 06:33:29,800 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.40 vs. limit=22.5 2024-09-18 06:33:47,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=388060.0, ans=0.125 2024-09-18 06:33:55,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=388060.0, ans=0.1 2024-09-18 06:33:56,801 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.872e+01 8.668e+01 9.187e+01 9.705e+01 3.737e+02, threshold=1.837e+02, percent-clipped=2.0 2024-09-18 06:34:01,472 INFO [train.py:1198] (1/2) Epoch 22, batch 2000, loss[loss=0.2218, ctc_loss=0.1142, cr_loss=0.3468, attn_decoder_loss=0.2261, over 29352.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1339, cr_loss=0.3783, attn_decoder_loss=0.2503, over 5798445.45 frames. ], batch size: 67, lr: 5.07e-03, grad_scale: 16.0 2024-09-18 06:34:15,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=388140.0, ans=0.1 2024-09-18 06:34:26,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=388140.0, ans=0.0 2024-09-18 06:34:28,141 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.81 vs. 
limit=15.0
2024-09-18 06:34:33,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=388180.0, ans=0.0
2024-09-18 06:34:36,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=388180.0, ans=0.125
2024-09-18 06:34:41,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=388180.0, ans=0.09899494936611666
2024-09-18 06:34:42,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=388180.0, ans=0.025
2024-09-18 06:34:54,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=388220.0, ans=0.125
2024-09-18 06:34:56,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=388220.0, ans=0.125
2024-09-18 06:35:03,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=388260.0, ans=0.1
2024-09-18 06:35:17,272 INFO [train.py:1198] (1/2) Epoch 22, batch 2050, loss[loss=0.2282, ctc_loss=0.1252, cr_loss=0.3588, attn_decoder_loss=0.2317, over 29440.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.1342, cr_loss=0.3788, attn_decoder_loss=0.2499, over 5789924.53 frames. ], batch size: 70, lr: 5.06e-03, grad_scale: 8.0
2024-09-18 06:35:19,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=388300.0, ans=0.07
2024-09-18 06:35:23,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=388300.0, ans=0.1
2024-09-18 06:35:44,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=388340.0, ans=0.125
2024-09-18 06:35:51,432 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.07 vs. limit=15.0
2024-09-18 06:35:54,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=388380.0, ans=0.1
2024-09-18 06:35:54,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=388380.0, ans=0.125
2024-09-18 06:36:03,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=388420.0, ans=0.07
2024-09-18 06:36:24,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=388460.0, ans=0.125
2024-09-18 06:36:30,007 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.598e+01 9.133e+01 9.835e+01 1.696e+02, threshold=1.827e+02, percent-clipped=0.0
2024-09-18 06:36:33,165 INFO [train.py:1198] (1/2) Epoch 22, batch 2100, loss[loss=0.2436, ctc_loss=0.1252, cr_loss=0.3594, attn_decoder_loss=0.2488, over 29792.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1338, cr_loss=0.3781, attn_decoder_loss=0.2494, over 5800972.43 frames. ], batch size: 81, lr: 5.06e-03, grad_scale: 8.0
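The train.py:1198 entries report a per-batch loss[...] and a running tot_loss[...], each split into ctc_loss, cr_loss and attn_decoder_loss components. The logged totals are consistent with a fixed weighted sum: for the batch 2100 entry above, 0.1 * 0.1252 + 0.9 * 0.2488 + 0.02 * 0.3594 = 0.2436, matching loss=0.2436. A minimal sketch using those weights (inferred from the logged numbers, so treat them as assumptions rather than values read out of the code):

```python
import torch

def combined_loss(ctc_loss: torch.Tensor,
                  attn_decoder_loss: torch.Tensor,
                  cr_loss: torch.Tensor,
                  ctc_scale: float = 0.1,    # assumed, inferred from the log
                  attn_scale: float = 0.9,   # assumed, inferred from the log
                  cr_scale: float = 0.02) -> torch.Tensor:  # assumed
    """Weighted CTC + attention-decoder + consistency-regularization loss."""
    return (ctc_scale * ctc_loss
            + attn_scale * attn_decoder_loss
            + cr_scale * cr_loss)
```

tot_loss then behaves like a frame-weighted running average of these per-batch values, which is why it is quoted "over" roughly 5.8 million frames and moves only slowly from batch to batch.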
2024-09-18 06:36:33,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=388500.0, ans=0.1
2024-09-18 06:36:44,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=388500.0, ans=0.125
2024-09-18 06:37:11,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=388580.0, ans=0.2
2024-09-18 06:37:11,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=388580.0, ans=0.0
2024-09-18 06:37:25,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=388620.0, ans=0.0
2024-09-18 06:37:42,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=388660.0, ans=0.0
2024-09-18 06:37:52,731 INFO [train.py:1198] (1/2) Epoch 22, batch 2150, loss[loss=0.2476, ctc_loss=0.1356, cr_loss=0.3895, attn_decoder_loss=0.2514, over 29425.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1331, cr_loss=0.3768, attn_decoder_loss=0.2487, over 5816076.29 frames. ], batch size: 78, lr: 5.06e-03, grad_scale: 8.0
2024-09-18 06:38:10,130 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.70 vs. limit=15.0
2024-09-18 06:38:31,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=388780.0, ans=0.2
2024-09-18 06:38:51,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=388820.0, ans=22.5
2024-09-18 06:38:56,403 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.52 vs. limit=5.0
2024-09-18 06:39:05,566 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.464e+01 8.588e+01 8.944e+01 9.592e+01 1.412e+02, threshold=1.789e+02, percent-clipped=0.0
2024-09-18 06:39:06,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=388860.0, ans=0.125
2024-09-18 06:39:08,643 INFO [train.py:1198] (1/2) Epoch 22, batch 2200, loss[loss=0.2443, ctc_loss=0.1321, cr_loss=0.3765, attn_decoder_loss=0.2484, over 29633.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.1334, cr_loss=0.3779, attn_decoder_loss=0.249, over 5813317.25 frames. ], batch size: 86, lr: 5.06e-03, grad_scale: 8.0
2024-09-18 06:39:15,744 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.98 vs. limit=15.0
2024-09-18 06:39:50,337 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.18 vs. limit=15.0
2024-09-18 06:39:54,662 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0
2024-09-18 06:39:56,406 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.68 vs.
limit=15.0 2024-09-18 06:40:23,972 INFO [train.py:1198] (1/2) Epoch 22, batch 2250, loss[loss=0.2401, ctc_loss=0.1252, cr_loss=0.3538, attn_decoder_loss=0.245, over 29702.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1331, cr_loss=0.3769, attn_decoder_loss=0.2488, over 5813055.48 frames. ], batch size: 82, lr: 5.06e-03, grad_scale: 8.0 2024-09-18 06:40:59,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=389180.0, ans=0.125 2024-09-18 06:41:08,578 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:41:22,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=389220.0, ans=0.2 2024-09-18 06:41:26,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=389260.0, ans=0.125 2024-09-18 06:41:41,016 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.302e+01 8.612e+01 9.109e+01 9.746e+01 4.316e+02, threshold=1.822e+02, percent-clipped=5.0 2024-09-18 06:41:44,067 INFO [train.py:1198] (1/2) Epoch 22, batch 2300, loss[loss=0.2198, ctc_loss=0.1147, cr_loss=0.3345, attn_decoder_loss=0.224, over 29280.00 frames. ], tot_loss[loss=0.2436, ctc_loss=0.1326, cr_loss=0.3757, attn_decoder_loss=0.2476, over 5801002.46 frames. ], batch size: 71, lr: 5.06e-03, grad_scale: 8.0 2024-09-18 06:41:44,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=389300.0, ans=0.0 2024-09-18 06:41:53,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=389300.0, ans=0.125 2024-09-18 06:42:05,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=389340.0, ans=0.0 2024-09-18 06:42:20,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=389380.0, ans=0.125 2024-09-18 06:42:31,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=389420.0, ans=0.0 2024-09-18 06:42:59,650 INFO [train.py:1198] (1/2) Epoch 22, batch 2350, loss[loss=0.2597, ctc_loss=0.151, cr_loss=0.4132, attn_decoder_loss=0.2626, over 29697.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1328, cr_loss=0.3762, attn_decoder_loss=0.2479, over 5806194.95 frames. ], batch size: 83, lr: 5.06e-03, grad_scale: 8.0 2024-09-18 06:43:19,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=389540.0, ans=0.0 2024-09-18 06:43:38,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=389580.0, ans=6.0 2024-09-18 06:43:47,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=389620.0, ans=10.0 2024-09-18 06:44:01,760 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.07 vs. 
limit=15.0 2024-09-18 06:44:13,260 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.909e+01 8.749e+01 9.346e+01 1.024e+02 1.570e+02, threshold=1.869e+02, percent-clipped=0.0 2024-09-18 06:44:16,222 INFO [train.py:1198] (1/2) Epoch 22, batch 2400, loss[loss=0.2351, ctc_loss=0.1328, cr_loss=0.3825, attn_decoder_loss=0.238, over 29515.00 frames. ], tot_loss[loss=0.2445, ctc_loss=0.1334, cr_loss=0.3766, attn_decoder_loss=0.2485, over 5809495.90 frames. ], batch size: 76, lr: 5.05e-03, grad_scale: 16.0 2024-09-18 06:44:23,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=389700.0, ans=0.2 2024-09-18 06:44:28,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=389700.0, ans=0.125 2024-09-18 06:44:28,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=389700.0, ans=0.125 2024-09-18 06:44:28,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=389700.0, ans=0.025 2024-09-18 06:44:42,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=389740.0, ans=0.125 2024-09-18 06:44:44,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=389740.0, ans=0.125 2024-09-18 06:44:46,471 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.38 vs. limit=15.0 2024-09-18 06:44:52,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=389780.0, ans=0.125 2024-09-18 06:45:08,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=389820.0, ans=0.09899494936611666 2024-09-18 06:45:25,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=389860.0, ans=0.125 2024-09-18 06:45:36,563 INFO [train.py:1198] (1/2) Epoch 22, batch 2450, loss[loss=0.2549, ctc_loss=0.1282, cr_loss=0.3695, attn_decoder_loss=0.2608, over 29709.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1339, cr_loss=0.3771, attn_decoder_loss=0.2494, over 5786357.26 frames. ], batch size: 82, lr: 5.05e-03, grad_scale: 8.0 2024-09-18 06:45:36,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=389900.0, ans=0.0 2024-09-18 06:45:40,146 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2024-09-18 06:46:02,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=389940.0, ans=0.025 2024-09-18 06:46:19,697 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. limit=6.0 2024-09-18 06:46:22,507 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.71 vs. 
limit=15.0 2024-09-18 06:46:25,336 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.90 vs. limit=15.0 2024-09-18 06:46:39,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=390060.0, ans=0.125 2024-09-18 06:46:41,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=390060.0, ans=0.125 2024-09-18 06:46:45,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=390060.0, ans=0.125 2024-09-18 06:46:47,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=390060.0, ans=0.125 2024-09-18 06:46:50,168 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.749e+01 9.206e+01 9.779e+01 5.372e+02, threshold=1.841e+02, percent-clipped=2.0 2024-09-18 06:46:51,746 INFO [train.py:1198] (1/2) Epoch 22, batch 2500, loss[loss=0.2558, ctc_loss=0.1423, cr_loss=0.4017, attn_decoder_loss=0.2595, over 29629.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.134, cr_loss=0.377, attn_decoder_loss=0.2492, over 5795715.74 frames. ], batch size: 86, lr: 5.05e-03, grad_scale: 8.0 2024-09-18 06:47:11,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=390140.0, ans=0.125 2024-09-18 06:47:23,283 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0 2024-09-18 06:47:33,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=390180.0, ans=0.0 2024-09-18 06:48:01,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=390260.0, ans=0.125 2024-09-18 06:48:07,750 INFO [train.py:1198] (1/2) Epoch 22, batch 2550, loss[loss=0.2108, ctc_loss=0.1073, cr_loss=0.3296, attn_decoder_loss=0.215, over 29372.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1337, cr_loss=0.3774, attn_decoder_loss=0.2493, over 5797403.84 frames. 
], batch size: 67, lr: 5.05e-03, grad_scale: 8.0 2024-09-18 06:48:35,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=390340.0, ans=0.09899494936611666 2024-09-18 06:48:44,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=390380.0, ans=0.2 2024-09-18 06:48:54,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=390420.0, ans=0.125 2024-09-18 06:48:58,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=390420.0, ans=0.035 2024-09-18 06:49:12,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.min_positive, batch_count=390460.0, ans=0.025 2024-09-18 06:49:22,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=390460.0, ans=0.1 2024-09-18 06:49:24,704 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.271e+01 8.515e+01 9.035e+01 9.623e+01 2.254e+02, threshold=1.807e+02, percent-clipped=1.0 2024-09-18 06:49:26,243 INFO [train.py:1198] (1/2) Epoch 22, batch 2600, loss[loss=0.2365, ctc_loss=0.1245, cr_loss=0.3764, attn_decoder_loss=0.2406, over 29437.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1338, cr_loss=0.3778, attn_decoder_loss=0.2497, over 5793797.95 frames. ], batch size: 78, lr: 5.05e-03, grad_scale: 8.0 2024-09-18 06:49:26,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=390500.0, ans=0.125 2024-09-18 06:49:37,189 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.94 vs. limit=10.0 2024-09-18 06:49:57,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=390580.0, ans=0.1 2024-09-18 06:50:06,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=390580.0, ans=0.125 2024-09-18 06:50:10,215 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.58 vs. limit=6.0 2024-09-18 06:50:15,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=390620.0, ans=0.025 2024-09-18 06:50:26,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=390620.0, ans=0.125 2024-09-18 06:50:43,876 INFO [train.py:1198] (1/2) Epoch 22, batch 2650, loss[loss=0.2607, ctc_loss=0.1462, cr_loss=0.4111, attn_decoder_loss=0.2643, over 29226.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1339, cr_loss=0.3785, attn_decoder_loss=0.2498, over 5800889.80 frames. 
], batch size: 100, lr: 5.05e-03, grad_scale: 8.0 2024-09-18 06:50:50,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=390700.0, ans=0.1 2024-09-18 06:51:02,470 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:51:08,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.09 vs. limit=12.0 2024-09-18 06:51:18,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.33 vs. limit=22.5 2024-09-18 06:51:23,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=390780.0, ans=0.0 2024-09-18 06:51:40,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=390820.0, ans=0.025 2024-09-18 06:51:42,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=390860.0, ans=0.2 2024-09-18 06:51:51,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=390860.0, ans=0.0 2024-09-18 06:51:53,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=390860.0, ans=0.1 2024-09-18 06:51:56,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=390860.0, ans=0.07 2024-09-18 06:51:57,485 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.636e+01 8.432e+01 9.013e+01 9.580e+01 2.667e+02, threshold=1.803e+02, percent-clipped=2.0 2024-09-18 06:51:59,082 INFO [train.py:1198] (1/2) Epoch 22, batch 2700, loss[loss=0.2514, ctc_loss=0.1376, cr_loss=0.381, attn_decoder_loss=0.2556, over 29525.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1338, cr_loss=0.3784, attn_decoder_loss=0.2499, over 5796779.39 frames. ], batch size: 87, lr: 5.05e-03, grad_scale: 8.0 2024-09-18 06:52:02,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=390900.0, ans=0.0 2024-09-18 06:52:42,930 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.08 vs. limit=22.5 2024-09-18 06:52:45,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=391020.0, ans=0.0 2024-09-18 06:52:47,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.84 vs. limit=15.0 2024-09-18 06:53:15,006 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0 2024-09-18 06:53:17,091 INFO [train.py:1198] (1/2) Epoch 22, batch 2750, loss[loss=0.2361, ctc_loss=0.1312, cr_loss=0.3716, attn_decoder_loss=0.2395, over 29500.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1327, cr_loss=0.3757, attn_decoder_loss=0.2485, over 5795780.10 frames. ], batch size: 75, lr: 5.05e-03, grad_scale: 8.0 2024-09-18 06:53:17,820 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.30 vs. 
limit=22.5 2024-09-18 06:53:28,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=391100.0, ans=0.0 2024-09-18 06:53:46,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=391140.0, ans=0.125 2024-09-18 06:53:49,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=391180.0, ans=0.125 2024-09-18 06:53:54,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=391180.0, ans=0.2 2024-09-18 06:53:57,457 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:54:04,609 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.30 vs. limit=22.5 2024-09-18 06:54:06,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=391220.0, ans=0.1 2024-09-18 06:54:15,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=391220.0, ans=0.125 2024-09-18 06:54:23,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=391260.0, ans=0.0 2024-09-18 06:54:26,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=391260.0, ans=0.1 2024-09-18 06:54:34,213 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.791e+01 9.407e+01 1.009e+02 2.763e+02, threshold=1.881e+02, percent-clipped=2.0 2024-09-18 06:54:35,691 INFO [train.py:1198] (1/2) Epoch 22, batch 2800, loss[loss=0.2609, ctc_loss=0.1637, cr_loss=0.3954, attn_decoder_loss=0.2629, over 20425.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1332, cr_loss=0.3766, attn_decoder_loss=0.2487, over 5776564.56 frames. ], batch size: 209, lr: 5.04e-03, grad_scale: 16.0 2024-09-18 06:54:45,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.04 vs. limit=10.0 2024-09-18 06:54:46,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=391300.0, ans=0.1 2024-09-18 06:54:47,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=391300.0, ans=0.1 2024-09-18 06:54:56,116 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.25 vs. 
limit=15.0 2024-09-18 06:54:57,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=391340.0, ans=0.07 2024-09-18 06:54:58,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=391340.0, ans=0.125 2024-09-18 06:55:03,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=391340.0, ans=0.1 2024-09-18 06:55:08,549 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0 2024-09-18 06:55:36,891 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:55:38,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=391460.0, ans=0.125 2024-09-18 06:55:50,725 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.68 vs. limit=12.0 2024-09-18 06:55:51,518 INFO [train.py:1198] (1/2) Epoch 22, batch 2850, loss[loss=0.2358, ctc_loss=0.1352, cr_loss=0.3771, attn_decoder_loss=0.2386, over 29515.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.1337, cr_loss=0.3771, attn_decoder_loss=0.249, over 5761324.26 frames. ], batch size: 77, lr: 5.04e-03, grad_scale: 8.0 2024-09-18 06:55:59,489 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.93 vs. limit=22.5 2024-09-18 06:56:09,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=391540.0, ans=0.0 2024-09-18 06:56:16,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=391540.0, ans=0.125 2024-09-18 06:56:23,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=391580.0, ans=0.125 2024-09-18 06:56:24,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=391580.0, ans=0.125 2024-09-18 06:56:33,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=391580.0, ans=0.035 2024-09-18 06:56:37,199 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.54 vs. limit=22.5 2024-09-18 06:56:55,222 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.96 vs. limit=22.5 2024-09-18 06:57:03,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=391660.0, ans=0.07 2024-09-18 06:57:06,933 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.54 vs. 
limit=15.0 2024-09-18 06:57:09,048 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.734e+01 8.934e+01 9.738e+01 1.096e+02 2.741e+02, threshold=1.948e+02, percent-clipped=1.0 2024-09-18 06:57:09,074 INFO [train.py:1198] (1/2) Epoch 22, batch 2900, loss[loss=0.2389, ctc_loss=0.1237, cr_loss=0.3624, attn_decoder_loss=0.2437, over 29424.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1344, cr_loss=0.3792, attn_decoder_loss=0.2504, over 5786963.23 frames. ], batch size: 79, lr: 5.04e-03, grad_scale: 8.0 2024-09-18 06:57:19,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=391700.0, ans=0.1 2024-09-18 06:57:52,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=391780.0, ans=0.5 2024-09-18 06:57:58,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=391820.0, ans=0.025 2024-09-18 06:58:17,060 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0 2024-09-18 06:58:19,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=391860.0, ans=0.025 2024-09-18 06:58:27,098 INFO [train.py:1198] (1/2) Epoch 22, batch 2950, loss[loss=0.2372, ctc_loss=0.1179, cr_loss=0.3609, attn_decoder_loss=0.2424, over 29516.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1333, cr_loss=0.3766, attn_decoder_loss=0.2492, over 5780900.42 frames. ], batch size: 75, lr: 5.04e-03, grad_scale: 8.0 2024-09-18 06:58:35,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=391900.0, ans=0.125 2024-09-18 06:58:51,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=391940.0, ans=0.0 2024-09-18 06:59:36,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=392060.0, ans=0.1 2024-09-18 06:59:36,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=392060.0, ans=0.125 2024-09-18 06:59:43,577 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.515e+01 8.512e+01 8.926e+01 9.722e+01 3.359e+02, threshold=1.785e+02, percent-clipped=2.0 2024-09-18 06:59:43,614 INFO [train.py:1198] (1/2) Epoch 22, batch 3000, loss[loss=0.2431, ctc_loss=0.1288, cr_loss=0.3765, attn_decoder_loss=0.2475, over 29767.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1332, cr_loss=0.376, attn_decoder_loss=0.2491, over 5780984.07 frames. ], batch size: 81, lr: 5.04e-03, grad_scale: 8.0 2024-09-18 06:59:43,614 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 07:00:03,085 INFO [train.py:1230] (1/2) Epoch 22, validation: loss=0.2118, ctc_loss=0.03901, cr_loss=5.241e-15, attn_decoder_loss=0.231, over 944034.00 frames. 
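At batch 3000 the trainer pauses for validation (train.py:1221/1230 above) and reports a single frame-averaged figure. Note that cr_loss collapses to roughly 5e-15 on the dev set; that is what one would expect if the consistency-regularization term compares two differently masked copies of each utterance and the masking is disabled at evaluation time, so the two views coincide. A minimal sketch of such a validation pass, with compute_loss as a hypothetical stand-in for the recipe's loss function:

```python
import torch

def validate(model, valid_loader, compute_loss):
    """Frame-weighted average loss over the dev set (illustrative sketch)."""
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            # compute_loss (hypothetical) returns a per-batch average loss
            # and the number of frames in the batch.
            loss, num_frames = compute_loss(model, batch)
            tot_loss += loss.item() * num_frames   # frame-weighted sum
            tot_frames += num_frames
    model.train()
    return tot_loss / tot_frames   # the "loss=..., over N frames" figure
```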
2024-09-18 07:00:03,086 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-18 07:00:15,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=392100.0, ans=0.125 2024-09-18 07:00:27,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=392140.0, ans=0.125 2024-09-18 07:00:30,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=392140.0, ans=10.0 2024-09-18 07:00:58,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=392220.0, ans=0.2 2024-09-18 07:01:15,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=392260.0, ans=0.2 2024-09-18 07:01:21,518 INFO [train.py:1198] (1/2) Epoch 22, batch 3050, loss[loss=0.2405, ctc_loss=0.1349, cr_loss=0.3677, attn_decoder_loss=0.2441, over 29520.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1343, cr_loss=0.3776, attn_decoder_loss=0.2503, over 5776076.39 frames. ], batch size: 76, lr: 5.04e-03, grad_scale: 8.0 2024-09-18 07:01:40,321 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 07:01:43,736 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.03 vs. limit=22.5 2024-09-18 07:01:44,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=392340.0, ans=0.125 2024-09-18 07:01:45,203 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.68 vs. limit=15.0 2024-09-18 07:01:46,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=392340.0, ans=0.0 2024-09-18 07:02:11,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=392420.0, ans=0.2 2024-09-18 07:02:11,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=392420.0, ans=0.025 2024-09-18 07:02:20,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=392460.0, ans=0.0 2024-09-18 07:02:22,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=392460.0, ans=0.125 2024-09-18 07:02:22,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=392460.0, ans=0.125 2024-09-18 07:02:27,399 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. 
limit=6.0 2024-09-18 07:02:34,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=392460.0, ans=0.125 2024-09-18 07:02:37,019 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.586e+01 8.738e+01 9.227e+01 9.918e+01 5.288e+02, threshold=1.845e+02, percent-clipped=2.0 2024-09-18 07:02:37,041 INFO [train.py:1198] (1/2) Epoch 22, batch 3100, loss[loss=0.261, ctc_loss=0.152, cr_loss=0.3844, attn_decoder_loss=0.2645, over 29256.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1338, cr_loss=0.377, attn_decoder_loss=0.2496, over 5776269.56 frames. ], batch size: 100, lr: 5.04e-03, grad_scale: 8.0 2024-09-18 07:02:38,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=392500.0, ans=0.1 2024-09-18 07:02:41,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=392500.0, ans=0.125 2024-09-18 07:02:48,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=392500.0, ans=0.2 2024-09-18 07:02:52,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=392540.0, ans=0.2 2024-09-18 07:03:09,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=392580.0, ans=0.0 2024-09-18 07:03:16,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=392580.0, ans=0.0 2024-09-18 07:03:49,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=392660.0, ans=0.125 2024-09-18 07:03:51,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=392660.0, ans=0.125 2024-09-18 07:03:55,332 INFO [train.py:1198] (1/2) Epoch 22, batch 3150, loss[loss=0.2538, ctc_loss=0.1399, cr_loss=0.3768, attn_decoder_loss=0.2581, over 28854.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1336, cr_loss=0.3767, attn_decoder_loss=0.2494, over 5783172.61 frames. ], batch size: 104, lr: 5.04e-03, grad_scale: 8.0 2024-09-18 07:04:18,841 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2024-09-18 07:05:10,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=392860.0, ans=0.125 2024-09-18 07:05:13,336 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.207e+01 8.635e+01 9.167e+01 9.821e+01 1.751e+02, threshold=1.833e+02, percent-clipped=0.0 2024-09-18 07:05:13,357 INFO [train.py:1198] (1/2) Epoch 22, batch 3200, loss[loss=0.256, ctc_loss=0.1488, cr_loss=0.4239, attn_decoder_loss=0.2585, over 29406.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1329, cr_loss=0.3758, attn_decoder_loss=0.2487, over 5793169.16 frames. 
], batch size: 79, lr: 5.03e-03, grad_scale: 16.0 2024-09-18 07:05:27,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=392940.0, ans=0.125 2024-09-18 07:05:36,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=392940.0, ans=0.125 2024-09-18 07:05:47,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=392980.0, ans=0.125 2024-09-18 07:05:49,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.33 vs. limit=10.0 2024-09-18 07:06:11,897 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.05 vs. limit=12.0 2024-09-18 07:06:20,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=393060.0, ans=0.125 2024-09-18 07:06:26,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=393060.0, ans=0.125 2024-09-18 07:06:29,130 INFO [train.py:1198] (1/2) Epoch 22, batch 3250, loss[loss=0.248, ctc_loss=0.1331, cr_loss=0.3853, attn_decoder_loss=0.2522, over 29684.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1327, cr_loss=0.3753, attn_decoder_loss=0.2489, over 5799410.67 frames. ], batch size: 84, lr: 5.03e-03, grad_scale: 8.0 2024-09-18 07:06:37,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=393100.0, ans=0.125 2024-09-18 07:06:59,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=393180.0, ans=0.0 2024-09-18 07:07:09,512 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-09-18 07:07:13,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=393220.0, ans=0.0 2024-09-18 07:07:14,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=393220.0, ans=0.1 2024-09-18 07:07:25,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=393220.0, ans=0.125 2024-09-18 07:07:47,124 INFO [train.py:1198] (1/2) Epoch 22, batch 3300, loss[loss=0.2563, ctc_loss=0.1401, cr_loss=0.3784, attn_decoder_loss=0.2608, over 28574.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1317, cr_loss=0.373, attn_decoder_loss=0.2477, over 5796953.86 frames. 
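The tot_loss[... over N frames.] fields behave like running, frame-weighted averages: each component is accumulated weighted by the batch's frame count, and the denominator drifts slowly (5,776,269 -> 5,783,172 -> 5,793,169 frames across the records above) as older batches are decayed out. A sketch under that assumption; the class name and decay constant are illustrative, not taken from train.py.

    class LossTracker:
        """Frame-weighted running loss statistics, sketching how the
        'tot_loss[... over N frames.]' fields could be maintained."""

        def __init__(self, decay: float = 0.999):  # decay rate is an assumption
            self.decay = decay
            self.sums = {}       # loss name -> frame-weighted running sum
            self.frames = 0.0    # running (decayed) frame count

        def update(self, losses: dict, num_frames: float) -> None:
            self.frames = self.frames * self.decay + num_frames
            for name, value in losses.items():
                self.sums[name] = (self.sums.get(name, 0.0) * self.decay
                                   + value * num_frames)

        def averages(self) -> dict:
            return {name: s / self.frames for name, s in self.sums.items()}

    # Usage: tracker.update({"ctc_loss": 0.1336, "cr_loss": 0.3767}, 28854.0);
    # tracker.averages() then yields frame-weighted figures like those above.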
], batch size: 112, lr: 5.03e-03, grad_scale: 8.0 2024-09-18 07:07:48,689 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.576e+01 9.104e+01 9.607e+01 2.025e+02, threshold=1.821e+02, percent-clipped=1.0 2024-09-18 07:08:05,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=393340.0, ans=0.125 2024-09-18 07:08:11,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=393340.0, ans=0.125 2024-09-18 07:08:29,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=393380.0, ans=0.07 2024-09-18 07:08:40,470 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.85 vs. limit=12.0 2024-09-18 07:08:47,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=393460.0, ans=0.0 2024-09-18 07:09:04,458 INFO [train.py:1198] (1/2) Epoch 22, batch 3350, loss[loss=0.2539, ctc_loss=0.1459, cr_loss=0.3864, attn_decoder_loss=0.2573, over 28753.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1327, cr_loss=0.3748, attn_decoder_loss=0.2488, over 5774218.60 frames. ], batch size: 104, lr: 5.03e-03, grad_scale: 8.0 2024-09-18 07:09:05,293 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.72 vs. limit=15.0 2024-09-18 07:09:24,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=393540.0, ans=0.125 2024-09-18 07:09:36,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=393580.0, ans=0.125 2024-09-18 07:09:46,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=393580.0, ans=0.125 2024-09-18 07:10:09,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=393660.0, ans=0.125 2024-09-18 07:10:20,920 INFO [train.py:1198] (1/2) Epoch 22, batch 3400, loss[loss=0.2146, ctc_loss=0.1187, cr_loss=0.3437, attn_decoder_loss=0.2176, over 29314.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1333, cr_loss=0.3759, attn_decoder_loss=0.2488, over 5765965.26 frames. 
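The optim.py:487 WARNING lines expose how gradient clipping works in this run: the five numbers are presumably order statistics (min, 25%, median, 75%, max) of recently observed gradient norms, and the threshold is clipping_scale times the median. For the record just above, 2.0 * 9.104e+01 = 1.821e+02, exactly the printed threshold, and the same relation holds for the other WARNING records in this section; percent-clipped is the share of recent steps whose norm exceeded it. A sketch under those assumptions (window size and bookkeeping details are guesses):

    import collections
    import statistics
    import torch

    class QuartileClipper:
        """Median-based gradient clipping: threshold = clipping_scale *
        median of recently seen gradient norms. An illustrative sketch,
        not the icefall optim.py implementation."""

        def __init__(self, clipping_scale: float = 2.0, window: int = 200):
            self.clipping_scale = clipping_scale
            self.norms = collections.deque(maxlen=window)
            self.num_steps = 0
            self.num_clipped = 0  # basis for the percent-clipped figure

        def clip_(self, parameters):
            params = [p for p in parameters if p.grad is not None]
            norm = torch.sqrt(sum((p.grad ** 2).sum() for p in params)).item()
            self.norms.append(norm)
            threshold = self.clipping_scale * statistics.median(self.norms)
            self.num_steps += 1
            if norm > threshold:
                self.num_clipped += 1
                for p in params:
                    p.grad.mul_(threshold / norm)
            return norm, threshold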
], batch size: 67, lr: 5.03e-03, grad_scale: 8.0 2024-09-18 07:10:22,302 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.700e+01 8.673e+01 9.256e+01 9.754e+01 2.312e+02, threshold=1.851e+02, percent-clipped=1.0 2024-09-18 07:10:24,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=393700.0, ans=0.125 2024-09-18 07:10:31,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=393700.0, ans=0.125 2024-09-18 07:10:31,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=393700.0, ans=0.1 2024-09-18 07:10:33,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=393700.0, ans=0.125 2024-09-18 07:11:02,659 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.07 vs. limit=22.5 2024-09-18 07:11:10,450 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.66 vs. limit=22.5 2024-09-18 07:11:13,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=393820.0, ans=0.2 2024-09-18 07:11:18,366 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.79 vs. limit=15.0 2024-09-18 07:11:18,623 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.66 vs. limit=15.0 2024-09-18 07:11:31,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=393860.0, ans=0.04949747468305833 2024-09-18 07:11:38,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.84 vs. limit=15.0 2024-09-18 07:11:38,636 INFO [train.py:1198] (1/2) Epoch 22, batch 3450, loss[loss=0.2538, ctc_loss=0.1348, cr_loss=0.3925, attn_decoder_loss=0.2583, over 28327.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.1333, cr_loss=0.3764, attn_decoder_loss=0.2491, over 5773570.80 frames. ], batch size: 111, lr: 5.03e-03, grad_scale: 8.0 2024-09-18 07:11:51,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=393900.0, ans=0.0 2024-09-18 07:12:03,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=393940.0, ans=0.025 2024-09-18 07:12:07,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=393980.0, ans=0.1 2024-09-18 07:12:17,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=393980.0, ans=0.2 2024-09-18 07:12:40,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=394060.0, ans=0.125 2024-09-18 07:12:56,515 INFO [train.py:1198] (1/2) Epoch 22, batch 3500, loss[loss=0.2334, ctc_loss=0.1275, cr_loss=0.3564, attn_decoder_loss=0.2373, over 29777.00 frames. 
], tot_loss[loss=0.2448, ctc_loss=0.1333, cr_loss=0.3767, attn_decoder_loss=0.2488, over 5776370.20 frames. ], batch size: 72, lr: 5.03e-03, grad_scale: 8.0 2024-09-18 07:12:56,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=394100.0, ans=0.025 2024-09-18 07:12:58,048 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.251e+01 8.509e+01 8.992e+01 9.710e+01 6.035e+02, threshold=1.798e+02, percent-clipped=1.0 2024-09-18 07:13:07,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=394100.0, ans=0.2 2024-09-18 07:13:20,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=394140.0, ans=0.95 2024-09-18 07:13:25,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=394180.0, ans=0.1 2024-09-18 07:13:25,960 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.64 vs. limit=15.0 2024-09-18 07:13:31,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=394180.0, ans=0.125 2024-09-18 07:13:52,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=394220.0, ans=0.125 2024-09-18 07:13:53,819 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 07:14:01,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=394260.0, ans=0.125 2024-09-18 07:14:11,196 INFO [train.py:1198] (1/2) Epoch 22, batch 3550, loss[loss=0.2562, ctc_loss=0.1301, cr_loss=0.3685, attn_decoder_loss=0.262, over 29713.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1329, cr_loss=0.3756, attn_decoder_loss=0.2487, over 5783901.56 frames. ], batch size: 89, lr: 5.03e-03, grad_scale: 8.0 2024-09-18 07:14:36,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=394340.0, ans=0.025 2024-09-18 07:14:44,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=394380.0, ans=0.025 2024-09-18 07:14:52,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=394380.0, ans=22.5 2024-09-18 07:15:11,875 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.57 vs. limit=15.0 2024-09-18 07:15:17,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=394460.0, ans=0.0 2024-09-18 07:15:18,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=394460.0, ans=0.125 2024-09-18 07:15:20,945 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.55 vs. 
limit=15.0 2024-09-18 07:15:25,911 INFO [train.py:1198] (1/2) Epoch 22, batch 3600, loss[loss=0.2458, ctc_loss=0.1289, cr_loss=0.3711, attn_decoder_loss=0.2505, over 29514.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1329, cr_loss=0.3756, attn_decoder_loss=0.2488, over 5793487.71 frames. ], batch size: 77, lr: 5.02e-03, grad_scale: 16.0 2024-09-18 07:15:27,409 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.441e+01 8.945e+01 9.412e+01 1.487e+02, threshold=1.789e+02, percent-clipped=0.0 2024-09-18 07:15:51,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=394540.0, ans=0.025 2024-09-18 07:16:17,352 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.47 vs. limit=12.0 2024-09-18 07:16:40,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.80 vs. limit=12.0 2024-09-18 07:16:41,173 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 07:16:42,258 INFO [train.py:1198] (1/2) Epoch 22, batch 3650, loss[loss=0.2683, ctc_loss=0.1535, cr_loss=0.4216, attn_decoder_loss=0.2717, over 29508.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1323, cr_loss=0.3742, attn_decoder_loss=0.2482, over 5795058.31 frames. ], batch size: 90, lr: 5.02e-03, grad_scale: 8.0 2024-09-18 07:16:49,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=394700.0, ans=0.1 2024-09-18 07:16:52,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=394700.0, ans=0.125 2024-09-18 07:16:55,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=394740.0, ans=0.1 2024-09-18 07:17:19,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=394780.0, ans=0.0 2024-09-18 07:17:40,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=394860.0, ans=0.2 2024-09-18 07:17:40,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=394860.0, ans=0.0 2024-09-18 07:17:41,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=394860.0, ans=0.0 2024-09-18 07:17:56,473 INFO [train.py:1198] (1/2) Epoch 22, batch 3700, loss[loss=0.2538, ctc_loss=0.1463, cr_loss=0.3996, attn_decoder_loss=0.2569, over 29721.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1317, cr_loss=0.3737, attn_decoder_loss=0.248, over 5805251.08 frames. 
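Each scaling.py:214 line prints a named ScheduledFloat with the current batch_count and its current value (ans=...). The behaviour is consistent with a scalar that interpolates piecewise-linearly between (batch_count, value) breakpoints and saturates at the ends, which is why nearly every probability has settled at a final value (0.125, 0.025, and so on) this deep into training. A sketch under that assumption, not the scaling.py source; the example breakpoints are invented:

    class ScheduledFloatSketch:
        """A scalar scheduled piecewise-linearly against batch_count."""

        def __init__(self, *points):
            # points: (batch_count, value) breakpoints, e.g. an assumed
            # schedule (0.0, 0.3), (20000.0, 0.125) for a balancer prob.
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    # ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.125)).value(394540.0)
    # -> 0.125, matching the saturated "ans=0.125" records above.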
], batch size: 84, lr: 5.02e-03, grad_scale: 8.0 2024-09-18 07:17:59,513 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.466e+01 8.986e+01 9.824e+01 1.367e+02, threshold=1.797e+02, percent-clipped=0.0 2024-09-18 07:18:41,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=395020.0, ans=0.125 2024-09-18 07:18:42,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=395020.0, ans=0.0 2024-09-18 07:18:53,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=395020.0, ans=0.125 2024-09-18 07:18:59,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=395060.0, ans=0.125 2024-09-18 07:19:12,624 INFO [train.py:1198] (1/2) Epoch 22, batch 3750, loss[loss=0.2157, ctc_loss=0.1145, cr_loss=0.3394, attn_decoder_loss=0.2194, over 29333.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.132, cr_loss=0.3742, attn_decoder_loss=0.2479, over 5808703.48 frames. ], batch size: 67, lr: 5.02e-03, grad_scale: 8.0 2024-09-18 07:19:12,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=395100.0, ans=0.125 2024-09-18 07:19:15,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=395100.0, ans=0.125 2024-09-18 07:19:23,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=395100.0, ans=0.0 2024-09-18 07:20:04,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=395220.0, ans=0.0 2024-09-18 07:20:16,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=395260.0, ans=0.0 2024-09-18 07:20:23,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=395260.0, ans=0.025 2024-09-18 07:20:27,725 INFO [train.py:1198] (1/2) Epoch 22, batch 3800, loss[loss=0.2566, ctc_loss=0.1457, cr_loss=0.3957, attn_decoder_loss=0.2602, over 29635.00 frames. ], tot_loss[loss=0.2436, ctc_loss=0.132, cr_loss=0.3742, attn_decoder_loss=0.2477, over 5799227.12 frames. ], batch size: 86, lr: 5.02e-03, grad_scale: 8.0 2024-09-18 07:20:30,688 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.290e+01 8.441e+01 9.008e+01 9.541e+01 1.561e+02, threshold=1.802e+02, percent-clipped=0.0 2024-09-18 07:20:36,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=395300.0, ans=0.125 2024-09-18 07:20:47,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=395340.0, ans=0.0 2024-09-18 07:20:49,149 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.31 vs. 
limit=15.0 2024-09-18 07:21:36,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=395460.0, ans=0.025 2024-09-18 07:21:41,876 INFO [train.py:1198] (1/2) Epoch 22, batch 3850, loss[loss=0.2606, ctc_loss=0.147, cr_loss=0.4147, attn_decoder_loss=0.264, over 29205.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.132, cr_loss=0.3743, attn_decoder_loss=0.2478, over 5813790.43 frames. ], batch size: 100, lr: 5.02e-03, grad_scale: 8.0 2024-09-18 07:21:45,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=395500.0, ans=0.0 2024-09-18 07:21:49,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=395500.0, ans=0.0 2024-09-18 07:21:52,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=395500.0, ans=0.0 2024-09-18 07:21:52,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=395500.0, ans=0.125 2024-09-18 07:21:55,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=395540.0, ans=0.1 2024-09-18 07:22:33,414 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.64 vs. limit=15.0 2024-09-18 07:22:37,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=395620.0, ans=0.125 2024-09-18 07:22:38,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.85 vs. limit=15.0 2024-09-18 07:22:41,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=395660.0, ans=0.1 2024-09-18 07:22:50,980 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.29 vs. limit=15.0 2024-09-18 07:22:57,687 INFO [train.py:1198] (1/2) Epoch 22, batch 3900, loss[loss=0.2631, ctc_loss=0.1486, cr_loss=0.4157, attn_decoder_loss=0.2665, over 29616.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1322, cr_loss=0.3753, attn_decoder_loss=0.2484, over 5818118.53 frames. ], batch size: 86, lr: 5.02e-03, grad_scale: 8.0 2024-09-18 07:23:00,757 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.669e+01 9.089e+01 9.620e+01 1.531e+02, threshold=1.818e+02, percent-clipped=0.0 2024-09-18 07:23:06,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=395700.0, ans=0.04949747468305833 2024-09-18 07:23:09,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=395700.0, ans=0.035 2024-09-18 07:23:17,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=395740.0, ans=0.125 2024-09-18 07:23:20,639 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.40 vs. 
limit=15.0 2024-09-18 07:23:26,385 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.86 vs. limit=10.0 2024-09-18 07:23:37,097 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.12 vs. limit=12.0 2024-09-18 07:23:52,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=395820.0, ans=0.125 2024-09-18 07:24:01,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=395860.0, ans=0.125 2024-09-18 07:24:11,538 INFO [train.py:1198] (1/2) Epoch 22, batch 3950, loss[loss=0.255, ctc_loss=0.1349, cr_loss=0.3993, attn_decoder_loss=0.2595, over 29476.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1316, cr_loss=0.3743, attn_decoder_loss=0.248, over 5837018.28 frames. ], batch size: 97, lr: 5.02e-03, grad_scale: 8.0 2024-09-18 07:24:14,077 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.49 vs. limit=15.0 2024-09-18 07:24:17,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=395900.0, ans=0.125 2024-09-18 07:24:17,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=395900.0, ans=0.2 2024-09-18 07:25:01,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=396020.0, ans=0.0 2024-09-18 07:25:05,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=396020.0, ans=0.125 2024-09-18 07:25:07,627 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.05 vs. limit=15.0 2024-09-18 07:25:12,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=396060.0, ans=0.125 2024-09-18 07:25:14,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=396060.0, ans=0.02 2024-09-18 07:25:21,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=396060.0, ans=0.125 2024-09-18 07:25:27,270 INFO [train.py:1198] (1/2) Epoch 22, batch 4000, loss[loss=0.2325, ctc_loss=0.1282, cr_loss=0.3698, attn_decoder_loss=0.2359, over 29538.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1321, cr_loss=0.3745, attn_decoder_loss=0.2483, over 5813094.81 frames. ], batch size: 74, lr: 5.01e-03, grad_scale: 16.0 2024-09-18 07:25:29,705 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.57 vs. 
limit=15.0 2024-09-18 07:25:30,134 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.472e+01 8.530e+01 8.952e+01 9.583e+01 2.635e+02, threshold=1.790e+02, percent-clipped=1.0 2024-09-18 07:25:34,901 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 07:25:35,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=396100.0, ans=0.125 2024-09-18 07:25:42,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=396140.0, ans=0.125 2024-09-18 07:25:55,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=396180.0, ans=0.125 2024-09-18 07:25:57,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=396180.0, ans=0.2 2024-09-18 07:26:37,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=396260.0, ans=0.0 2024-09-18 07:26:37,809 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=15.0 2024-09-18 07:26:41,543 INFO [train.py:1198] (1/2) Epoch 22, batch 4050, loss[loss=0.2713, ctc_loss=0.1732, cr_loss=0.4303, attn_decoder_loss=0.2726, over 19996.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1325, cr_loss=0.3748, attn_decoder_loss=0.2483, over 5796227.53 frames. ], batch size: 209, lr: 5.01e-03, grad_scale: 8.0 2024-09-18 07:26:46,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=396300.0, ans=0.0 2024-09-18 07:27:31,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=396420.0, ans=0.0 2024-09-18 07:27:32,021 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=22.5 2024-09-18 07:27:38,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=396420.0, ans=0.125 2024-09-18 07:27:55,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=396500.0, ans=0.125 2024-09-18 07:27:55,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=396500.0, ans=0.125 2024-09-18 07:27:56,313 INFO [train.py:1198] (1/2) Epoch 22, batch 4100, loss[loss=0.2636, ctc_loss=0.1499, cr_loss=0.4122, attn_decoder_loss=0.2671, over 29499.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.133, cr_loss=0.3753, attn_decoder_loss=0.2483, over 5792124.05 frames. ], batch size: 90, lr: 5.01e-03, grad_scale: 8.0 2024-09-18 07:27:58,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=396500.0, ans=0.0 2024-09-18 07:28:00,300 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.04 vs. 
limit=22.5 2024-09-18 07:28:00,767 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.483e+01 8.697e+01 9.214e+01 1.008e+02 3.653e+02, threshold=1.843e+02, percent-clipped=2.0 2024-09-18 07:28:09,158 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.07 vs. limit=15.0 2024-09-18 07:28:14,243 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 07:28:22,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=396540.0, ans=0.125 2024-09-18 07:28:31,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=396580.0, ans=0.0 2024-09-18 07:28:36,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=396580.0, ans=0.125 2024-09-18 07:28:40,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=396620.0, ans=0.0 2024-09-18 07:28:51,675 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.96 vs. limit=15.0 2024-09-18 07:28:55,523 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 07:29:09,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=396700.0, ans=0.125 2024-09-18 07:29:10,659 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.87 vs. limit=15.0 2024-09-18 07:29:11,101 INFO [train.py:1198] (1/2) Epoch 22, batch 4150, loss[loss=0.2414, ctc_loss=0.1327, cr_loss=0.3786, attn_decoder_loss=0.2451, over 29515.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1323, cr_loss=0.3742, attn_decoder_loss=0.248, over 5798320.55 frames. ], batch size: 77, lr: 5.01e-03, grad_scale: 8.0 2024-09-18 07:29:16,760 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.05 vs. limit=15.0 2024-09-18 07:29:17,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=396700.0, ans=0.0 2024-09-18 07:29:53,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=396780.0, ans=0.0 2024-09-18 07:29:59,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=396820.0, ans=0.5 2024-09-18 07:30:16,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=396860.0, ans=0.125 2024-09-18 07:30:19,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=396860.0, ans=0.0 2024-09-18 07:30:22,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=396860.0, ans=0.1 2024-09-18 07:30:25,224 INFO [train.py:1198] (1/2) Epoch 22, batch 4200, loss[loss=0.2516, ctc_loss=0.1364, cr_loss=0.3694, attn_decoder_loss=0.2562, over 29523.00 frames. 
], tot_loss[loss=0.2445, ctc_loss=0.1328, cr_loss=0.375, attn_decoder_loss=0.2486, over 5800382.54 frames. ], batch size: 90, lr: 5.01e-03, grad_scale: 8.0 2024-09-18 07:30:29,585 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.640e+01 8.447e+01 9.085e+01 9.593e+01 1.747e+02, threshold=1.817e+02, percent-clipped=0.0 2024-09-18 07:30:49,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=396940.0, ans=0.125 2024-09-18 07:30:50,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=396940.0, ans=0.0 2024-09-18 07:31:07,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=396980.0, ans=0.0 2024-09-18 07:31:22,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=397020.0, ans=0.2 2024-09-18 07:31:37,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=397060.0, ans=0.125 2024-09-18 07:31:38,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=397100.0, ans=0.0 2024-09-18 07:31:39,885 INFO [train.py:1198] (1/2) Epoch 22, batch 4250, loss[loss=0.2285, ctc_loss=0.121, cr_loss=0.3704, attn_decoder_loss=0.2322, over 29496.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1328, cr_loss=0.3753, attn_decoder_loss=0.2488, over 5806265.40 frames. ], batch size: 74, lr: 5.01e-03, grad_scale: 8.0 2024-09-18 07:31:47,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=397100.0, ans=0.125 2024-09-18 07:32:03,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=397140.0, ans=0.125 2024-09-18 07:32:07,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=397180.0, ans=0.125 2024-09-18 07:32:16,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=397180.0, ans=0.025 2024-09-18 07:32:35,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=397220.0, ans=0.0 2024-09-18 07:32:53,985 INFO [train.py:1198] (1/2) Epoch 22, batch 4300, loss[loss=0.266, ctc_loss=0.1445, cr_loss=0.3959, attn_decoder_loss=0.2707, over 29557.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.133, cr_loss=0.3755, attn_decoder_loss=0.2492, over 5795094.61 frames. 
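The scaling.py:1024 Whitening lines compare a per-module statistic against a limit (e.g. metric=14.86 vs. limit=15.0 above); the metric grows as the channel covariance of the module's activations departs from a multiple of the identity. The exact definition is not visible in this log; the version below is one standard choice with the same behaviour (1.0 for perfectly white features, larger otherwise) and is offered only as an illustration:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """x: (num_frames, num_channels). Eigenvalue-spread statistic of the
        per-group channel covariance, d * trace(C @ C) / trace(C) ** 2,
        which equals 1.0 iff C is proportional to the identity."""
        num_frames, num_channels = x.shape
        assert num_channels % num_groups == 0
        xg = x.reshape(num_frames, num_groups, num_channels // num_groups)
        xg = xg - xg.mean(dim=0, keepdim=True)
        metrics = []
        for g in range(num_groups):
            feats = xg[:, g, :]
            cov = feats.t() @ feats / num_frames   # channel covariance
            d = cov.shape[0]
            metrics.append(d * (cov @ cov).trace() / cov.trace() ** 2)
        return float(torch.stack(metrics).mean())

A module can then log "metric=... vs. limit=..." as above and apply a corrective penalty only when the metric exceeds its limit.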
], batch size: 87, lr: 5.01e-03, grad_scale: 8.0 2024-09-18 07:32:58,453 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.640e+01 8.737e+01 9.479e+01 1.036e+02 1.602e+02, threshold=1.896e+02, percent-clipped=0.0 2024-09-18 07:33:12,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=397340.0, ans=0.0 2024-09-18 07:33:17,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=397340.0, ans=0.125 2024-09-18 07:33:19,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=397340.0, ans=0.0 2024-09-18 07:33:24,636 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.78 vs. limit=15.0 2024-09-18 07:33:30,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=397380.0, ans=0.0 2024-09-18 07:34:00,029 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.51 vs. limit=6.0 2024-09-18 07:34:08,148 INFO [train.py:1198] (1/2) Epoch 22, batch 4350, loss[loss=0.2601, ctc_loss=0.1395, cr_loss=0.3849, attn_decoder_loss=0.2649, over 29443.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1355, cr_loss=0.3804, attn_decoder_loss=0.2522, over 5796898.29 frames. ], batch size: 97, lr: 5.01e-03, grad_scale: 8.0 2024-09-18 07:34:18,059 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.86 vs. limit=15.0 2024-09-18 07:34:21,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=397540.0, ans=0.1 2024-09-18 07:34:31,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten.whitening_limit, batch_count=397540.0, ans=15.0 2024-09-18 07:34:34,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=397540.0, ans=0.125 2024-09-18 07:34:38,266 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.23 vs. limit=12.0 2024-09-18 07:34:39,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=397580.0, ans=0.125 2024-09-18 07:34:51,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=397620.0, ans=0.07 2024-09-18 07:35:08,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=397660.0, ans=0.1 2024-09-18 07:35:12,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=397660.0, ans=0.0 2024-09-18 07:35:21,409 INFO [train.py:1198] (1/2) Epoch 22, batch 4400, loss[loss=0.2611, ctc_loss=0.1502, cr_loss=0.3878, attn_decoder_loss=0.2648, over 27157.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1371, cr_loss=0.3831, attn_decoder_loss=0.2544, over 5764635.91 frames. 
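The grad_scale field in the progress records flips between 8.0 and 16.0 (16.0 at batches 3200, 3600, 4000 and 4400 above, 8.0 in between). That alternation is the signature of a dynamic fp16 loss scaler: the scale doubles after a run of overflow-free steps and is halved again as soon as a step overflows. A minimal sketch using the stock torch.cuda.amp API; the interval and factors are chosen to mimic the cadence here and are not taken from the recipe:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=8.0,       # the grad_scale value seen most often above
        growth_factor=2.0,    # 8.0 -> 16.0
        backoff_factor=0.5,   # 16.0 -> 8.0 after an overflowing step
        growth_interval=400,  # illustrative guess at the growth cadence
    )

    def training_step(model, optimizer, criterion, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = criterion(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # silently skipped if gradients overflowed
        scaler.update()         # grows or backs off the scale
        return scaler.get_scale()  # the value logged as grad_scale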
], batch size: 124, lr: 5.00e-03, grad_scale: 16.0 2024-09-18 07:35:25,704 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.295e+01 9.019e+01 9.432e+01 1.021e+02 4.096e+02, threshold=1.886e+02, percent-clipped=2.0 2024-09-18 07:35:35,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=397740.0, ans=0.1 2024-09-18 07:35:41,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=397740.0, ans=0.0 2024-09-18 07:35:48,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=397740.0, ans=0.035 2024-09-18 07:35:58,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=397780.0, ans=0.125 2024-09-18 07:36:36,332 INFO [train.py:1198] (1/2) Epoch 22, batch 4450, loss[loss=0.2842, ctc_loss=0.1903, cr_loss=0.4164, attn_decoder_loss=0.2854, over 20292.00 frames. ], tot_loss[loss=0.2532, ctc_loss=0.1414, cr_loss=0.3878, attn_decoder_loss=0.257, over 5572305.52 frames. ], batch size: 209, lr: 5.00e-03, grad_scale: 8.0 2024-09-18 07:36:47,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=397900.0, ans=0.125 2024-09-18 07:36:51,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=397940.0, ans=0.0 2024-09-18 07:36:54,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=397940.0, ans=0.0 2024-09-18 07:37:03,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=397940.0, ans=0.125 2024-09-18 07:37:09,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=397980.0, ans=0.125 2024-09-18 07:37:14,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=397980.0, ans=0.025 2024-09-18 07:37:22,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=9.62 vs. limit=10.0 2024-09-18 07:37:37,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=398060.0, ans=0.125 2024-09-18 07:37:42,016 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.03 vs. limit=15.0 2024-09-18 07:37:42,192 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=10.98 vs. limit=12.0 2024-09-18 07:37:50,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=398100.0, ans=0.1 2024-09-18 07:37:51,773 INFO [train.py:1198] (1/2) Epoch 22, batch 4500, loss[loss=0.2672, ctc_loss=0.1715, cr_loss=0.4114, attn_decoder_loss=0.2686, over 20307.00 frames. ], tot_loss[loss=0.2555, ctc_loss=0.1457, cr_loss=0.3899, attn_decoder_loss=0.259, over 5230004.09 frames. 
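The cr_loss tracked in every record is the consistency-regularization term of CR-CTC: each utterance is encoded twice under different time-masking, and the two CTC posterior streams are pulled toward each other. The exact masking and reduction used by this recipe are not visible in the log; the sketch below is one common formulation, symmetric KL with detached targets, and is purely illustrative:

    import torch
    import torch.nn.functional as F

    def consistency_loss(log_probs_a: torch.Tensor,
                         log_probs_b: torch.Tensor) -> torch.Tensor:
        """log_probs_*: (T, N, V) log-softmax CTC outputs for two differently
        masked views of the same batch. Each stream uses the other, detached,
        as its target; the averaging choice here is illustrative."""
        kl_ab = F.kl_div(log_probs_a, log_probs_b.exp().detach(),
                         reduction="batchmean")
        kl_ba = F.kl_div(log_probs_b, log_probs_a.exp().detach(),
                         reduction="batchmean")
        return 0.5 * (kl_ab + kl_ba)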
], batch size: 210, lr: 5.00e-03, grad_scale: 8.0 2024-09-18 07:37:57,651 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.580e+01 1.014e+02 1.103e+02 1.223e+02 2.065e+02, threshold=2.205e+02, percent-clipped=1.0 2024-09-18 07:38:03,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=398100.0, ans=0.2 2024-09-18 07:38:23,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=398180.0, ans=0.125 2024-09-18 07:38:23,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=398180.0, ans=0.125 2024-09-18 07:38:24,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=398180.0, ans=0.025 2024-09-18 07:39:14,511 INFO [train.py:1198] (1/2) Epoch 23, batch 0, loss[loss=0.2205, ctc_loss=0.1125, cr_loss=0.3312, attn_decoder_loss=0.2251, over 29612.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1125, cr_loss=0.3312, attn_decoder_loss=0.2251, over 29612.00 frames. ], batch size: 73, lr: 4.89e-03, grad_scale: 16.0 2024-09-18 07:39:14,511 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 07:39:33,043 INFO [train.py:1230] (1/2) Epoch 23, validation: loss=0.212, ctc_loss=0.03823, cr_loss=5.578e-15, attn_decoder_loss=0.2313, over 944034.00 frames. 2024-09-18 07:39:33,043 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-18 07:39:58,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=398240.0, ans=0.2 2024-09-18 07:40:21,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=398320.0, ans=0.125 2024-09-18 07:40:27,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=398320.0, ans=0.0 2024-09-18 07:40:37,574 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=15.0 2024-09-18 07:40:41,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=398360.0, ans=0.125 2024-09-18 07:40:49,045 INFO [train.py:1198] (1/2) Epoch 23, batch 50, loss[loss=0.2245, ctc_loss=0.1199, cr_loss=0.354, attn_decoder_loss=0.2282, over 29413.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1335, cr_loss=0.378, attn_decoder_loss=0.2483, over 1268255.72 frames. ], batch size: 70, lr: 4.89e-03, grad_scale: 8.0 2024-09-18 07:40:58,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.71 vs. 
limit=15.0 2024-09-18 07:41:02,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=398400.0, ans=0.05 2024-09-18 07:41:03,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=398400.0, ans=0.125 2024-09-18 07:41:14,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=398440.0, ans=0.025 2024-09-18 07:41:38,768 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 8.809e+01 9.782e+01 1.101e+02 2.337e+02, threshold=1.956e+02, percent-clipped=1.0 2024-09-18 07:41:54,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=398560.0, ans=0.125 2024-09-18 07:41:56,590 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.29 vs. limit=15.0 2024-09-18 07:41:58,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=398560.0, ans=0.125 2024-09-18 07:42:08,981 INFO [train.py:1198] (1/2) Epoch 23, batch 100, loss[loss=0.2351, ctc_loss=0.1282, cr_loss=0.3609, attn_decoder_loss=0.239, over 29549.00 frames. ], tot_loss[loss=0.2476, ctc_loss=0.136, cr_loss=0.3816, attn_decoder_loss=0.2515, over 2252520.95 frames. ], batch size: 76, lr: 4.89e-03, grad_scale: 8.0 2024-09-18 07:42:36,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=398640.0, ans=0.1 2024-09-18 07:42:36,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=398640.0, ans=0.0 2024-09-18 07:43:10,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=398760.0, ans=0.125 2024-09-18 07:43:22,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=398800.0, ans=0.0 2024-09-18 07:43:23,639 INFO [train.py:1198] (1/2) Epoch 23, batch 150, loss[loss=0.2264, ctc_loss=0.1197, cr_loss=0.3491, attn_decoder_loss=0.2305, over 29416.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1334, cr_loss=0.3763, attn_decoder_loss=0.2492, over 3046649.39 frames. ], batch size: 70, lr: 4.89e-03, grad_scale: 8.0 2024-09-18 07:43:23,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=398800.0, ans=0.2 2024-09-18 07:43:29,225 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.95 vs. 
limit=15.0 2024-09-18 07:43:31,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=398800.0, ans=0.125 2024-09-18 07:43:45,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=398840.0, ans=0.125 2024-09-18 07:43:57,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=398880.0, ans=0.125 2024-09-18 07:44:08,888 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 8.443e+01 9.031e+01 9.523e+01 1.308e+02, threshold=1.806e+02, percent-clipped=0.0 2024-09-18 07:44:10,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=398920.0, ans=0.125 2024-09-18 07:44:12,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=398920.0, ans=0.2 2024-09-18 07:44:13,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=398920.0, ans=0.09899494936611666 2024-09-18 07:44:16,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=398920.0, ans=0.2 2024-09-18 07:44:19,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=398920.0, ans=0.125 2024-09-18 07:44:33,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=398960.0, ans=0.07 2024-09-18 07:44:38,852 INFO [train.py:1198] (1/2) Epoch 23, batch 200, loss[loss=0.2535, ctc_loss=0.133, cr_loss=0.3894, attn_decoder_loss=0.2582, over 27457.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1323, cr_loss=0.3753, attn_decoder_loss=0.2481, over 3658611.50 frames. ], batch size: 125, lr: 4.88e-03, grad_scale: 8.0 2024-09-18 07:44:39,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=399000.0, ans=0.2 2024-09-18 07:45:04,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=399040.0, ans=10.0 2024-09-18 07:45:05,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=399040.0, ans=0.0 2024-09-18 07:45:26,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=399080.0, ans=0.04949747468305833 2024-09-18 07:45:27,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=399120.0, ans=0.125 2024-09-18 07:45:48,157 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.58 vs. limit=12.0 2024-09-18 07:45:59,855 INFO [train.py:1198] (1/2) Epoch 23, batch 250, loss[loss=0.2622, ctc_loss=0.1468, cr_loss=0.3911, attn_decoder_loss=0.2663, over 29262.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1324, cr_loss=0.3752, attn_decoder_loss=0.2483, over 4141753.81 frames. 
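At the first batch of epoch 23 (above), training pauses to compute a validation loss over 944,034 frames. Two details of that record are worth noting: cr_loss collapses to ~5.6e-15 because no masking is applied in eval mode, so the two "views" coincide and the consistency term is numerical noise, and the "Maximum memory allocated so far is 52672MB" line is torch.cuda.max_memory_allocated() converted to MB. A generic sketch of such a pass; the batch keys and criterion signature are assumptions, not the train.py code:

    import torch

    @torch.no_grad()
    def compute_validation_loss(model, valid_loader, criterion, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        for batch in valid_loader:
            feats = batch["inputs"].to(device)   # key name is an assumption
            loss, num_frames = criterion(model, feats, batch["supervisions"])
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
        model.train()
        # e.g. logging.info("Maximum memory allocated so far is %dMB",
        #                   torch.cuda.max_memory_allocated() // (1024 * 1024))
        return tot_loss / tot_frames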
], batch size: 100, lr: 4.88e-03, grad_scale: 8.0 2024-09-18 07:46:00,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=399200.0, ans=0.2 2024-09-18 07:46:03,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=399200.0, ans=0.0 2024-09-18 07:46:06,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=399200.0, ans=0.07 2024-09-18 07:46:18,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=399240.0, ans=0.5 2024-09-18 07:46:22,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=399240.0, ans=0.1 2024-09-18 07:46:33,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=399280.0, ans=0.0 2024-09-18 07:46:35,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=399280.0, ans=0.0 2024-09-18 07:46:45,298 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 8.537e+01 9.009e+01 9.547e+01 2.225e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-18 07:46:59,956 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.30 vs. limit=22.5 2024-09-18 07:47:03,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=399360.0, ans=0.1 2024-09-18 07:47:11,876 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.53 vs. limit=10.0 2024-09-18 07:47:15,479 INFO [train.py:1198] (1/2) Epoch 23, batch 300, loss[loss=0.2721, ctc_loss=0.15, cr_loss=0.4222, attn_decoder_loss=0.2763, over 29515.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1324, cr_loss=0.3758, attn_decoder_loss=0.2481, over 4510162.84 frames. ], batch size: 92, lr: 4.88e-03, grad_scale: 8.0 2024-09-18 07:47:20,882 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.80 vs. limit=15.0 2024-09-18 07:47:44,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=399480.0, ans=0.2 2024-09-18 07:48:31,065 INFO [train.py:1198] (1/2) Epoch 23, batch 350, loss[loss=0.222, ctc_loss=0.116, cr_loss=0.3431, attn_decoder_loss=0.2262, over 29324.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1321, cr_loss=0.3753, attn_decoder_loss=0.2481, over 4795619.12 frames. 
], batch size: 71, lr: 4.88e-03, grad_scale: 8.0 2024-09-18 07:48:54,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=399640.0, ans=0.1 2024-09-18 07:49:09,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=399680.0, ans=0.125 2024-09-18 07:49:13,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=399680.0, ans=0.0 2024-09-18 07:49:13,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=399680.0, ans=0.0 2024-09-18 07:49:16,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=399680.0, ans=0.125 2024-09-18 07:49:16,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=399680.0, ans=0.0 2024-09-18 07:49:20,804 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.416e+01 8.727e+01 9.232e+01 2.116e+02, threshold=1.745e+02, percent-clipped=2.0 2024-09-18 07:49:25,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=399720.0, ans=0.125 2024-09-18 07:49:49,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=399800.0, ans=0.0 2024-09-18 07:49:50,850 INFO [train.py:1198] (1/2) Epoch 23, batch 400, loss[loss=0.2485, ctc_loss=0.1361, cr_loss=0.4079, attn_decoder_loss=0.2519, over 29719.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.131, cr_loss=0.3732, attn_decoder_loss=0.2476, over 5025175.46 frames. ], batch size: 82, lr: 4.88e-03, grad_scale: 16.0 2024-09-18 07:50:00,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=399800.0, ans=0.125 2024-09-18 07:50:29,715 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.19 vs. limit=22.5 2024-09-18 07:50:36,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=399920.0, ans=0.125 2024-09-18 07:50:46,284 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.53 vs. limit=15.0 2024-09-18 07:51:14,610 INFO [train.py:1198] (1/2) Epoch 23, batch 450, loss[loss=0.2524, ctc_loss=0.1414, cr_loss=0.39, attn_decoder_loss=0.256, over 29699.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1307, cr_loss=0.3726, attn_decoder_loss=0.2477, over 5187582.08 frames. 
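Since every progress record follows the same "Epoch E, batch B, ... tot_loss[...]" template, the running averages in this file can be scraped and plotted without TensorBoard. A small self-contained parser whose regex matches the records above (the function name is illustrative):

    import re

    RECORD = re.compile(
        r"Epoch (\d+), batch (\d+), .*?"
        r"tot_loss\[loss=([\d.]+), ctc_loss=([\d.]+), "
        r"cr_loss=([\d.]+), attn_decoder_loss=([\d.]+)"
    )

    def parse_progress(path: str):
        """Yield (epoch, batch, loss, ctc, cr, attn) per progress record."""
        with open(path) as f:
            for m in RECORD.finditer(f.read()):
                epoch, batch = int(m.group(1)), int(m.group(2))
                yield (epoch, batch) + tuple(float(g) for g in m.groups()[2:])

    # Example: the "Epoch 23, batch 600" record below parses to
    # (23, 600, 0.2433, 0.1307, 0.3733, 0.2475).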
], batch size: 83, lr: 4.88e-03, grad_scale: 8.0 2024-09-18 07:51:14,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=400000.0, ans=0.1 2024-09-18 07:51:23,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=400000.0, ans=0.1 2024-09-18 07:51:51,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=400080.0, ans=0.1 2024-09-18 07:51:51,305 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 07:52:01,477 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.566e+01 8.450e+01 8.997e+01 9.501e+01 2.678e+02, threshold=1.799e+02, percent-clipped=1.0 2024-09-18 07:52:18,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=400160.0, ans=0.09899494936611666 2024-09-18 07:52:30,188 INFO [train.py:1198] (1/2) Epoch 23, batch 500, loss[loss=0.257, ctc_loss=0.1422, cr_loss=0.3934, attn_decoder_loss=0.261, over 29441.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1302, cr_loss=0.3725, attn_decoder_loss=0.247, over 5331203.76 frames. ], batch size: 94, lr: 4.88e-03, grad_scale: 8.0 2024-09-18 07:52:59,458 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.22 vs. limit=15.0 2024-09-18 07:53:00,468 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.58 vs. limit=15.0 2024-09-18 07:53:04,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=400280.0, ans=0.0 2024-09-18 07:53:08,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=400280.0, ans=0.125 2024-09-18 07:53:11,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=400280.0, ans=0.0 2024-09-18 07:53:21,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=400320.0, ans=0.125 2024-09-18 07:53:21,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=400320.0, ans=0.125 2024-09-18 07:53:30,271 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.75 vs. limit=15.0 2024-09-18 07:53:49,615 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.80 vs. limit=15.0 2024-09-18 07:53:49,674 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.74 vs. limit=15.0 2024-09-18 07:53:50,364 INFO [train.py:1198] (1/2) Epoch 23, batch 550, loss[loss=0.2522, ctc_loss=0.1292, cr_loss=0.3756, attn_decoder_loss=0.2575, over 28886.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1306, cr_loss=0.3723, attn_decoder_loss=0.2473, over 5425302.59 frames. 
], batch size: 104, lr: 4.88e-03, grad_scale: 8.0 2024-09-18 07:53:55,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=400400.0, ans=0.1 2024-09-18 07:53:59,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=400400.0, ans=0.1 2024-09-18 07:54:01,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=400400.0, ans=0.125 2024-09-18 07:54:10,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=400440.0, ans=0.125 2024-09-18 07:54:23,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=400480.0, ans=0.125 2024-09-18 07:54:35,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=400520.0, ans=0.125 2024-09-18 07:54:37,027 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 8.525e+01 9.043e+01 9.907e+01 2.945e+02, threshold=1.809e+02, percent-clipped=3.0 2024-09-18 07:54:53,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=400560.0, ans=10.0 2024-09-18 07:55:01,996 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.46 vs. limit=10.0 2024-09-18 07:55:05,790 INFO [train.py:1198] (1/2) Epoch 23, batch 600, loss[loss=0.2658, ctc_loss=0.1496, cr_loss=0.4015, attn_decoder_loss=0.2698, over 29307.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1307, cr_loss=0.3733, attn_decoder_loss=0.2475, over 5511223.10 frames. ], batch size: 100, lr: 4.87e-03, grad_scale: 8.0 2024-09-18 07:55:24,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=400640.0, ans=0.0 2024-09-18 07:55:25,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=400640.0, ans=0.1 2024-09-18 07:55:27,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=400640.0, ans=22.5 2024-09-18 07:55:28,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=400640.0, ans=0.1 2024-09-18 07:55:30,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=400640.0, ans=0.125 2024-09-18 07:55:33,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=400640.0, ans=0.125 2024-09-18 07:56:14,647 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.14 vs. limit=10.0 2024-09-18 07:56:21,495 INFO [train.py:1198] (1/2) Epoch 23, batch 650, loss[loss=0.2467, ctc_loss=0.1321, cr_loss=0.3969, attn_decoder_loss=0.2506, over 29763.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1304, cr_loss=0.3726, attn_decoder_loss=0.247, over 5588437.08 frames. 
], batch size: 81, lr: 4.87e-03, grad_scale: 8.0 2024-09-18 07:56:26,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=400800.0, ans=0.125 2024-09-18 07:56:53,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=400880.0, ans=0.07 2024-09-18 07:57:12,931 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.280e+01 8.571e+01 9.065e+01 9.710e+01 2.691e+02, threshold=1.813e+02, percent-clipped=1.0 2024-09-18 07:57:13,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=400920.0, ans=0.125 2024-09-18 07:57:25,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=400960.0, ans=0.125 2024-09-18 07:57:25,756 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.79 vs. limit=22.5 2024-09-18 07:57:28,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=400960.0, ans=0.125 2024-09-18 07:57:35,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=400960.0, ans=0.1 2024-09-18 07:57:41,626 INFO [train.py:1198] (1/2) Epoch 23, batch 700, loss[loss=0.245, ctc_loss=0.1366, cr_loss=0.3808, attn_decoder_loss=0.2485, over 29522.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1308, cr_loss=0.3731, attn_decoder_loss=0.2478, over 5638571.22 frames. ], batch size: 76, lr: 4.87e-03, grad_scale: 8.0 2024-09-18 07:57:49,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=401000.0, ans=0.025 2024-09-18 07:57:57,422 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.83 vs. limit=10.0 2024-09-18 07:58:01,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=401040.0, ans=0.125 2024-09-18 07:58:21,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=401080.0, ans=0.125 2024-09-18 07:58:27,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=401120.0, ans=0.2 2024-09-18 07:58:35,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=401120.0, ans=0.025 2024-09-18 07:58:36,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=401120.0, ans=0.125 2024-09-18 07:58:41,588 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.29 vs. limit=12.0 2024-09-18 07:58:57,482 INFO [train.py:1198] (1/2) Epoch 23, batch 750, loss[loss=0.2478, ctc_loss=0.1275, cr_loss=0.357, attn_decoder_loss=0.2532, over 29709.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.131, cr_loss=0.3734, attn_decoder_loss=0.2477, over 5677598.51 frames. 
], batch size: 82, lr: 4.87e-03, grad_scale: 8.0 2024-09-18 07:58:57,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=401200.0, ans=0.04949747468305833 2024-09-18 07:58:59,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=401200.0, ans=0.125 2024-09-18 07:59:15,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=401240.0, ans=0.025 2024-09-18 07:59:15,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=401240.0, ans=0.125 2024-09-18 07:59:29,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=401280.0, ans=0.125 2024-09-18 07:59:35,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=401280.0, ans=0.1 2024-09-18 07:59:44,086 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.454e+01 8.911e+01 9.640e+01 3.418e+02, threshold=1.782e+02, percent-clipped=1.0 2024-09-18 07:59:51,366 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.74 vs. limit=15.0 2024-09-18 08:00:12,907 INFO [train.py:1198] (1/2) Epoch 23, batch 800, loss[loss=0.2213, ctc_loss=0.1106, cr_loss=0.3267, attn_decoder_loss=0.2264, over 29606.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1307, cr_loss=0.3732, attn_decoder_loss=0.2475, over 5707130.50 frames. ], batch size: 73, lr: 4.87e-03, grad_scale: 16.0 2024-09-18 08:00:20,728 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:00:35,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=401440.0, ans=0.05 2024-09-18 08:00:50,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=401480.0, ans=0.125 2024-09-18 08:01:02,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=401520.0, ans=0.09899494936611666 2024-09-18 08:01:07,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.39 vs. limit=12.0 2024-09-18 08:01:21,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=401560.0, ans=0.0 2024-09-18 08:01:26,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=401560.0, ans=0.2 2024-09-18 08:01:30,977 INFO [train.py:1198] (1/2) Epoch 23, batch 850, loss[loss=0.2586, ctc_loss=0.1442, cr_loss=0.3949, attn_decoder_loss=0.2625, over 29707.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1304, cr_loss=0.373, attn_decoder_loss=0.2471, over 5736013.59 frames. 
], batch size: 89, lr: 4.87e-03, grad_scale: 8.0 2024-09-18 08:01:31,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=401600.0, ans=0.0 2024-09-18 08:01:32,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=401600.0, ans=0.125 2024-09-18 08:01:47,509 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:02:18,870 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.183e+01 8.350e+01 8.947e+01 9.398e+01 1.136e+02, threshold=1.789e+02, percent-clipped=0.0 2024-09-18 08:02:22,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=401720.0, ans=0.125 2024-09-18 08:02:26,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=401720.0, ans=0.125 2024-09-18 08:02:30,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=401760.0, ans=0.0 2024-09-18 08:02:31,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=401760.0, ans=0.0 2024-09-18 08:02:39,493 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2024-09-18 08:02:46,367 INFO [train.py:1198] (1/2) Epoch 23, batch 900, loss[loss=0.2171, ctc_loss=0.1061, cr_loss=0.3153, attn_decoder_loss=0.2224, over 29598.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1306, cr_loss=0.3727, attn_decoder_loss=0.2472, over 5740795.32 frames. ], batch size: 73, lr: 4.87e-03, grad_scale: 8.0 2024-09-18 08:02:48,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=401800.0, ans=0.125 2024-09-18 08:03:09,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=401840.0, ans=0.0 2024-09-18 08:03:11,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=401840.0, ans=0.2 2024-09-18 08:03:17,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=401880.0, ans=0.125 2024-09-18 08:03:21,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=401880.0, ans=0.0 2024-09-18 08:03:24,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=401880.0, ans=0.125 2024-09-18 08:03:28,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=401880.0, ans=0.0 2024-09-18 08:03:33,653 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.94 vs. 
limit=15.0 2024-09-18 08:03:37,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=401920.0, ans=0.125 2024-09-18 08:03:37,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=401920.0, ans=0.125 2024-09-18 08:03:52,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=401960.0, ans=0.2 2024-09-18 08:04:01,328 INFO [train.py:1198] (1/2) Epoch 23, batch 950, loss[loss=0.2328, ctc_loss=0.1266, cr_loss=0.3504, attn_decoder_loss=0.2369, over 29525.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.131, cr_loss=0.3736, attn_decoder_loss=0.2473, over 5743768.49 frames. ], batch size: 74, lr: 4.87e-03, grad_scale: 8.0 2024-09-18 08:04:10,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=402000.0, ans=0.2 2024-09-18 08:04:15,834 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=9.82 vs. limit=15.0 2024-09-18 08:04:21,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=402040.0, ans=0.0 2024-09-18 08:04:28,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=402040.0, ans=0.125 2024-09-18 08:04:28,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=402040.0, ans=0.125 2024-09-18 08:04:41,050 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.08 vs. limit=10.0 2024-09-18 08:04:41,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=402080.0, ans=0.0 2024-09-18 08:04:49,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=402120.0, ans=0.125 2024-09-18 08:04:50,558 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.78 vs. limit=15.0 2024-09-18 08:04:54,185 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.348e+01 8.814e+01 9.447e+01 1.062e+02 2.466e+02, threshold=1.889e+02, percent-clipped=1.0 2024-09-18 08:05:02,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=402120.0, ans=0.5 2024-09-18 08:05:11,540 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.95 vs. limit=22.5 2024-09-18 08:05:21,294 INFO [train.py:1198] (1/2) Epoch 23, batch 1000, loss[loss=0.2382, ctc_loss=0.1261, cr_loss=0.3789, attn_decoder_loss=0.2423, over 29522.00 frames. ], tot_loss[loss=0.2436, ctc_loss=0.1315, cr_loss=0.3741, attn_decoder_loss=0.2478, over 5737752.55 frames. 
], batch size: 77, lr: 4.86e-03, grad_scale: 8.0 2024-09-18 08:05:26,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=402200.0, ans=0.2 2024-09-18 08:05:28,209 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.79 vs. limit=15.0 2024-09-18 08:05:33,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=402200.0, ans=0.125 2024-09-18 08:05:35,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=402240.0, ans=0.0 2024-09-18 08:05:39,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.24 vs. limit=15.0 2024-09-18 08:05:56,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=402280.0, ans=0.125 2024-09-18 08:06:05,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=402320.0, ans=0.0 2024-09-18 08:06:11,038 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.16 vs. limit=22.5 2024-09-18 08:06:20,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=402360.0, ans=0.2 2024-09-18 08:06:22,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=402360.0, ans=0.1 2024-09-18 08:06:28,886 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.50 vs. limit=15.0 2024-09-18 08:06:37,692 INFO [train.py:1198] (1/2) Epoch 23, batch 1050, loss[loss=0.2517, ctc_loss=0.1396, cr_loss=0.3903, attn_decoder_loss=0.2555, over 29657.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.131, cr_loss=0.3733, attn_decoder_loss=0.2471, over 5745761.73 frames. ], batch size: 85, lr: 4.86e-03, grad_scale: 8.0 2024-09-18 08:06:41,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=402400.0, ans=0.125 2024-09-18 08:06:45,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=402400.0, ans=0.0 2024-09-18 08:06:54,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=402440.0, ans=0.2 2024-09-18 08:06:54,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=402440.0, ans=0.125 2024-09-18 08:07:10,891 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.02 vs. 
limit=15.0 2024-09-18 08:07:19,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=402480.0, ans=0.0 2024-09-18 08:07:19,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=402480.0, ans=0.09899494936611666 2024-09-18 08:07:26,531 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.610e+01 8.303e+01 8.731e+01 9.470e+01 1.420e+02, threshold=1.746e+02, percent-clipped=0.0 2024-09-18 08:07:31,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=402520.0, ans=0.125 2024-09-18 08:07:33,415 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.43 vs. limit=22.5 2024-09-18 08:07:54,164 INFO [train.py:1198] (1/2) Epoch 23, batch 1100, loss[loss=0.2453, ctc_loss=0.1408, cr_loss=0.3914, attn_decoder_loss=0.2482, over 29431.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1309, cr_loss=0.3732, attn_decoder_loss=0.2472, over 5756729.68 frames. ], batch size: 78, lr: 4.86e-03, grad_scale: 8.0 2024-09-18 08:08:12,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=402640.0, ans=0.125 2024-09-18 08:08:20,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=402640.0, ans=0.125 2024-09-18 08:08:39,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=402680.0, ans=0.125 2024-09-18 08:08:53,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=402720.0, ans=0.125 2024-09-18 08:08:58,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=402760.0, ans=0.125 2024-09-18 08:09:12,102 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.88 vs. limit=22.5 2024-09-18 08:09:13,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=402800.0, ans=0.125 2024-09-18 08:09:14,481 INFO [train.py:1198] (1/2) Epoch 23, batch 1150, loss[loss=0.2415, ctc_loss=0.1363, cr_loss=0.4074, attn_decoder_loss=0.2441, over 29435.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1312, cr_loss=0.374, attn_decoder_loss=0.2474, over 5753713.33 frames. 
], batch size: 78, lr: 4.86e-03, grad_scale: 8.0 2024-09-18 08:09:16,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=402800.0, ans=0.1 2024-09-18 08:09:25,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=402800.0, ans=0.0 2024-09-18 08:09:25,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=402800.0, ans=0.2 2024-09-18 08:09:27,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=402800.0, ans=0.025 2024-09-18 08:10:03,213 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.173e+01 8.564e+01 9.109e+01 9.682e+01 1.953e+02, threshold=1.822e+02, percent-clipped=1.0 2024-09-18 08:10:04,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=402920.0, ans=0.025 2024-09-18 08:10:05,967 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.21 vs. limit=15.0 2024-09-18 08:10:06,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=402920.0, ans=0.0 2024-09-18 08:10:21,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=402960.0, ans=0.0 2024-09-18 08:10:29,781 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.58 vs. limit=15.0 2024-09-18 08:10:30,399 INFO [train.py:1198] (1/2) Epoch 23, batch 1200, loss[loss=0.2505, ctc_loss=0.1362, cr_loss=0.3901, attn_decoder_loss=0.2545, over 29682.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1316, cr_loss=0.3748, attn_decoder_loss=0.2483, over 5745758.89 frames. ], batch size: 85, lr: 4.86e-03, grad_scale: 16.0 2024-09-18 08:10:53,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=403040.0, ans=0.0 2024-09-18 08:11:01,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=403080.0, ans=0.0 2024-09-18 08:11:05,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=403080.0, ans=0.1 2024-09-18 08:11:13,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=403080.0, ans=0.0 2024-09-18 08:11:28,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=403120.0, ans=0.0 2024-09-18 08:11:36,813 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.32 vs. limit=15.0 2024-09-18 08:11:37,944 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.52 vs. limit=15.0 2024-09-18 08:11:46,529 INFO [train.py:1198] (1/2) Epoch 23, batch 1250, loss[loss=0.2564, ctc_loss=0.1413, cr_loss=0.3975, attn_decoder_loss=0.2604, over 29526.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1318, cr_loss=0.3754, attn_decoder_loss=0.2486, over 5773666.10 frames. 
], batch size: 92, lr: 4.86e-03, grad_scale: 8.0 2024-09-18 08:12:34,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=403320.0, ans=0.0 2024-09-18 08:12:34,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=403320.0, ans=0.0 2024-09-18 08:12:39,076 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.079e+01 8.243e+01 8.772e+01 9.696e+01 1.858e+02, threshold=1.754e+02, percent-clipped=1.0 2024-09-18 08:12:44,478 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.11 vs. limit=15.0 2024-09-18 08:12:47,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=403320.0, ans=0.1 2024-09-18 08:13:04,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=403360.0, ans=0.2 2024-09-18 08:13:04,985 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.49 vs. limit=15.0 2024-09-18 08:13:06,947 INFO [train.py:1198] (1/2) Epoch 23, batch 1300, loss[loss=0.2548, ctc_loss=0.1241, cr_loss=0.3611, attn_decoder_loss=0.2613, over 28241.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1313, cr_loss=0.3747, attn_decoder_loss=0.2481, over 5778322.18 frames. ], batch size: 111, lr: 4.86e-03, grad_scale: 8.0 2024-09-18 08:13:07,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=403400.0, ans=0.07 2024-09-18 08:13:19,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=403400.0, ans=0.125 2024-09-18 08:13:27,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=403440.0, ans=0.025 2024-09-18 08:13:34,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=403440.0, ans=0.125 2024-09-18 08:14:22,794 INFO [train.py:1198] (1/2) Epoch 23, batch 1350, loss[loss=0.24, ctc_loss=0.128, cr_loss=0.3665, attn_decoder_loss=0.2443, over 29745.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1309, cr_loss=0.3736, attn_decoder_loss=0.2476, over 5796573.62 frames. ], batch size: 81, lr: 4.86e-03, grad_scale: 8.0 2024-09-18 08:14:26,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=403600.0, ans=0.125 2024-09-18 08:14:28,330 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.29 vs. limit=15.0 2024-09-18 08:14:34,031 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=6.32 vs. 
limit=15.0 2024-09-18 08:14:55,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=403680.0, ans=0.025 2024-09-18 08:15:12,099 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.370e+01 8.788e+01 9.254e+01 1.206e+02, threshold=1.758e+02, percent-clipped=0.0 2024-09-18 08:15:26,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=403760.0, ans=0.0 2024-09-18 08:15:37,899 INFO [train.py:1198] (1/2) Epoch 23, batch 1400, loss[loss=0.2179, ctc_loss=0.1189, cr_loss=0.3417, attn_decoder_loss=0.2213, over 29541.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.131, cr_loss=0.3741, attn_decoder_loss=0.2477, over 5807581.31 frames. ], batch size: 69, lr: 4.86e-03, grad_scale: 8.0 2024-09-18 08:15:42,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=403800.0, ans=0.125 2024-09-18 08:15:47,697 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.30 vs. limit=22.5 2024-09-18 08:15:48,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=403800.0, ans=0.1 2024-09-18 08:16:03,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=403840.0, ans=0.0 2024-09-18 08:16:12,024 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.77 vs. limit=15.0 2024-09-18 08:16:22,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=403880.0, ans=0.07 2024-09-18 08:16:25,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=403920.0, ans=0.1 2024-09-18 08:16:33,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=403920.0, ans=0.1 2024-09-18 08:16:44,189 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.05 vs. limit=22.5 2024-09-18 08:16:58,235 INFO [train.py:1198] (1/2) Epoch 23, batch 1450, loss[loss=0.2526, ctc_loss=0.1353, cr_loss=0.3863, attn_decoder_loss=0.2571, over 29446.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1313, cr_loss=0.3748, attn_decoder_loss=0.2483, over 5805456.39 frames. ], batch size: 94, lr: 4.85e-03, grad_scale: 8.0 2024-09-18 08:17:23,284 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.53 vs. 
limit=22.5 2024-09-18 08:17:46,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=404120.0, ans=0.125 2024-09-18 08:17:47,799 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.505e+01 8.721e+01 9.213e+01 9.736e+01 2.438e+02, threshold=1.843e+02, percent-clipped=1.0 2024-09-18 08:17:55,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=404120.0, ans=0.125 2024-09-18 08:17:55,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=404120.0, ans=0.125 2024-09-18 08:18:01,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=404160.0, ans=0.09899494936611666 2024-09-18 08:18:13,662 INFO [train.py:1198] (1/2) Epoch 23, batch 1500, loss[loss=0.2582, ctc_loss=0.1391, cr_loss=0.3917, attn_decoder_loss=0.2627, over 29649.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1318, cr_loss=0.3752, attn_decoder_loss=0.2488, over 5806293.25 frames. ], batch size: 86, lr: 4.85e-03, grad_scale: 8.0 2024-09-18 08:18:26,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=404200.0, ans=0.125 2024-09-18 08:18:30,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=404240.0, ans=0.125 2024-09-18 08:18:44,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=404280.0, ans=0.125 2024-09-18 08:19:07,449 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:19:08,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=404320.0, ans=0.125 2024-09-18 08:19:18,719 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.85 vs. limit=15.0 2024-09-18 08:19:24,533 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.06 vs. limit=6.0 2024-09-18 08:19:29,873 INFO [train.py:1198] (1/2) Epoch 23, batch 1550, loss[loss=0.2643, ctc_loss=0.156, cr_loss=0.437, attn_decoder_loss=0.2666, over 29485.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1326, cr_loss=0.3767, attn_decoder_loss=0.249, over 5781024.97 frames. ], batch size: 90, lr: 4.85e-03, grad_scale: 8.0 2024-09-18 08:19:53,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=404440.0, ans=0.125 2024-09-18 08:20:14,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=404480.0, ans=0.125 2024-09-18 08:20:17,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=404520.0, ans=0.1 2024-09-18 08:20:21,466 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.91 vs. 
limit=22.5 2024-09-18 08:20:22,179 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.739e+01 8.646e+01 9.169e+01 9.938e+01 3.341e+02, threshold=1.834e+02, percent-clipped=2.0 2024-09-18 08:20:41,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=404560.0, ans=0.0 2024-09-18 08:20:50,183 INFO [train.py:1198] (1/2) Epoch 23, batch 1600, loss[loss=0.2579, ctc_loss=0.1396, cr_loss=0.3932, attn_decoder_loss=0.2623, over 29682.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1326, cr_loss=0.3764, attn_decoder_loss=0.2487, over 5764184.22 frames. ], batch size: 85, lr: 4.85e-03, grad_scale: 16.0 2024-09-18 08:20:51,290 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-09-18 08:21:03,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=404640.0, ans=0.0 2024-09-18 08:21:11,527 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:21:31,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=404680.0, ans=0.2 2024-09-18 08:21:35,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=404720.0, ans=0.2 2024-09-18 08:21:57,889 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.77 vs. limit=15.0 2024-09-18 08:22:06,228 INFO [train.py:1198] (1/2) Epoch 23, batch 1650, loss[loss=0.2479, ctc_loss=0.1275, cr_loss=0.3805, attn_decoder_loss=0.2529, over 29691.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1319, cr_loss=0.3753, attn_decoder_loss=0.2483, over 5758280.20 frames. ], batch size: 89, lr: 4.85e-03, grad_scale: 8.0 2024-09-18 08:22:08,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=404800.0, ans=0.0 2024-09-18 08:22:20,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=404840.0, ans=0.0 2024-09-18 08:22:34,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=404840.0, ans=0.125 2024-09-18 08:22:58,240 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.541e+01 8.577e+01 9.100e+01 9.886e+01 2.579e+02, threshold=1.820e+02, percent-clipped=1.0 2024-09-18 08:23:15,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=404960.0, ans=0.125 2024-09-18 08:23:17,044 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.56 vs. limit=10.0 2024-09-18 08:23:22,332 INFO [train.py:1198] (1/2) Epoch 23, batch 1700, loss[loss=0.2157, ctc_loss=0.1052, cr_loss=0.3225, attn_decoder_loss=0.2208, over 29527.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1314, cr_loss=0.3748, attn_decoder_loss=0.2479, over 5779587.89 frames. ], batch size: 69, lr: 4.85e-03, grad_scale: 8.0 2024-09-18 08:23:22,932 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.74 vs. 
limit=12.0 2024-09-18 08:23:30,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=405000.0, ans=0.0 2024-09-18 08:23:41,912 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.66 vs. limit=5.0 2024-09-18 08:23:45,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=405040.0, ans=0.125 2024-09-18 08:24:01,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=405080.0, ans=0.025 2024-09-18 08:24:06,560 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.73 vs. limit=22.5 2024-09-18 08:24:07,926 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=12.0 2024-09-18 08:24:42,501 INFO [train.py:1198] (1/2) Epoch 23, batch 1750, loss[loss=0.2139, ctc_loss=0.1138, cr_loss=0.3486, attn_decoder_loss=0.2173, over 29330.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1305, cr_loss=0.3728, attn_decoder_loss=0.2472, over 5787786.28 frames. ], batch size: 67, lr: 4.85e-03, grad_scale: 8.0 2024-09-18 08:24:59,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=405240.0, ans=0.125 2024-09-18 08:25:01,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=405240.0, ans=0.1 2024-09-18 08:25:10,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=405240.0, ans=0.2 2024-09-18 08:25:11,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=405280.0, ans=0.0 2024-09-18 08:25:13,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=405280.0, ans=0.125 2024-09-18 08:25:18,303 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0 2024-09-18 08:25:33,641 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.000e+01 8.508e+01 9.190e+01 9.615e+01 2.377e+02, threshold=1.838e+02, percent-clipped=1.0 2024-09-18 08:25:40,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=405320.0, ans=0.125 2024-09-18 08:25:54,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=405360.0, ans=0.125 2024-09-18 08:25:57,527 INFO [train.py:1198] (1/2) Epoch 23, batch 1800, loss[loss=0.2565, ctc_loss=0.1375, cr_loss=0.3899, attn_decoder_loss=0.261, over 29689.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1309, cr_loss=0.3737, attn_decoder_loss=0.2476, over 5791248.52 frames. ], batch size: 83, lr: 4.85e-03, grad_scale: 8.0 2024-09-18 08:26:02,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=405400.0, ans=0.2 2024-09-18 08:26:21,513 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.30 vs. 
limit=10.0 2024-09-18 08:26:27,199 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.58 vs. limit=12.0 2024-09-18 08:26:58,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=405560.0, ans=0.2 2024-09-18 08:27:13,786 INFO [train.py:1198] (1/2) Epoch 23, batch 1850, loss[loss=0.2533, ctc_loss=0.1345, cr_loss=0.3789, attn_decoder_loss=0.258, over 29635.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1311, cr_loss=0.3744, attn_decoder_loss=0.2477, over 5797312.37 frames. ], batch size: 86, lr: 4.84e-03, grad_scale: 8.0 2024-09-18 08:27:41,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=405640.0, ans=0.125 2024-09-18 08:27:42,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=405680.0, ans=0.125 2024-09-18 08:27:49,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=405680.0, ans=0.0 2024-09-18 08:28:07,768 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.499e+01 9.020e+01 9.564e+01 1.401e+02, threshold=1.804e+02, percent-clipped=0.0 2024-09-18 08:28:17,920 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.43 vs. limit=15.0 2024-09-18 08:28:31,644 INFO [train.py:1198] (1/2) Epoch 23, batch 1900, loss[loss=0.2441, ctc_loss=0.1257, cr_loss=0.3518, attn_decoder_loss=0.2494, over 29706.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1316, cr_loss=0.3754, attn_decoder_loss=0.2482, over 5804990.08 frames. ], batch size: 89, lr: 4.84e-03, grad_scale: 8.0 2024-09-18 08:28:46,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=405800.0, ans=0.125 2024-09-18 08:28:58,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=405840.0, ans=0.025 2024-09-18 08:29:08,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=405880.0, ans=0.125 2024-09-18 08:29:17,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=405880.0, ans=0.125 2024-09-18 08:29:27,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=405920.0, ans=0.2 2024-09-18 08:29:35,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=405960.0, ans=0.1 2024-09-18 08:29:41,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=405960.0, ans=0.0 2024-09-18 08:29:50,195 INFO [train.py:1198] (1/2) Epoch 23, batch 1950, loss[loss=0.2425, ctc_loss=0.1328, cr_loss=0.3805, attn_decoder_loss=0.2463, over 29456.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.1328, cr_loss=0.378, attn_decoder_loss=0.2496, over 5819533.60 frames. ], batch size: 78, lr: 4.84e-03, grad_scale: 8.0 2024-09-18 08:29:52,526 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.16 vs. 
limit=22.5 2024-09-18 08:29:56,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=406000.0, ans=0.0 2024-09-18 08:30:05,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=406040.0, ans=0.125 2024-09-18 08:30:14,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=406040.0, ans=0.125 2024-09-18 08:30:41,519 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.546e+01 8.634e+01 9.173e+01 9.833e+01 1.215e+02, threshold=1.835e+02, percent-clipped=0.0 2024-09-18 08:30:46,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=406120.0, ans=10.0 2024-09-18 08:31:01,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=406160.0, ans=0.125 2024-09-18 08:31:05,553 INFO [train.py:1198] (1/2) Epoch 23, batch 2000, loss[loss=0.2216, ctc_loss=0.1116, cr_loss=0.3403, attn_decoder_loss=0.2263, over 29348.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1331, cr_loss=0.378, attn_decoder_loss=0.25, over 5797773.57 frames. ], batch size: 67, lr: 4.84e-03, grad_scale: 16.0 2024-09-18 08:31:07,512 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:31:18,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.89 vs. limit=15.0 2024-09-18 08:31:36,600 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.76 vs. limit=15.0 2024-09-18 08:31:41,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=406280.0, ans=0.125 2024-09-18 08:31:51,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=406320.0, ans=0.0 2024-09-18 08:31:51,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=406320.0, ans=0.125 2024-09-18 08:31:58,436 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.55 vs. limit=22.5 2024-09-18 08:32:22,986 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.33 vs. limit=15.0 2024-09-18 08:32:23,713 INFO [train.py:1198] (1/2) Epoch 23, batch 2050, loss[loss=0.2202, ctc_loss=0.1175, cr_loss=0.3499, attn_decoder_loss=0.2239, over 29456.00 frames. ], tot_loss[loss=0.2445, ctc_loss=0.1323, cr_loss=0.3753, attn_decoder_loss=0.2487, over 5789289.86 frames. ], batch size: 70, lr: 4.84e-03, grad_scale: 8.0 2024-09-18 08:33:05,958 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.71 vs. 
limit=22.5 2024-09-18 08:33:07,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=406480.0, ans=0.0 2024-09-18 08:33:07,938 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.56 vs. limit=15.0 2024-09-18 08:33:16,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=406520.0, ans=0.025 2024-09-18 08:33:18,754 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.624e+01 8.483e+01 9.027e+01 9.590e+01 1.679e+02, threshold=1.805e+02, percent-clipped=0.0 2024-09-18 08:33:25,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.73 vs. limit=15.0 2024-09-18 08:33:30,130 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.72 vs. limit=15.0 2024-09-18 08:33:34,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=406560.0, ans=0.125 2024-09-18 08:33:41,484 INFO [train.py:1198] (1/2) Epoch 23, batch 2100, loss[loss=0.2415, ctc_loss=0.1282, cr_loss=0.3701, attn_decoder_loss=0.2458, over 29758.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1315, cr_loss=0.3743, attn_decoder_loss=0.2481, over 5799938.31 frames. ], batch size: 81, lr: 4.84e-03, grad_scale: 8.0 2024-09-18 08:34:04,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=406640.0, ans=0.1 2024-09-18 08:34:13,501 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.55 vs. limit=22.5 2024-09-18 08:34:21,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=406680.0, ans=0.0 2024-09-18 08:34:25,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=406720.0, ans=0.125 2024-09-18 08:34:31,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=406720.0, ans=0.025 2024-09-18 08:34:46,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=406760.0, ans=0.125 2024-09-18 08:34:56,605 INFO [train.py:1198] (1/2) Epoch 23, batch 2150, loss[loss=0.2402, ctc_loss=0.1286, cr_loss=0.3725, attn_decoder_loss=0.2443, over 29443.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.131, cr_loss=0.3745, attn_decoder_loss=0.2477, over 5814850.91 frames. 
], batch size: 78, lr: 4.84e-03, grad_scale: 8.0 2024-09-18 08:35:42,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=406920.0, ans=0.125 2024-09-18 08:35:51,675 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.371e+01 8.476e+01 8.832e+01 9.481e+01 1.697e+02, threshold=1.766e+02, percent-clipped=0.0 2024-09-18 08:35:56,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=406920.0, ans=0.025 2024-09-18 08:36:14,436 INFO [train.py:1198] (1/2) Epoch 23, batch 2200, loss[loss=0.2643, ctc_loss=0.1395, cr_loss=0.3926, attn_decoder_loss=0.2695, over 29643.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1312, cr_loss=0.3742, attn_decoder_loss=0.2479, over 5811756.58 frames. ], batch size: 86, lr: 4.84e-03, grad_scale: 8.0 2024-09-18 08:36:28,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=407040.0, ans=0.125 2024-09-18 08:36:36,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=407040.0, ans=0.0 2024-09-18 08:36:50,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=407080.0, ans=0.125 2024-09-18 08:36:53,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.min_positive, batch_count=407080.0, ans=0.025 2024-09-18 08:37:06,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=407120.0, ans=0.1 2024-09-18 08:37:19,094 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:37:19,551 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.73 vs. limit=15.0 2024-09-18 08:37:23,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=407160.0, ans=0.1 2024-09-18 08:37:32,955 INFO [train.py:1198] (1/2) Epoch 23, batch 2250, loss[loss=0.2479, ctc_loss=0.1269, cr_loss=0.3696, attn_decoder_loss=0.2532, over 29700.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.131, cr_loss=0.3736, attn_decoder_loss=0.2476, over 5811472.41 frames. ], batch size: 82, lr: 4.83e-03, grad_scale: 8.0 2024-09-18 08:37:46,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=407240.0, ans=0.0 2024-09-18 08:37:51,899 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. 
limit=6.0 2024-09-18 08:38:06,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=407280.0, ans=0.025 2024-09-18 08:38:25,776 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.661e+01 8.566e+01 9.041e+01 9.811e+01 1.660e+02, threshold=1.808e+02, percent-clipped=0.0 2024-09-18 08:38:36,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=407360.0, ans=0.125 2024-09-18 08:38:44,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=407360.0, ans=0.0 2024-09-18 08:38:48,507 INFO [train.py:1198] (1/2) Epoch 23, batch 2300, loss[loss=0.2149, ctc_loss=0.1101, cr_loss=0.3314, attn_decoder_loss=0.2192, over 29731.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1301, cr_loss=0.3717, attn_decoder_loss=0.2465, over 5799651.58 frames. ], batch size: 72, lr: 4.83e-03, grad_scale: 8.0 2024-09-18 08:39:02,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=407440.0, ans=0.1 2024-09-18 08:39:03,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=407440.0, ans=0.0 2024-09-18 08:39:26,392 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:39:30,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=407480.0, ans=0.2 2024-09-18 08:39:50,239 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.80 vs. limit=15.0 2024-09-18 08:40:06,181 INFO [train.py:1198] (1/2) Epoch 23, batch 2350, loss[loss=0.2663, ctc_loss=0.1469, cr_loss=0.4119, attn_decoder_loss=0.2704, over 29699.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1303, cr_loss=0.3721, attn_decoder_loss=0.2467, over 5805143.51 frames. ], batch size: 83, lr: 4.83e-03, grad_scale: 8.0 2024-09-18 08:40:23,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.24 vs. limit=22.5 2024-09-18 08:40:30,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=407640.0, ans=0.125 2024-09-18 08:40:57,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=407720.0, ans=0.035 2024-09-18 08:40:57,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=407720.0, ans=0.125 2024-09-18 08:41:01,410 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.396e+01 8.649e+01 9.243e+01 9.923e+01 8.680e+02, threshold=1.849e+02, percent-clipped=2.0 2024-09-18 08:41:12,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=407760.0, ans=0.0 2024-09-18 08:41:23,896 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.44 vs. limit=12.0 2024-09-18 08:41:24,566 INFO [train.py:1198] (1/2) Epoch 23, batch 2400, loss[loss=0.2351, ctc_loss=0.1225, cr_loss=0.35, attn_decoder_loss=0.2398, over 29534.00 frames. 
2024-09-18 08:41:27,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.90 vs. limit=15.0
2024-09-18 08:41:33,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=407800.0, ans=0.125
2024-09-18 08:41:36,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=407800.0, ans=0.2
2024-09-18 08:41:54,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=407880.0, ans=0.125
2024-09-18 08:42:31,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=407960.0, ans=0.2
2024-09-18 08:42:31,995 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.96 vs. limit=22.5
2024-09-18 08:42:40,653 INFO [train.py:1198] (1/2) Epoch 23, batch 2450, loss[loss=0.2438, ctc_loss=0.1303, cr_loss=0.3649, attn_decoder_loss=0.2483, over 29729.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1309, cr_loss=0.3729, attn_decoder_loss=0.2479, over 5784720.56 frames. ], batch size: 82, lr: 4.83e-03, grad_scale: 8.0
2024-09-18 08:42:55,208 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.42 vs. limit=15.0
2024-09-18 08:43:33,854 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.38 vs. limit=15.0
2024-09-18 08:43:37,140 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.992e+01 9.709e+01 1.062e+02 3.982e+02, threshold=1.942e+02, percent-clipped=1.0
2024-09-18 08:43:53,329 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.69 vs. limit=15.0
2024-09-18 08:43:58,466 INFO [train.py:1198] (1/2) Epoch 23, batch 2500, loss[loss=0.2624, ctc_loss=0.1407, cr_loss=0.4168, attn_decoder_loss=0.2667, over 29609.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1306, cr_loss=0.3729, attn_decoder_loss=0.2477, over 5796263.33 frames. ], batch size: 86, lr: 4.83e-03, grad_scale: 8.0
2024-09-18 08:44:07,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=408200.0, ans=0.125
2024-09-18 08:44:22,178 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.83 vs. limit=15.0
2024-09-18 08:44:24,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=408240.0, ans=0.025
2024-09-18 08:45:04,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=408360.0, ans=0.2
2024-09-18 08:45:08,731 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.72 vs. limit=10.0
2024-09-18 08:45:16,694 INFO [train.py:1198] (1/2) Epoch 23, batch 2550, loss[loss=0.2196, ctc_loss=0.1188, cr_loss=0.3627, attn_decoder_loss=0.2228, over 29354.00 frames. ], tot_loss[loss=0.2436, ctc_loss=0.131, cr_loss=0.3738, attn_decoder_loss=0.2478, over 5798919.59 frames. ], batch size: 67, lr: 4.83e-03, grad_scale: 8.0
2024-09-18 08:45:19,040 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.63 vs. limit=15.0
2024-09-18 08:45:41,673 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.57 vs. limit=15.0
2024-09-18 08:45:47,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=408480.0, ans=0.0
2024-09-18 08:46:02,567 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.62 vs. limit=12.0
2024-09-18 08:46:05,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=408520.0, ans=0.125
2024-09-18 08:46:11,113 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.224e+01 8.424e+01 8.872e+01 9.650e+01 4.846e+02, threshold=1.774e+02, percent-clipped=2.0
2024-09-18 08:46:11,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=408520.0, ans=0.025
2024-09-18 08:46:14,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=408520.0, ans=0.125
2024-09-18 08:46:14,898 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0
2024-09-18 08:46:25,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=408560.0, ans=0.125
2024-09-18 08:46:31,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=408600.0, ans=0.2
2024-09-18 08:46:32,466 INFO [train.py:1198] (1/2) Epoch 23, batch 2600, loss[loss=0.2429, ctc_loss=0.1397, cr_loss=0.3856, attn_decoder_loss=0.2458, over 29445.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1315, cr_loss=0.3744, attn_decoder_loss=0.2481, over 5795250.18 frames. ], batch size: 78, lr: 4.83e-03, grad_scale: 8.0
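
Most of the scaling.py:214 lines are ScheduledFloat reports: a named hyperparameter (a dropout probability, skip rate, balancer bound, and so on) whose current value ans is a function of batch_count. One plausible minimal reimplementation is a piecewise-linear schedule over batch count, sketched below; the real ScheduledFloat in scaling.py carries more machinery, so treat this as illustrative only.

import bisect

class ScheduledFloat:
    """A float that interpolates piecewise-linearly between (batch_count, value) points."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def value(self, batch_count):
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        t = (batch_count - x0) / (x1 - x0)
        return (1.0 - t) * self.ys[i - 1] + t * self.ys[i]

# E.g. a dropout that anneals from 0.3 to 0.1 over the first 20k batches and then
# stays flat, which would explain the constant ans=0.1 readings at batch_count ~4e5:
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value(407760.0))  # 0.1
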
2024-09-18 08:46:41,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=408600.0, ans=10.0
2024-09-18 08:46:43,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=408600.0, ans=0.125
2024-09-18 08:47:07,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=408680.0, ans=0.05
2024-09-18 08:47:23,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=408720.0, ans=0.125
2024-09-18 08:47:23,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=408720.0, ans=0.125
2024-09-18 08:47:27,591 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 08:47:42,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=408760.0, ans=0.025
2024-09-18 08:47:46,207 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.28 vs. limit=15.0
2024-09-18 08:47:50,072 INFO [train.py:1198] (1/2) Epoch 23, batch 2650, loss[loss=0.2557, ctc_loss=0.1391, cr_loss=0.3814, attn_decoder_loss=0.2602, over 29264.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1313, cr_loss=0.3744, attn_decoder_loss=0.2483, over 5801781.60 frames. ], batch size: 100, lr: 4.83e-03, grad_scale: 8.0
2024-09-18 08:47:51,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=408800.0, ans=0.025
2024-09-18 08:47:53,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=408800.0, ans=0.2
2024-09-18 08:48:23,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=408880.0, ans=0.2
2024-09-18 08:48:33,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=408880.0, ans=0.125
2024-09-18 08:48:43,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=408920.0, ans=0.025
2024-09-18 08:48:46,443 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.322e+01 8.846e+01 9.392e+01 1.397e+02, threshold=1.769e+02, percent-clipped=0.0
2024-09-18 08:48:47,142 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0
2024-09-18 08:48:54,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=408960.0, ans=0.0
2024-09-18 08:48:59,238 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.52 vs. limit=22.5
2024-09-18 08:49:01,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=408960.0, ans=0.125
2024-09-18 08:49:07,644 INFO [train.py:1198] (1/2) Epoch 23, batch 2700, loss[loss=0.2503, ctc_loss=0.1251, cr_loss=0.3547, attn_decoder_loss=0.2563, over 29542.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1314, cr_loss=0.3745, attn_decoder_loss=0.2485, over 5795156.11 frames. ], batch size: 87, lr: 4.82e-03, grad_scale: 8.0
2024-09-18 08:49:16,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=409000.0, ans=0.0
2024-09-18 08:49:17,336 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.12 vs. limit=22.5
2024-09-18 08:49:21,930 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.58 vs. limit=6.0
2024-09-18 08:49:38,509 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.53 vs. limit=15.0
2024-09-18 08:49:39,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=409080.0, ans=0.2
2024-09-18 08:50:00,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=409120.0, ans=0.0
2024-09-18 08:50:23,533 INFO [train.py:1198] (1/2) Epoch 23, batch 2750, loss[loss=0.2324, ctc_loss=0.1188, cr_loss=0.3378, attn_decoder_loss=0.2375, over 29510.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1303, cr_loss=0.3721, attn_decoder_loss=0.2473, over 5793699.69 frames. ], batch size: 75, lr: 4.82e-03, grad_scale: 8.0
2024-09-18 08:50:40,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=409240.0, ans=0.5
2024-09-18 08:50:58,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=409280.0, ans=0.0
2024-09-18 08:51:08,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=409280.0, ans=0.125
2024-09-18 08:51:20,301 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.385e+01 8.421e+01 8.872e+01 9.349e+01 6.581e+02, threshold=1.774e+02, percent-clipped=3.0
2024-09-18 08:51:41,683 INFO [train.py:1198] (1/2) Epoch 23, batch 2800, loss[loss=0.2655, ctc_loss=0.1673, cr_loss=0.4009, attn_decoder_loss=0.2676, over 19345.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1308, cr_loss=0.3722, attn_decoder_loss=0.2475, over 5774242.90 frames. ], batch size: 210, lr: 4.82e-03, grad_scale: 16.0
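
Each train.py:1198 summary carries four numbers: the combined loss plus its ctc_loss, cr_loss, and attn_decoder_loss components. The logged totals are consistent with a fixed weighted sum of the parts; for the batch 2700 entry above, 0.1*0.1314 + 0.9*0.2485 + 0.02*0.3745 is approximately 0.2443. The sketch below uses those inferred scales; they are read off the numbers, not taken from the training code.

def combine_losses(ctc_loss, attn_decoder_loss, cr_loss,
                   ctc_scale=0.1, attn_decoder_scale=0.9, cr_scale=0.02):
    """Weighted sum of the three loss terms printed in the train.py summaries."""
    return (ctc_scale * ctc_loss
            + attn_decoder_scale * attn_decoder_loss
            + cr_scale * cr_loss)

# Reproduces the batch 2700 tot_loss above to 4 decimals:
print(round(combine_losses(0.1314, 0.2485, 0.3745), 4))  # 0.2443
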
2024-09-18 08:51:49,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=409400.0, ans=10.0
2024-09-18 08:51:49,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=409400.0, ans=0.0
2024-09-18 08:52:06,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=409440.0, ans=0.0
2024-09-18 08:52:10,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=409480.0, ans=0.125
2024-09-18 08:52:13,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=409480.0, ans=0.1
2024-09-18 08:52:27,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=409520.0, ans=0.125
2024-09-18 08:52:35,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=409520.0, ans=0.2
2024-09-18 08:52:51,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=409560.0, ans=0.125
2024-09-18 08:52:59,484 INFO [train.py:1198] (1/2) Epoch 23, batch 2850, loss[loss=0.2445, ctc_loss=0.1285, cr_loss=0.3894, attn_decoder_loss=0.2487, over 29501.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1316, cr_loss=0.3736, attn_decoder_loss=0.2481, over 5761207.15 frames. ], batch size: 77, lr: 4.82e-03, grad_scale: 8.0
2024-09-18 08:53:18,131 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 08:53:21,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=409640.0, ans=0.125
2024-09-18 08:53:55,293 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 8.592e+01 9.017e+01 9.666e+01 1.557e+02, threshold=1.803e+02, percent-clipped=0.0
2024-09-18 08:54:15,009 INFO [train.py:1198] (1/2) Epoch 23, batch 2900, loss[loss=0.2497, ctc_loss=0.1391, cr_loss=0.3943, attn_decoder_loss=0.2533, over 29429.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.1322, cr_loss=0.3757, attn_decoder_loss=0.2491, over 5786642.11 frames. ], batch size: 79, lr: 4.82e-03, grad_scale: 8.0
2024-09-18 08:54:27,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=409800.0, ans=0.2
2024-09-18 08:54:30,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=409840.0, ans=0.05
2024-09-18 08:54:36,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=409840.0, ans=0.2
2024-09-18 08:54:48,567 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0
2024-09-18 08:55:03,623 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.92 vs. limit=15.0
2024-09-18 08:55:20,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=409960.0, ans=0.125
2024-09-18 08:55:33,240 INFO [train.py:1198] (1/2) Epoch 23, batch 2950, loss[loss=0.2287, ctc_loss=0.1186, cr_loss=0.3543, attn_decoder_loss=0.233, over 29522.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1318, cr_loss=0.3744, attn_decoder_loss=0.2483, over 5781914.27 frames. ], batch size: 75, lr: 4.82e-03, grad_scale: 8.0
2024-09-18 08:55:36,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=410000.0, ans=0.0
2024-09-18 08:55:50,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=410040.0, ans=0.125
2024-09-18 08:56:00,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=410040.0, ans=0.125
2024-09-18 08:56:06,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=410080.0, ans=0.2
2024-09-18 08:56:08,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=410080.0, ans=0.02
2024-09-18 08:56:29,685 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.78 vs. limit=15.0
2024-09-18 08:56:30,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=410120.0, ans=0.1
2024-09-18 08:56:31,675 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.552e+01 9.299e+01 9.927e+01 2.795e+02, threshold=1.860e+02, percent-clipped=1.0
2024-09-18 08:56:51,630 INFO [train.py:1198] (1/2) Epoch 23, batch 3000, loss[loss=0.248, ctc_loss=0.1336, cr_loss=0.3793, attn_decoder_loss=0.2522, over 29764.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1317, cr_loss=0.3745, attn_decoder_loss=0.248, over 5783600.97 frames. ], batch size: 81, lr: 4.82e-03, grad_scale: 8.0
2024-09-18 08:56:51,631 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-18 08:57:10,060 INFO [train.py:1230] (1/2) Epoch 23, validation: loss=0.2116, ctc_loss=0.03932, cr_loss=5.516e-15, attn_decoder_loss=0.2308, over 944034.00 frames.
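
The train.py:1221/1230 pair above ("Computing validation loss" followed by the Epoch 23 validation summary) corresponds to a periodic held-out evaluation. A minimal frame-weighted validation loop in that spirit follows; the names are illustrative, not the actual icefall API.

import torch

def compute_validation_loss(model, valid_loader, loss_fn):
    """Average loss over the validation set, weighted by frames per batch."""
    was_training = model.training
    model.eval()
    total, frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = loss_fn(model, batch)
            total += loss.item() * num_frames
            frames += num_frames
    if was_training:
        model.train()  # resume training mode afterwards
    return total / max(frames, 1.0)
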
2024-09-18 08:57:10,060 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB
2024-09-18 08:57:10,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=410200.0, ans=0.07
2024-09-18 08:57:24,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=410240.0, ans=0.025
2024-09-18 08:57:29,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=410240.0, ans=0.025
2024-09-18 08:57:30,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=410240.0, ans=0.125
2024-09-18 08:57:39,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten.whitening_limit, batch_count=410280.0, ans=15.0
2024-09-18 08:57:40,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=410280.0, ans=0.0
2024-09-18 08:58:26,565 INFO [train.py:1198] (1/2) Epoch 23, batch 3050, loss[loss=0.2308, ctc_loss=0.1221, cr_loss=0.3631, attn_decoder_loss=0.2349, over 29540.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1323, cr_loss=0.3755, attn_decoder_loss=0.2489, over 5776450.77 frames. ], batch size: 76, lr: 4.82e-03, grad_scale: 8.0
2024-09-18 08:58:49,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=410440.0, ans=0.0
2024-09-18 08:58:58,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=410480.0, ans=0.125
2024-09-18 08:59:02,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=410480.0, ans=0.0
2024-09-18 08:59:08,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=410480.0, ans=0.0
2024-09-18 08:59:20,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=410520.0, ans=0.125
2024-09-18 08:59:24,739 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.520e+01 8.808e+01 9.332e+01 1.013e+02 4.220e+02, threshold=1.866e+02, percent-clipped=2.0
2024-09-18 08:59:33,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=410560.0, ans=15.0
2024-09-18 08:59:38,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=410560.0, ans=0.125
2024-09-18 08:59:44,268 INFO [train.py:1198] (1/2) Epoch 23, batch 3100, loss[loss=0.2586, ctc_loss=0.1367, cr_loss=0.3715, attn_decoder_loss=0.2639, over 29250.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1317, cr_loss=0.3741, attn_decoder_loss=0.2484, over 5776925.08 frames. ], batch size: 100, lr: 4.81e-03, grad_scale: 8.0
2024-09-18 08:59:47,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=410600.0, ans=0.0
2024-09-18 08:59:59,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=410640.0, ans=0.025
2024-09-18 09:00:01,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=410640.0, ans=0.1
2024-09-18 09:00:09,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=410640.0, ans=0.125
2024-09-18 09:00:51,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=410760.0, ans=0.1
2024-09-18 09:01:02,038 INFO [train.py:1198] (1/2) Epoch 23, batch 3150, loss[loss=0.2551, ctc_loss=0.1359, cr_loss=0.3831, attn_decoder_loss=0.2599, over 28898.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1317, cr_loss=0.3745, attn_decoder_loss=0.2484, over 5783974.54 frames. ], batch size: 104, lr: 4.81e-03, grad_scale: 8.0
2024-09-18 09:01:19,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.43 vs. limit=15.0
2024-09-18 09:01:22,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=410840.0, ans=0.5
2024-09-18 09:01:31,909 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.49 vs. limit=15.0
2024-09-18 09:01:50,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=410920.0, ans=0.0
2024-09-18 09:01:56,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=410920.0, ans=0.125
2024-09-18 09:01:57,777 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.407e+01 8.637e+01 9.168e+01 9.786e+01 2.272e+02, threshold=1.834e+02, percent-clipped=1.0
2024-09-18 09:02:10,965 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.96 vs. limit=12.0
2024-09-18 09:02:13,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=410960.0, ans=0.1
2024-09-18 09:02:17,390 INFO [train.py:1198] (1/2) Epoch 23, batch 3200, loss[loss=0.2593, ctc_loss=0.1467, cr_loss=0.4305, attn_decoder_loss=0.2622, over 29422.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1313, cr_loss=0.3743, attn_decoder_loss=0.248, over 5793671.28 frames. ], batch size: 79, lr: 4.81e-03, grad_scale: 16.0
2024-09-18 09:02:33,914 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.74 vs. limit=22.5
2024-09-18 09:02:54,085 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.48 vs. limit=15.0
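
The scaling.py:1024 Whitening lines compare a per-module metric against a limit (e.g. metric=8.48 vs. limit=15.0 above). One metric with the right behavior is the ratio below: it equals 1.0 when the channel covariance of the activations is a multiple of the identity ("white") and grows as the covariance concentrates in a few directions. This is a guess at the definition, not necessarily the exact scaling.py formula.

import torch

def whitening_metric(x):
    """x: (num_frames, num_channels). Returns >= 1.0; equals 1.0 iff covariance ~ c*I."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]       # channel covariance of the activations
    d = cov.shape[0]
    return (cov * cov).sum() / (cov.trace() ** 2 / d)

print(whitening_metric(torch.randn(10000, 512)))  # close to 1.0 for white noise
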
2024-09-18 09:03:08,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=411120.0, ans=0.125
2024-09-18 09:03:20,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=411160.0, ans=0.125
2024-09-18 09:03:20,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=411160.0, ans=0.2
2024-09-18 09:03:22,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=411160.0, ans=0.2
2024-09-18 09:03:35,718 INFO [train.py:1198] (1/2) Epoch 23, batch 3250, loss[loss=0.2538, ctc_loss=0.1364, cr_loss=0.4052, attn_decoder_loss=0.2578, over 29696.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1318, cr_loss=0.3756, attn_decoder_loss=0.2485, over 5801720.39 frames. ], batch size: 84, lr: 4.81e-03, grad_scale: 8.0
2024-09-18 09:03:57,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=411240.0, ans=0.0
2024-09-18 09:04:26,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=411320.0, ans=0.1
2024-09-18 09:04:34,986 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.419e+01 8.655e+01 9.272e+01 9.823e+01 1.322e+02, threshold=1.854e+02, percent-clipped=0.0
2024-09-18 09:04:35,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=411320.0, ans=0.125
2024-09-18 09:04:53,307 INFO [train.py:1198] (1/2) Epoch 23, batch 3300, loss[loss=0.2495, ctc_loss=0.1352, cr_loss=0.3721, attn_decoder_loss=0.2539, over 28110.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1314, cr_loss=0.3745, attn_decoder_loss=0.2474, over 5799028.38 frames. ], batch size: 111, lr: 4.81e-03, grad_scale: 8.0
2024-09-18 09:04:58,624 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0
2024-09-18 09:05:26,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=411480.0, ans=0.0
2024-09-18 09:05:26,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=411480.0, ans=0.125
2024-09-18 09:05:55,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=411560.0, ans=0.0
2024-09-18 09:06:05,400 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.99 vs. limit=22.5
2024-09-18 09:06:08,660 INFO [train.py:1198] (1/2) Epoch 23, batch 3350, loss[loss=0.2531, ctc_loss=0.1345, cr_loss=0.3879, attn_decoder_loss=0.2576, over 28794.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1322, cr_loss=0.376, attn_decoder_loss=0.2482, over 5775318.31 frames. ], batch size: 104, lr: 4.81e-03, grad_scale: 8.0
2024-09-18 09:06:25,713 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.50 vs. limit=15.0
2024-09-18 09:06:39,005 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.41 vs. limit=15.0
2024-09-18 09:06:40,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=411680.0, ans=0.0
2024-09-18 09:06:43,692 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.54 vs. limit=15.0
2024-09-18 09:06:46,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=411680.0, ans=0.0
2024-09-18 09:06:46,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=411680.0, ans=0.0
2024-09-18 09:06:47,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=411680.0, ans=0.09899494936611666
2024-09-18 09:06:58,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=411720.0, ans=0.125
2024-09-18 09:07:02,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=411720.0, ans=0.125
2024-09-18 09:07:03,532 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.03 vs. limit=15.0
2024-09-18 09:07:08,459 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 8.663e+01 9.206e+01 9.789e+01 2.075e+02, threshold=1.841e+02, percent-clipped=1.0
2024-09-18 09:07:11,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=411760.0, ans=0.05
2024-09-18 09:07:13,604 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.24 vs. limit=6.0
2024-09-18 09:07:26,459 INFO [train.py:1198] (1/2) Epoch 23, batch 3400, loss[loss=0.2124, ctc_loss=0.1082, cr_loss=0.3285, attn_decoder_loss=0.2167, over 29346.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.132, cr_loss=0.3752, attn_decoder_loss=0.248, over 5766876.25 frames. ], batch size: 67, lr: 4.81e-03, grad_scale: 8.0
2024-09-18 09:07:31,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=411800.0, ans=0.0
2024-09-18 09:08:12,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=411920.0, ans=0.0
2024-09-18 09:08:16,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=411920.0, ans=0.1
2024-09-18 09:08:25,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=411920.0, ans=0.035
2024-09-18 09:08:32,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=411960.0, ans=10.0
2024-09-18 09:08:35,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=411960.0, ans=0.5
2024-09-18 09:08:37,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=411960.0, ans=0.125
2024-09-18 09:08:41,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=411960.0, ans=0.025
2024-09-18 09:08:44,749 INFO [train.py:1198] (1/2) Epoch 23, batch 3450, loss[loss=0.2486, ctc_loss=0.1299, cr_loss=0.3834, attn_decoder_loss=0.2533, over 28185.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1321, cr_loss=0.3758, attn_decoder_loss=0.2484, over 5775213.16 frames. ], batch size: 111, lr: 4.81e-03, grad_scale: 8.0
2024-09-18 09:08:45,421 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0
2024-09-18 09:08:47,142 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.90 vs. limit=10.0
2024-09-18 09:08:49,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=412000.0, ans=0.0
2024-09-18 09:08:55,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=412000.0, ans=0.0
2024-09-18 09:09:00,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=412040.0, ans=0.125
2024-09-18 09:09:01,771 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 09:09:01,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=412040.0, ans=0.125
2024-09-18 09:09:03,865 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.21 vs. limit=10.0
2024-09-18 09:09:05,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.03 vs. limit=10.0
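
The tot_loss[... over N frames] figures grow by roughly thirty thousand frames per batch yet hover near 5.8M, which reads like a frame-weighted running average with forgetting over recent batches. The bookkeeping below is a guess at that mechanism, not the actual train.py code; note that with ~29.5k frames per batch a decay of 0.995 settles near 5.9M effective frames, the same order as the totals in the log.

class RunningLoss:
    """Frame-weighted running average with exponential forgetting (assumed factor)."""

    def __init__(self, decay=0.995):
        self.decay = decay
        self.weighted_sum = 0.0
        self.frames = 0.0

    def update(self, loss_value, num_frames):
        self.weighted_sum = self.decay * self.weighted_sum + loss_value * num_frames
        self.frames = self.decay * self.frames + num_frames

    @property
    def value(self):
        return self.weighted_sum / max(self.frames, 1.0)
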
2024-09-18 09:09:28,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=412120.0, ans=0.2
2024-09-18 09:09:32,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.18 vs. limit=22.5
2024-09-18 09:09:33,558 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 09:09:36,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=412120.0, ans=0.2
2024-09-18 09:09:42,462 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 8.670e+01 9.056e+01 9.530e+01 1.937e+02, threshold=1.811e+02, percent-clipped=1.0
2024-09-18 09:09:48,019 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.24 vs. limit=6.0
2024-09-18 09:09:54,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=412160.0, ans=0.1
2024-09-18 09:10:02,665 INFO [train.py:1198] (1/2) Epoch 23, batch 3500, loss[loss=0.2218, ctc_loss=0.1135, cr_loss=0.3277, attn_decoder_loss=0.2265, over 29320.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1314, cr_loss=0.3742, attn_decoder_loss=0.2476, over 5778079.98 frames. ], batch size: 71, lr: 4.81e-03, grad_scale: 8.0
2024-09-18 09:10:07,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=412200.0, ans=0.0
2024-09-18 09:10:15,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=412200.0, ans=0.2
2024-09-18 09:10:16,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=412240.0, ans=0.1
2024-09-18 09:10:21,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=412240.0, ans=0.125
2024-09-18 09:10:29,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=412240.0, ans=0.1
2024-09-18 09:10:37,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=412280.0, ans=0.0
2024-09-18 09:10:40,755 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.02 vs. limit=12.0
2024-09-18 09:10:55,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=412320.0, ans=0.2
2024-09-18 09:11:09,277 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0
2024-09-18 09:11:17,262 INFO [train.py:1198] (1/2) Epoch 23, batch 3550, loss[loss=0.2463, ctc_loss=0.1224, cr_loss=0.3501, attn_decoder_loss=0.2523, over 29691.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1311, cr_loss=0.374, attn_decoder_loss=0.2476, over 5786707.33 frames. ], batch size: 89, lr: 4.80e-03, grad_scale: 8.0
2024-09-18 09:11:25,545 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.93 vs. limit=12.0
2024-09-18 09:11:29,849 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.87 vs. limit=22.5
2024-09-18 09:11:35,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=412440.0, ans=0.2
2024-09-18 09:11:53,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=412480.0, ans=10.0
2024-09-18 09:11:54,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=412480.0, ans=0.125
2024-09-18 09:12:11,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=412520.0, ans=0.1
2024-09-18 09:12:14,057 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.214e+01 8.498e+01 8.967e+01 9.754e+01 1.546e+02, threshold=1.793e+02, percent-clipped=1.0
2024-09-18 09:12:18,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=412560.0, ans=0.2
2024-09-18 09:12:20,327 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-18 09:12:26,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=412560.0, ans=0.025
2024-09-18 09:12:31,834 INFO [train.py:1198] (1/2) Epoch 23, batch 3600, loss[loss=0.2338, ctc_loss=0.124, cr_loss=0.3775, attn_decoder_loss=0.2376, over 29528.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1308, cr_loss=0.373, attn_decoder_loss=0.2475, over 5794801.49 frames. ], batch size: 77, lr: 4.80e-03, grad_scale: 16.0
2024-09-18 09:12:59,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=412640.0, ans=0.1
2024-09-18 09:13:01,620 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.98 vs. limit=22.5
2024-09-18 09:13:02,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=412680.0, ans=0.125
2024-09-18 09:13:09,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=412680.0, ans=0.125
2024-09-18 09:13:18,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=412720.0, ans=0.125
2024-09-18 09:13:21,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=412720.0, ans=0.125
2024-09-18 09:13:21,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=412720.0, ans=0.125
2024-09-18 09:13:35,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=412760.0, ans=0.2
2024-09-18 09:13:42,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=412760.0, ans=0.5
2024-09-18 09:13:44,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=412760.0, ans=0.125
2024-09-18 09:13:47,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=412800.0, ans=0.0
2024-09-18 09:13:48,737 INFO [train.py:1198] (1/2) Epoch 23, batch 3650, loss[loss=0.2654, ctc_loss=0.1466, cr_loss=0.4131, attn_decoder_loss=0.2694, over 29519.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1302, cr_loss=0.3718, attn_decoder_loss=0.2468, over 5796426.38 frames. ], batch size: 90, lr: 4.80e-03, grad_scale: 8.0
2024-09-18 09:14:01,733 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.22 vs. limit=22.5
2024-09-18 09:14:10,406 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.50 vs. limit=15.0
2024-09-18 09:14:16,040 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=15.0
2024-09-18 09:14:26,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=412880.0, ans=0.125
2024-09-18 09:14:46,460 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.590e+01 8.375e+01 8.989e+01 9.606e+01 2.045e+02, threshold=1.798e+02, percent-clipped=1.0
2024-09-18 09:14:48,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=412960.0, ans=0.125
2024-09-18 09:14:57,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=412960.0, ans=0.0
2024-09-18 09:15:00,865 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.49 vs. limit=15.0
2024-09-18 09:15:02,937 INFO [train.py:1198] (1/2) Epoch 23, batch 3700, loss[loss=0.2466, ctc_loss=0.1378, cr_loss=0.3814, attn_decoder_loss=0.2502, over 29708.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.1302, cr_loss=0.3722, attn_decoder_loss=0.2469, over 5805968.35 frames. ], batch size: 84, lr: 4.80e-03, grad_scale: 8.0
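
grad_scale in the batch summaries steps between 8.0 and 16.0 (e.g. 16.0 at batch 3600 above, back to 8.0 by batch 3650). With fp16 training this is the signature of a dynamic AMP loss scale, which doubles after a run of overflow-free steps and halves on overflow. A standard torch.cuda.amp setup that produces such a trace is sketched below; the exact hyperparameters here are assumptions.

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=8.0,        # matches the grad_scale the log starts from
    growth_factor=2.0,     # 8.0 -> 16.0 after growth_interval clean steps
    backoff_factor=0.5,    # 16.0 -> 8.0 on an inf/nan gradient
    growth_interval=2000,
)

def training_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped internally if gradients overflowed
    scaler.update()          # grows or backs off the scale
    return loss.detach(), scaler.get_scale()  # second value is what gets logged
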
2024-09-18 09:15:33,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=413080.0, ans=0.125
2024-09-18 09:15:50,065 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.31 vs. limit=15.0
2024-09-18 09:16:19,396 INFO [train.py:1198] (1/2) Epoch 23, batch 3750, loss[loss=0.2159, ctc_loss=0.1121, cr_loss=0.3434, attn_decoder_loss=0.2198, over 29281.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.13, cr_loss=0.3716, attn_decoder_loss=0.2466, over 5809616.55 frames. ], batch size: 67, lr: 4.80e-03, grad_scale: 8.0
2024-09-18 09:16:43,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=413240.0, ans=0.125
2024-09-18 09:16:55,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.58 vs. limit=15.0
2024-09-18 09:17:03,218 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.19 vs. limit=22.5
2024-09-18 09:17:17,038 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.618e+01 9.156e+01 9.859e+01 5.134e+02, threshold=1.831e+02, percent-clipped=3.0
2024-09-18 09:17:23,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=413360.0, ans=0.05
2024-09-18 09:17:30,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=413360.0, ans=0.2
2024-09-18 09:17:32,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=413400.0, ans=0.2
2024-09-18 09:17:33,576 INFO [train.py:1198] (1/2) Epoch 23, batch 3800, loss[loss=0.2545, ctc_loss=0.138, cr_loss=0.3806, attn_decoder_loss=0.259, over 29608.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1301, cr_loss=0.3717, attn_decoder_loss=0.2466, over 5799251.96 frames. ], batch size: 86, lr: 4.80e-03, grad_scale: 8.0
2024-09-18 09:17:35,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=413400.0, ans=0.0
2024-09-18 09:17:45,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=413400.0, ans=0.0
2024-09-18 09:18:11,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=413480.0, ans=0.0
2024-09-18 09:18:26,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=413520.0, ans=0.125
2024-09-18 09:18:33,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=413560.0, ans=0.1
2024-09-18 09:18:35,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=413560.0, ans=0.125
2024-09-18 09:18:48,739 INFO [train.py:1198] (1/2) Epoch 23, batch 3850, loss[loss=0.27, ctc_loss=0.1575, cr_loss=0.4341, attn_decoder_loss=0.2729, over 29333.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.1304, cr_loss=0.3726, attn_decoder_loss=0.2469, over 5813826.96 frames. ], batch size: 100, lr: 4.80e-03, grad_scale: 8.0
2024-09-18 09:19:02,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=413600.0, ans=0.0
2024-09-18 09:19:12,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=413640.0, ans=0.125
2024-09-18 09:19:15,133 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.01 vs. limit=15.0
2024-09-18 09:19:25,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.23 vs. limit=22.5
2024-09-18 09:19:46,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=413720.0, ans=0.1
2024-09-18 09:19:46,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=413720.0, ans=15.0
2024-09-18 09:19:48,639 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.606e+01 8.512e+01 9.090e+01 9.629e+01 1.233e+02, threshold=1.818e+02, percent-clipped=0.0
2024-09-18 09:20:05,015 INFO [train.py:1198] (1/2) Epoch 23, batch 3900, loss[loss=0.2598, ctc_loss=0.1377, cr_loss=0.3842, attn_decoder_loss=0.2648, over 29625.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1309, cr_loss=0.3735, attn_decoder_loss=0.2475, over 5817715.17 frames. ], batch size: 86, lr: 4.80e-03, grad_scale: 8.0
2024-09-18 09:20:09,818 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 09:20:23,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=413840.0, ans=0.125
2024-09-18 09:20:24,957 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.09 vs. limit=15.0
2024-09-18 09:20:37,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=413880.0, ans=0.0
2024-09-18 09:20:52,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=413920.0, ans=0.1
2024-09-18 09:20:59,339 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.74 vs. limit=22.5
2024-09-18 09:21:03,158 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 09:21:08,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=413960.0, ans=0.125
2024-09-18 09:21:10,456 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 09:21:11,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=413960.0, ans=0.025
2024-09-18 09:21:19,075 INFO [train.py:1198] (1/2) Epoch 23, batch 3950, loss[loss=0.254, ctc_loss=0.1373, cr_loss=0.4042, attn_decoder_loss=0.258, over 29432.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1305, cr_loss=0.3737, attn_decoder_loss=0.2476, over 5836790.54 frames. ], batch size: 97, lr: 4.80e-03, grad_scale: 8.0
2024-09-18 09:21:28,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=414000.0, ans=0.1
2024-09-18 09:21:29,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=414000.0, ans=0.125
2024-09-18 09:21:41,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=414040.0, ans=0.04949747468305833
2024-09-18 09:21:53,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=414080.0, ans=0.1
2024-09-18 09:21:55,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.55 vs. limit=15.0
2024-09-18 09:21:58,459 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.05 vs. limit=22.5
2024-09-18 09:22:03,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=414120.0, ans=0.125
2024-09-18 09:22:09,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=414120.0, ans=0.2
2024-09-18 09:22:18,242 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.566e+01 9.101e+01 9.931e+01 2.734e+02, threshold=1.820e+02, percent-clipped=1.0
2024-09-18 09:22:28,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=414160.0, ans=0.1
2024-09-18 09:22:34,490 INFO [train.py:1198] (1/2) Epoch 23, batch 4000, loss[loss=0.2329, ctc_loss=0.1215, cr_loss=0.3526, attn_decoder_loss=0.2375, over 29526.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.131, cr_loss=0.3744, attn_decoder_loss=0.248, over 5813299.79 frames. ], batch size: 74, lr: 4.79e-03, grad_scale: 16.0
2024-09-18 09:22:52,795 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.77 vs. limit=12.0
2024-09-18 09:22:57,367 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0
2024-09-18 09:23:17,881 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.86 vs. limit=10.0
2024-09-18 09:23:21,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=414320.0, ans=0.125
2024-09-18 09:23:48,951 INFO [train.py:1198] (1/2) Epoch 23, batch 4050, loss[loss=0.2737, ctc_loss=0.171, cr_loss=0.4251, attn_decoder_loss=0.2756, over 21030.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1309, cr_loss=0.3734, attn_decoder_loss=0.2478, over 5796910.98 frames. ], batch size: 210, lr: 4.79e-03, grad_scale: 8.0
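
The lr column decays slowly within the epoch (4.84e-03 down to 4.79e-03 across these batches) and steps down more sharply at the epoch boundary (4.78e-03 to 4.68e-03 when epoch 24 starts further below). An Eden-style schedule, which discounts in both batch count and epoch, has exactly this shape; the sketch below is written under that assumption, with illustrative constants.

def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    """Learning rate that decays smoothly in batches and steps down with epochs."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# The epoch factor alone explains the step at the epoch 23 -> 24 boundary:
# eden_lr(..., epoch=24) / eden_lr(..., epoch=23) is ~0.98, i.e. 4.78e-03 -> 4.68e-03,
# while the batch factor barely moves at batch counts around 4.1e5.
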
2024-09-18 09:23:56,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=414400.0, ans=0.0
2024-09-18 09:24:03,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=414440.0, ans=0.125
2024-09-18 09:24:08,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=414440.0, ans=0.1
2024-09-18 09:24:49,140 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 8.682e+01 9.236e+01 9.757e+01 1.586e+02, threshold=1.847e+02, percent-clipped=0.0
2024-09-18 09:25:03,925 INFO [train.py:1198] (1/2) Epoch 23, batch 4100, loss[loss=0.2399, ctc_loss=0.1139, cr_loss=0.3376, attn_decoder_loss=0.2464, over 29478.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1306, cr_loss=0.373, attn_decoder_loss=0.2475, over 5792170.81 frames. ], batch size: 90, lr: 4.79e-03, grad_scale: 8.0
2024-09-18 09:25:08,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=414600.0, ans=0.0
2024-09-18 09:25:14,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.74 vs. limit=15.0
2024-09-18 09:26:09,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=414760.0, ans=0.0
2024-09-18 09:26:10,526 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.30 vs. limit=10.0
2024-09-18 09:26:18,790 INFO [train.py:1198] (1/2) Epoch 23, batch 4150, loss[loss=0.2474, ctc_loss=0.1367, cr_loss=0.3866, attn_decoder_loss=0.2512, over 29523.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1308, cr_loss=0.3738, attn_decoder_loss=0.2474, over 5798836.71 frames. ], batch size: 77, lr: 4.79e-03, grad_scale: 8.0
2024-09-18 09:26:23,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=414800.0, ans=0.2
2024-09-18 09:26:29,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=414800.0, ans=0.0
2024-09-18 09:26:41,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=414840.0, ans=0.125
2024-09-18 09:27:07,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=414920.0, ans=0.2
2024-09-18 09:27:17,890 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.495e+01 8.961e+01 9.619e+01 1.585e+02, threshold=1.792e+02, percent-clipped=0.0
2024-09-18 09:27:32,606 INFO [train.py:1198] (1/2) Epoch 23, batch 4200, loss[loss=0.2564, ctc_loss=0.137, cr_loss=0.3915, attn_decoder_loss=0.261, over 29508.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1314, cr_loss=0.3748, attn_decoder_loss=0.2481, over 5801278.38 frames. ], batch size: 90, lr: 4.79e-03, grad_scale: 8.0
2024-09-18 09:27:50,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=415040.0, ans=0.125
2024-09-18 09:27:55,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=415040.0, ans=0.07
2024-09-18 09:28:12,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=415080.0, ans=0.0
2024-09-18 09:28:16,991 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 09:28:21,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=415120.0, ans=0.0
2024-09-18 09:28:27,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=415120.0, ans=0.125
2024-09-18 09:28:33,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=415160.0, ans=0.0
2024-09-18 09:28:37,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=415160.0, ans=0.0
2024-09-18 09:28:48,192 INFO [train.py:1198] (1/2) Epoch 23, batch 4250, loss[loss=0.2291, ctc_loss=0.1163, cr_loss=0.3406, attn_decoder_loss=0.234, over 29500.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.131, cr_loss=0.3741, attn_decoder_loss=0.2481, over 5806020.64 frames. ], batch size: 74, lr: 4.79e-03, grad_scale: 8.0
2024-09-18 09:28:59,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=415200.0, ans=15.0
2024-09-18 09:29:17,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=415280.0, ans=0.125
2024-09-18 09:29:45,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=415320.0, ans=0.0
2024-09-18 09:29:47,863 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.786e+01 8.728e+01 9.274e+01 9.904e+01 2.860e+02, threshold=1.855e+02, percent-clipped=1.0
2024-09-18 09:29:50,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=415360.0, ans=0.0
2024-09-18 09:29:52,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=415360.0, ans=0.1
2024-09-18 09:30:00,087 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-18 09:30:00,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=415360.0, ans=0.1
2024-09-18 09:30:02,722 INFO [train.py:1198] (1/2) Epoch 23, batch 4300, loss[loss=0.2473, ctc_loss=0.1265, cr_loss=0.379, attn_decoder_loss=0.2523, over 29545.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1307, cr_loss=0.3737, attn_decoder_loss=0.2482, over 5794157.95 frames. ], batch size: 87, lr: 4.79e-03, grad_scale: 8.0
2024-09-18 09:30:54,157 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.15 vs. limit=12.0
limit=12.0 2024-09-18 09:31:09,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=415560.0, ans=0.025 2024-09-18 09:31:16,767 INFO [train.py:1198] (1/2) Epoch 23, batch 4350, loss[loss=0.2611, ctc_loss=0.1455, cr_loss=0.4132, attn_decoder_loss=0.2648, over 29491.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1333, cr_loss=0.3788, attn_decoder_loss=0.2515, over 5797165.84 frames. ], batch size: 97, lr: 4.79e-03, grad_scale: 8.0 2024-09-18 09:31:22,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=415600.0, ans=0.0 2024-09-18 09:31:40,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2024-09-18 09:31:54,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=415680.0, ans=0.0 2024-09-18 09:32:16,359 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.737e+01 8.901e+01 9.212e+01 9.767e+01 1.363e+02, threshold=1.842e+02, percent-clipped=1.0 2024-09-18 09:32:29,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=415800.0, ans=0.125 2024-09-18 09:32:31,047 INFO [train.py:1198] (1/2) Epoch 23, batch 4400, loss[loss=0.2661, ctc_loss=0.1575, cr_loss=0.4321, attn_decoder_loss=0.2686, over 27383.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.135, cr_loss=0.3817, attn_decoder_loss=0.2537, over 5766260.22 frames. ], batch size: 124, lr: 4.78e-03, grad_scale: 16.0 2024-09-18 09:32:37,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=415800.0, ans=0.125 2024-09-18 09:32:40,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=415800.0, ans=0.0 2024-09-18 09:32:50,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=415840.0, ans=0.125 2024-09-18 09:33:16,991 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.81 vs. limit=22.5 2024-09-18 09:33:37,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=415960.0, ans=0.0 2024-09-18 09:33:53,117 INFO [train.py:1198] (1/2) Epoch 23, batch 4450, loss[loss=0.2602, ctc_loss=0.1614, cr_loss=0.398, attn_decoder_loss=0.2624, over 19469.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.1392, cr_loss=0.3875, attn_decoder_loss=0.2561, over 5575429.89 frames. 
], batch size: 209, lr: 4.78e-03, grad_scale: 16.0 2024-09-18 09:34:16,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=416040.0, ans=0.125 2024-09-18 09:34:19,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=416040.0, ans=0.125 2024-09-18 09:34:30,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=416080.0, ans=0.1 2024-09-18 09:34:35,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=416080.0, ans=0.125 2024-09-18 09:34:41,114 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.66 vs. limit=10.0 2024-09-18 09:34:43,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=416120.0, ans=0.025 2024-09-18 09:34:45,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=416120.0, ans=0.1 2024-09-18 09:34:49,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=416120.0, ans=0.125 2024-09-18 09:34:55,166 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.365e+01 9.450e+01 1.070e+02 1.179e+02 4.631e+02, threshold=2.141e+02, percent-clipped=3.0 2024-09-18 09:34:58,753 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:35:09,057 INFO [train.py:1198] (1/2) Epoch 23, batch 4500, loss[loss=0.2627, ctc_loss=0.1641, cr_loss=0.3864, attn_decoder_loss=0.2651, over 19636.00 frames. ], tot_loss[loss=0.2546, ctc_loss=0.1433, cr_loss=0.3899, attn_decoder_loss=0.2583, over 5237968.89 frames. ], batch size: 209, lr: 4.78e-03, grad_scale: 8.0 2024-09-18 09:35:15,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=416200.0, ans=0.0 2024-09-18 09:35:21,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=416200.0, ans=0.125 2024-09-18 09:35:21,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=416200.0, ans=0.125 2024-09-18 09:36:38,046 INFO [train.py:1198] (1/2) Epoch 24, batch 0, loss[loss=0.2208, ctc_loss=0.1088, cr_loss=0.3344, attn_decoder_loss=0.2259, over 29605.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1088, cr_loss=0.3344, attn_decoder_loss=0.2259, over 29605.00 frames. ], batch size: 73, lr: 4.68e-03, grad_scale: 16.0 2024-09-18 09:36:38,047 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 09:36:58,732 INFO [train.py:1230] (1/2) Epoch 24, validation: loss=0.2127, ctc_loss=0.03777, cr_loss=4.976e-15, attn_decoder_loss=0.2321, over 944034.00 frames. 
2024-09-18 09:36:58,732 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-18 09:37:33,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=416380.0, ans=0.0 2024-09-18 09:37:35,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=416380.0, ans=0.125 2024-09-18 09:37:40,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=416380.0, ans=0.1 2024-09-18 09:38:14,655 INFO [train.py:1198] (1/2) Epoch 24, batch 50, loss[loss=0.2263, ctc_loss=0.118, cr_loss=0.3601, attn_decoder_loss=0.2303, over 29447.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1327, cr_loss=0.3803, attn_decoder_loss=0.2489, over 1269680.50 frames. ], batch size: 70, lr: 4.68e-03, grad_scale: 8.0 2024-09-18 09:38:40,701 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.542e+01 8.838e+01 9.694e+01 1.103e+02 3.363e+02, threshold=1.939e+02, percent-clipped=1.0 2024-09-18 09:38:42,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=416540.0, ans=0.0 2024-09-18 09:39:10,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=416620.0, ans=0.05 2024-09-18 09:39:30,898 INFO [train.py:1198] (1/2) Epoch 24, batch 100, loss[loss=0.222, ctc_loss=0.1115, cr_loss=0.3482, attn_decoder_loss=0.2265, over 29520.00 frames. ], tot_loss[loss=0.2468, ctc_loss=0.1336, cr_loss=0.38, attn_decoder_loss=0.251, over 2253055.54 frames. ], batch size: 76, lr: 4.68e-03, grad_scale: 8.0 2024-09-18 09:39:45,288 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.75 vs. limit=22.5 2024-09-18 09:39:52,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=416740.0, ans=0.2 2024-09-18 09:40:13,365 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.32 vs. limit=15.0 2024-09-18 09:40:18,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=416820.0, ans=0.125 2024-09-18 09:40:20,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=416820.0, ans=0.07 2024-09-18 09:40:21,042 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.18 vs. limit=22.5 2024-09-18 09:40:24,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=416820.0, ans=0.0 2024-09-18 09:40:24,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=416820.0, ans=0.0 2024-09-18 09:40:50,687 INFO [train.py:1198] (1/2) Epoch 24, batch 150, loss[loss=0.2127, ctc_loss=0.1042, cr_loss=0.3237, attn_decoder_loss=0.2175, over 29436.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1308, cr_loss=0.376, attn_decoder_loss=0.2482, over 3046595.96 frames. 
], batch size: 70, lr: 4.68e-03, grad_scale: 8.0 2024-09-18 09:41:16,575 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.381e+01 8.498e+01 9.006e+01 9.810e+01 1.466e+02, threshold=1.801e+02, percent-clipped=0.0 2024-09-18 09:41:25,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=416980.0, ans=0.0 2024-09-18 09:41:37,313 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.58 vs. limit=15.0 2024-09-18 09:42:06,293 INFO [train.py:1198] (1/2) Epoch 24, batch 200, loss[loss=0.2603, ctc_loss=0.1455, cr_loss=0.3956, attn_decoder_loss=0.2643, over 27140.00 frames. ], tot_loss[loss=0.2436, ctc_loss=0.1307, cr_loss=0.3757, attn_decoder_loss=0.2477, over 3657806.87 frames. ], batch size: 125, lr: 4.67e-03, grad_scale: 8.0 2024-09-18 09:42:09,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=417100.0, ans=0.125 2024-09-18 09:42:12,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=417100.0, ans=0.1 2024-09-18 09:42:30,091 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.76 vs. limit=15.0 2024-09-18 09:42:36,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.97 vs. limit=15.0 2024-09-18 09:43:04,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=417220.0, ans=0.2 2024-09-18 09:43:08,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=417260.0, ans=0.125 2024-09-18 09:43:13,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=417260.0, ans=0.125 2024-09-18 09:43:17,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=417260.0, ans=0.0 2024-09-18 09:43:22,130 INFO [train.py:1198] (1/2) Epoch 24, batch 250, loss[loss=0.2541, ctc_loss=0.1375, cr_loss=0.3851, attn_decoder_loss=0.2585, over 29243.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1298, cr_loss=0.3737, attn_decoder_loss=0.2473, over 4139780.76 frames. 
], batch size: 100, lr: 4.67e-03, grad_scale: 8.0 2024-09-18 09:43:22,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=417300.0, ans=0.07 2024-09-18 09:43:23,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=417300.0, ans=0.025 2024-09-18 09:43:47,872 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 8.833e+01 9.396e+01 1.002e+02 2.195e+02, threshold=1.879e+02, percent-clipped=2.0 2024-09-18 09:44:11,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=417420.0, ans=0.2 2024-09-18 09:44:16,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=417420.0, ans=0.125 2024-09-18 09:44:20,207 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.47 vs. limit=22.5 2024-09-18 09:44:26,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=417460.0, ans=0.0 2024-09-18 09:44:36,140 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.17 vs. limit=15.0 2024-09-18 09:44:42,567 INFO [train.py:1198] (1/2) Epoch 24, batch 300, loss[loss=0.2523, ctc_loss=0.1369, cr_loss=0.3738, attn_decoder_loss=0.2568, over 29499.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1299, cr_loss=0.3736, attn_decoder_loss=0.2472, over 4507168.44 frames. ], batch size: 92, lr: 4.67e-03, grad_scale: 8.0 2024-09-18 09:45:22,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=417580.0, ans=0.1 2024-09-18 09:45:24,884 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.07 vs. limit=15.0 2024-09-18 09:45:31,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=417620.0, ans=0.125 2024-09-18 09:45:39,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=417620.0, ans=0.0 2024-09-18 09:45:43,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=417660.0, ans=0.0 2024-09-18 09:45:51,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=417660.0, ans=0.025 2024-09-18 09:45:54,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=417660.0, ans=0.2 2024-09-18 09:45:58,774 INFO [train.py:1198] (1/2) Epoch 24, batch 350, loss[loss=0.2229, ctc_loss=0.1164, cr_loss=0.3445, attn_decoder_loss=0.2271, over 29313.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1308, cr_loss=0.3757, attn_decoder_loss=0.2482, over 4793702.78 frames. ], batch size: 71, lr: 4.67e-03, grad_scale: 8.0 2024-09-18 09:46:05,093 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:46:13,510 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.94 vs. 
limit=15.0 2024-09-18 09:46:25,834 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.422e+01 8.494e+01 8.951e+01 9.745e+01 1.329e+02, threshold=1.790e+02, percent-clipped=0.0 2024-09-18 09:47:14,291 INFO [train.py:1198] (1/2) Epoch 24, batch 400, loss[loss=0.2455, ctc_loss=0.1291, cr_loss=0.3797, attn_decoder_loss=0.25, over 29693.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1306, cr_loss=0.375, attn_decoder_loss=0.2481, over 5023870.83 frames. ], batch size: 82, lr: 4.67e-03, grad_scale: 8.0 2024-09-18 09:47:14,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=417900.0, ans=0.09899494936611666 2024-09-18 09:47:22,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=417900.0, ans=0.0 2024-09-18 09:47:28,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=417940.0, ans=0.125 2024-09-18 09:47:36,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=417940.0, ans=0.2 2024-09-18 09:48:03,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=418020.0, ans=0.2 2024-09-18 09:48:05,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=15.0 2024-09-18 09:48:31,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=418060.0, ans=0.0 2024-09-18 09:48:35,414 INFO [train.py:1198] (1/2) Epoch 24, batch 450, loss[loss=0.2516, ctc_loss=0.1374, cr_loss=0.384, attn_decoder_loss=0.2558, over 29694.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1306, cr_loss=0.3746, attn_decoder_loss=0.2482, over 5186864.25 frames. ], batch size: 83, lr: 4.67e-03, grad_scale: 8.0 2024-09-18 09:48:35,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=418100.0, ans=0.0 2024-09-18 09:48:37,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=418100.0, ans=0.125 2024-09-18 09:49:02,619 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.182e+01 8.530e+01 9.135e+01 9.796e+01 4.658e+02, threshold=1.827e+02, percent-clipped=1.0 2024-09-18 09:49:30,902 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.65 vs. limit=15.0 2024-09-18 09:49:42,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=418260.0, ans=0.125 2024-09-18 09:49:50,882 INFO [train.py:1198] (1/2) Epoch 24, batch 500, loss[loss=0.2596, ctc_loss=0.1426, cr_loss=0.403, attn_decoder_loss=0.2636, over 29448.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1298, cr_loss=0.3729, attn_decoder_loss=0.2472, over 5329753.03 frames. 
], batch size: 94, lr: 4.67e-03, grad_scale: 8.0 2024-09-18 09:49:54,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=418300.0, ans=0.2 2024-09-18 09:50:08,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=418340.0, ans=0.125 2024-09-18 09:50:18,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=418340.0, ans=0.125 2024-09-18 09:50:23,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=418380.0, ans=0.125 2024-09-18 09:50:31,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=418380.0, ans=0.125 2024-09-18 09:50:40,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=418420.0, ans=0.125 2024-09-18 09:50:47,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=418420.0, ans=0.0 2024-09-18 09:50:49,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=418420.0, ans=0.1 2024-09-18 09:50:55,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=418460.0, ans=0.125 2024-09-18 09:51:07,113 INFO [train.py:1198] (1/2) Epoch 24, batch 550, loss[loss=0.258, ctc_loss=0.1439, cr_loss=0.411, attn_decoder_loss=0.2615, over 28810.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.13, cr_loss=0.3735, attn_decoder_loss=0.2475, over 5422793.82 frames. ], batch size: 104, lr: 4.67e-03, grad_scale: 8.0 2024-09-18 09:51:07,807 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0 2024-09-18 09:51:18,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=418500.0, ans=0.0 2024-09-18 09:51:19,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=418500.0, ans=0.1 2024-09-18 09:51:25,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=418540.0, ans=0.2 2024-09-18 09:51:34,482 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 8.569e+01 9.031e+01 9.630e+01 1.358e+02, threshold=1.806e+02, percent-clipped=0.0 2024-09-18 09:51:34,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=418540.0, ans=0.125 2024-09-18 09:51:39,999 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.52 vs. limit=12.0 2024-09-18 09:51:48,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=418580.0, ans=0.125 2024-09-18 09:52:27,729 INFO [train.py:1198] (1/2) Epoch 24, batch 600, loss[loss=0.2527, ctc_loss=0.1295, cr_loss=0.3756, attn_decoder_loss=0.2581, over 29267.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1301, cr_loss=0.3738, attn_decoder_loss=0.2476, over 5509527.79 frames. 
], batch size: 100, lr: 4.67e-03, grad_scale: 8.0 2024-09-18 09:52:39,083 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.34 vs. limit=15.0 2024-09-18 09:52:58,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=418780.0, ans=0.07 2024-09-18 09:53:08,958 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.81 vs. limit=15.0 2024-09-18 09:53:24,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.72 vs. limit=15.0 2024-09-18 09:53:26,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=418860.0, ans=0.0 2024-09-18 09:53:40,352 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-09-18 09:53:42,745 INFO [train.py:1198] (1/2) Epoch 24, batch 650, loss[loss=0.2361, ctc_loss=0.1151, cr_loss=0.3482, attn_decoder_loss=0.2418, over 29776.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.1287, cr_loss=0.3714, attn_decoder_loss=0.2465, over 5586691.52 frames. ], batch size: 81, lr: 4.66e-03, grad_scale: 8.0 2024-09-18 09:53:52,544 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.12 vs. limit=10.0 2024-09-18 09:54:10,118 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.400e+01 8.656e+01 8.941e+01 9.589e+01 2.067e+02, threshold=1.788e+02, percent-clipped=1.0 2024-09-18 09:54:19,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=418980.0, ans=0.125 2024-09-18 09:54:30,671 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.17 vs. limit=15.0 2024-09-18 09:54:37,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=419020.0, ans=0.1 2024-09-18 09:54:52,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=419060.0, ans=0.5 2024-09-18 09:54:54,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=419060.0, ans=0.125 2024-09-18 09:54:58,689 INFO [train.py:1198] (1/2) Epoch 24, batch 700, loss[loss=0.2224, ctc_loss=0.1076, cr_loss=0.3348, attn_decoder_loss=0.2277, over 29527.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1294, cr_loss=0.3729, attn_decoder_loss=0.2471, over 5636472.79 frames. 
], batch size: 76, lr: 4.66e-03, grad_scale: 8.0 2024-09-18 09:55:32,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=419180.0, ans=0.125 2024-09-18 09:55:33,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=419180.0, ans=0.0 2024-09-18 09:55:59,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=419260.0, ans=0.125 2024-09-18 09:56:17,371 INFO [train.py:1198] (1/2) Epoch 24, batch 750, loss[loss=0.2385, ctc_loss=0.1174, cr_loss=0.3444, attn_decoder_loss=0.2443, over 29688.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1296, cr_loss=0.3723, attn_decoder_loss=0.2469, over 5674250.63 frames. ], batch size: 82, lr: 4.66e-03, grad_scale: 8.0 2024-09-18 09:56:19,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=419300.0, ans=0.125 2024-09-18 09:56:46,705 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.739e+01 8.693e+01 9.112e+01 9.779e+01 2.514e+02, threshold=1.822e+02, percent-clipped=3.0 2024-09-18 09:56:57,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=419380.0, ans=0.125 2024-09-18 09:57:04,065 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:57:19,284 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:57:36,006 INFO [train.py:1198] (1/2) Epoch 24, batch 800, loss[loss=0.219, ctc_loss=0.1139, cr_loss=0.342, attn_decoder_loss=0.223, over 29600.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1291, cr_loss=0.3718, attn_decoder_loss=0.2467, over 5705834.25 frames. ], batch size: 73, lr: 4.66e-03, grad_scale: 16.0 2024-09-18 09:57:45,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=419500.0, ans=0.2 2024-09-18 09:57:49,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=419540.0, ans=0.125 2024-09-18 09:58:25,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=419620.0, ans=0.125 2024-09-18 09:58:52,065 INFO [train.py:1198] (1/2) Epoch 24, batch 850, loss[loss=0.2511, ctc_loss=0.1286, cr_loss=0.363, attn_decoder_loss=0.2566, over 29694.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.129, cr_loss=0.3711, attn_decoder_loss=0.2463, over 5734649.73 frames. 
], batch size: 89, lr: 4.66e-03, grad_scale: 8.0 2024-09-18 09:58:58,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=419700.0, ans=0.125 2024-09-18 09:59:21,100 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 8.516e+01 9.029e+01 9.587e+01 2.043e+02, threshold=1.806e+02, percent-clipped=2.0 2024-09-18 09:59:51,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=419820.0, ans=0.125 2024-09-18 09:59:54,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=419860.0, ans=0.07 2024-09-18 10:00:01,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=419860.0, ans=0.2 2024-09-18 10:00:09,278 INFO [train.py:1198] (1/2) Epoch 24, batch 900, loss[loss=0.2253, ctc_loss=0.1118, cr_loss=0.3447, attn_decoder_loss=0.2302, over 29578.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1293, cr_loss=0.3723, attn_decoder_loss=0.2466, over 5739197.66 frames. ], batch size: 73, lr: 4.66e-03, grad_scale: 8.0 2024-09-18 10:00:24,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=419900.0, ans=0.2 2024-09-18 10:00:30,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=419940.0, ans=0.125 2024-09-18 10:00:41,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=419940.0, ans=0.125 2024-09-18 10:00:44,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=419980.0, ans=0.0 2024-09-18 10:01:15,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=420060.0, ans=0.125 2024-09-18 10:01:17,598 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.84 vs. limit=15.0 2024-09-18 10:01:31,751 INFO [train.py:1198] (1/2) Epoch 24, batch 950, loss[loss=0.2296, ctc_loss=0.1194, cr_loss=0.3624, attn_decoder_loss=0.2338, over 29504.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1293, cr_loss=0.3722, attn_decoder_loss=0.2469, over 5740883.39 frames. ], batch size: 74, lr: 4.66e-03, grad_scale: 8.0 2024-09-18 10:01:51,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=420140.0, ans=0.05 2024-09-18 10:02:00,914 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.680e+01 8.541e+01 9.200e+01 9.747e+01 2.326e+02, threshold=1.840e+02, percent-clipped=1.0 2024-09-18 10:02:01,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=420180.0, ans=0.0 2024-09-18 10:02:17,142 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.05 vs. 
limit=15.0 2024-09-18 10:02:19,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=420220.0, ans=0.125 2024-09-18 10:02:24,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=420220.0, ans=0.1 2024-09-18 10:02:26,433 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=15.0 2024-09-18 10:02:27,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=420220.0, ans=0.1 2024-09-18 10:02:37,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=420260.0, ans=0.1 2024-09-18 10:02:38,560 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.09 vs. limit=15.0 2024-09-18 10:02:47,897 INFO [train.py:1198] (1/2) Epoch 24, batch 1000, loss[loss=0.2351, ctc_loss=0.1286, cr_loss=0.3804, attn_decoder_loss=0.2385, over 29521.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1302, cr_loss=0.3733, attn_decoder_loss=0.2474, over 5734405.69 frames. ], batch size: 77, lr: 4.66e-03, grad_scale: 8.0 2024-09-18 10:02:57,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=420300.0, ans=0.125 2024-09-18 10:03:03,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=420340.0, ans=0.0 2024-09-18 10:03:03,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=420340.0, ans=0.1 2024-09-18 10:03:20,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=420380.0, ans=0.1 2024-09-18 10:03:21,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=420380.0, ans=0.0 2024-09-18 10:03:40,748 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.93 vs. limit=10.0 2024-09-18 10:04:03,950 INFO [train.py:1198] (1/2) Epoch 24, batch 1050, loss[loss=0.2622, ctc_loss=0.1453, cr_loss=0.3923, attn_decoder_loss=0.2665, over 29696.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1302, cr_loss=0.3737, attn_decoder_loss=0.247, over 5743784.42 frames. ], batch size: 85, lr: 4.66e-03, grad_scale: 8.0 2024-09-18 10:04:18,896 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:04:19,567 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.42 vs. 
limit=10.0 2024-09-18 10:04:23,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=420540.0, ans=0.95 2024-09-18 10:04:35,588 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 8.418e+01 8.849e+01 9.632e+01 1.961e+02, threshold=1.770e+02, percent-clipped=1.0 2024-09-18 10:04:37,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=420580.0, ans=0.125 2024-09-18 10:04:44,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=420580.0, ans=0.125 2024-09-18 10:05:01,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=420620.0, ans=0.1 2024-09-18 10:05:22,853 INFO [train.py:1198] (1/2) Epoch 24, batch 1100, loss[loss=0.2368, ctc_loss=0.1206, cr_loss=0.3539, attn_decoder_loss=0.2418, over 29470.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1297, cr_loss=0.3726, attn_decoder_loss=0.2467, over 5756175.46 frames. ], batch size: 78, lr: 4.65e-03, grad_scale: 8.0 2024-09-18 10:05:27,889 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=15.0 2024-09-18 10:05:41,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=420740.0, ans=0.0 2024-09-18 10:06:04,069 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.01 vs. limit=22.5 2024-09-18 10:06:08,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.06 vs. limit=6.0 2024-09-18 10:06:19,319 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.63 vs. limit=15.0 2024-09-18 10:06:39,960 INFO [train.py:1198] (1/2) Epoch 24, batch 1150, loss[loss=0.2335, ctc_loss=0.1207, cr_loss=0.3622, attn_decoder_loss=0.2379, over 29450.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1299, cr_loss=0.3735, attn_decoder_loss=0.2467, over 5754086.85 frames. 
], batch size: 78, lr: 4.65e-03, grad_scale: 8.0 2024-09-18 10:06:41,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=420900.0, ans=0.2 2024-09-18 10:06:54,190 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:06:58,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=420940.0, ans=0.125 2024-09-18 10:07:09,196 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.376e+01 8.363e+01 8.865e+01 9.557e+01 3.982e+02, threshold=1.773e+02, percent-clipped=2.0 2024-09-18 10:07:24,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=421020.0, ans=0.125 2024-09-18 10:07:26,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=421020.0, ans=0.125 2024-09-18 10:07:44,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=421060.0, ans=0.125 2024-09-18 10:07:58,074 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.00 vs. limit=12.0 2024-09-18 10:07:58,434 INFO [train.py:1198] (1/2) Epoch 24, batch 1200, loss[loss=0.2518, ctc_loss=0.1317, cr_loss=0.3683, attn_decoder_loss=0.2569, over 29678.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1303, cr_loss=0.3733, attn_decoder_loss=0.2476, over 5746721.04 frames. ], batch size: 85, lr: 4.65e-03, grad_scale: 16.0 2024-09-18 10:08:01,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=421100.0, ans=10.0 2024-09-18 10:08:20,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=421140.0, ans=0.0 2024-09-18 10:08:33,295 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.62 vs. limit=15.0 2024-09-18 10:08:52,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=421220.0, ans=0.2 2024-09-18 10:09:01,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=421260.0, ans=0.125 2024-09-18 10:09:03,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=421260.0, ans=0.125 2024-09-18 10:09:16,338 INFO [train.py:1198] (1/2) Epoch 24, batch 1250, loss[loss=0.2678, ctc_loss=0.143, cr_loss=0.3954, attn_decoder_loss=0.2729, over 29563.00 frames. ], tot_loss[loss=0.2436, ctc_loss=0.1304, cr_loss=0.374, attn_decoder_loss=0.2479, over 5774101.85 frames. 
], batch size: 92, lr: 4.65e-03, grad_scale: 8.0 2024-09-18 10:09:21,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=421300.0, ans=0.125 2024-09-18 10:09:25,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=421300.0, ans=0.5 2024-09-18 10:09:33,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=421340.0, ans=0.1 2024-09-18 10:09:46,769 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.304e+01 8.716e+01 9.105e+01 9.689e+01 1.606e+02, threshold=1.821e+02, percent-clipped=0.0 2024-09-18 10:09:56,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.26 vs. limit=15.0 2024-09-18 10:10:20,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=421460.0, ans=0.2 2024-09-18 10:10:22,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=421460.0, ans=0.5 2024-09-18 10:10:25,902 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.05 vs. limit=22.5 2024-09-18 10:10:32,438 INFO [train.py:1198] (1/2) Epoch 24, batch 1300, loss[loss=0.2478, ctc_loss=0.1309, cr_loss=0.3743, attn_decoder_loss=0.2524, over 28364.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.1298, cr_loss=0.3726, attn_decoder_loss=0.2469, over 5780140.03 frames. ], batch size: 112, lr: 4.65e-03, grad_scale: 8.0 2024-09-18 10:11:00,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=421540.0, ans=0.035 2024-09-18 10:11:11,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=421580.0, ans=0.1 2024-09-18 10:11:26,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=421620.0, ans=0.125 2024-09-18 10:11:34,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=421660.0, ans=0.0 2024-09-18 10:11:49,275 INFO [train.py:1198] (1/2) Epoch 24, batch 1350, loss[loss=0.2451, ctc_loss=0.1237, cr_loss=0.3611, attn_decoder_loss=0.2506, over 29763.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1293, cr_loss=0.3725, attn_decoder_loss=0.2468, over 5798220.99 frames. 
], batch size: 81, lr: 4.65e-03, grad_scale: 8.0 2024-09-18 10:11:54,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=421700.0, ans=0.125 2024-09-18 10:12:02,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=421700.0, ans=0.2 2024-09-18 10:12:24,087 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.522e+01 8.854e+01 9.285e+01 9.935e+01 1.189e+02, threshold=1.857e+02, percent-clipped=0.0 2024-09-18 10:12:27,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=421780.0, ans=0.0 2024-09-18 10:12:45,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=421820.0, ans=0.125 2024-09-18 10:12:51,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=421820.0, ans=0.0 2024-09-18 10:13:02,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=421860.0, ans=0.1 2024-09-18 10:13:09,384 INFO [train.py:1198] (1/2) Epoch 24, batch 1400, loss[loss=0.2162, ctc_loss=0.1119, cr_loss=0.3179, attn_decoder_loss=0.2208, over 29559.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1289, cr_loss=0.3714, attn_decoder_loss=0.2466, over 5808672.95 frames. ], batch size: 69, lr: 4.65e-03, grad_scale: 8.0 2024-09-18 10:13:15,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.10 vs. limit=15.0 2024-09-18 10:13:15,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=421900.0, ans=0.1 2024-09-18 10:13:35,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=421940.0, ans=0.125 2024-09-18 10:14:14,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=422060.0, ans=0.125 2024-09-18 10:14:17,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=422060.0, ans=0.125 2024-09-18 10:14:17,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=422060.0, ans=0.025 2024-09-18 10:14:24,943 INFO [train.py:1198] (1/2) Epoch 24, batch 1450, loss[loss=0.2607, ctc_loss=0.1497, cr_loss=0.4058, attn_decoder_loss=0.264, over 29436.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1292, cr_loss=0.3721, attn_decoder_loss=0.2472, over 5804611.02 frames. 
], batch size: 94, lr: 4.65e-03, grad_scale: 8.0 2024-09-18 10:14:42,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=422140.0, ans=0.125 2024-09-18 10:14:55,267 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 8.531e+01 9.051e+01 9.633e+01 1.306e+02, threshold=1.810e+02, percent-clipped=0.0 2024-09-18 10:15:23,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=422260.0, ans=0.2 2024-09-18 10:15:40,534 INFO [train.py:1198] (1/2) Epoch 24, batch 1500, loss[loss=0.2477, ctc_loss=0.1305, cr_loss=0.3611, attn_decoder_loss=0.2527, over 29639.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1298, cr_loss=0.3734, attn_decoder_loss=0.2478, over 5805094.36 frames. ], batch size: 86, lr: 4.65e-03, grad_scale: 8.0 2024-09-18 10:15:50,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=422300.0, ans=0.2 2024-09-18 10:15:55,803 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.14 vs. limit=12.0 2024-09-18 10:16:13,554 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.97 vs. limit=6.0 2024-09-18 10:16:17,615 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:16:20,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=422380.0, ans=0.0 2024-09-18 10:16:24,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=422380.0, ans=0.125 2024-09-18 10:17:01,792 INFO [train.py:1198] (1/2) Epoch 24, batch 1550, loss[loss=0.268, ctc_loss=0.152, cr_loss=0.4249, attn_decoder_loss=0.2714, over 29516.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1305, cr_loss=0.3741, attn_decoder_loss=0.2479, over 5781098.13 frames. ], batch size: 90, lr: 4.65e-03, grad_scale: 8.0 2024-09-18 10:17:15,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=422540.0, ans=0.1 2024-09-18 10:17:32,003 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.446e+01 8.697e+01 9.200e+01 9.648e+01 4.928e+02, threshold=1.840e+02, percent-clipped=2.0 2024-09-18 10:17:41,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=422580.0, ans=0.125 2024-09-18 10:17:45,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=422620.0, ans=0.125 2024-09-18 10:17:58,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=422620.0, ans=0.0 2024-09-18 10:18:06,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=422660.0, ans=0.2 2024-09-18 10:18:17,231 INFO [train.py:1198] (1/2) Epoch 24, batch 1600, loss[loss=0.2461, ctc_loss=0.1241, cr_loss=0.3498, attn_decoder_loss=0.2518, over 29698.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1308, cr_loss=0.3742, attn_decoder_loss=0.2479, over 5762841.10 frames. 
], batch size: 85, lr: 4.64e-03, grad_scale: 16.0 2024-09-18 10:18:35,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=422740.0, ans=0.125 2024-09-18 10:19:15,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=422820.0, ans=0.1 2024-09-18 10:19:25,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=422860.0, ans=0.2 2024-09-18 10:19:25,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=422860.0, ans=0.2 2024-09-18 10:19:35,112 INFO [train.py:1198] (1/2) Epoch 24, batch 1650, loss[loss=0.255, ctc_loss=0.1373, cr_loss=0.4148, attn_decoder_loss=0.2588, over 29698.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1307, cr_loss=0.3742, attn_decoder_loss=0.2477, over 5756359.49 frames. ], batch size: 89, lr: 4.64e-03, grad_scale: 8.0 2024-09-18 10:19:35,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=422900.0, ans=0.125 2024-09-18 10:19:51,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=422940.0, ans=0.1 2024-09-18 10:20:03,929 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.88 vs. limit=15.0 2024-09-18 10:20:08,930 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.940e+01 8.438e+01 9.287e+01 9.952e+01 1.595e+02, threshold=1.857e+02, percent-clipped=0.0 2024-09-18 10:20:23,667 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.15 vs. limit=10.0 2024-09-18 10:20:30,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=423020.0, ans=0.0 2024-09-18 10:20:32,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=423020.0, ans=0.0 2024-09-18 10:20:39,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=423060.0, ans=0.015 2024-09-18 10:20:45,971 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.71 vs. limit=22.5 2024-09-18 10:20:52,755 INFO [train.py:1198] (1/2) Epoch 24, batch 1700, loss[loss=0.2113, ctc_loss=0.1053, cr_loss=0.3333, attn_decoder_loss=0.2156, over 29546.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1302, cr_loss=0.3736, attn_decoder_loss=0.2471, over 5779302.32 frames. ], batch size: 69, lr: 4.64e-03, grad_scale: 8.0 2024-09-18 10:20:56,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=423100.0, ans=0.0 2024-09-18 10:20:59,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.26 vs. 
limit=6.0 2024-09-18 10:21:03,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=423100.0, ans=0.0 2024-09-18 10:21:35,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=423180.0, ans=10.0 2024-09-18 10:21:35,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=423180.0, ans=0.1 2024-09-18 10:21:46,932 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.38 vs. limit=15.0 2024-09-18 10:21:49,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.32 vs. limit=15.0 2024-09-18 10:21:50,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=423220.0, ans=0.125 2024-09-18 10:22:07,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=423300.0, ans=0.0 2024-09-18 10:22:08,370 INFO [train.py:1198] (1/2) Epoch 24, batch 1750, loss[loss=0.2071, ctc_loss=0.1033, cr_loss=0.3146, attn_decoder_loss=0.2117, over 29373.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1297, cr_loss=0.3725, attn_decoder_loss=0.2466, over 5788099.21 frames. ], batch size: 67, lr: 4.64e-03, grad_scale: 8.0 2024-09-18 10:22:13,919 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.55 vs. limit=12.0 2024-09-18 10:22:33,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=423340.0, ans=0.0 2024-09-18 10:22:40,204 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.086e+01 8.547e+01 8.974e+01 9.351e+01 1.739e+02, threshold=1.795e+02, percent-clipped=0.0 2024-09-18 10:22:45,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=423380.0, ans=0.125 2024-09-18 10:23:12,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=423460.0, ans=0.0 2024-09-18 10:23:25,962 INFO [train.py:1198] (1/2) Epoch 24, batch 1800, loss[loss=0.255, ctc_loss=0.1365, cr_loss=0.3751, attn_decoder_loss=0.2598, over 29716.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1296, cr_loss=0.3722, attn_decoder_loss=0.2468, over 5791797.05 frames. ], batch size: 83, lr: 4.64e-03, grad_scale: 8.0 2024-09-18 10:23:27,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=423500.0, ans=0.125 2024-09-18 10:24:05,284 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.44 vs. limit=10.0 2024-09-18 10:24:43,915 INFO [train.py:1198] (1/2) Epoch 24, batch 1850, loss[loss=0.2484, ctc_loss=0.1313, cr_loss=0.3625, attn_decoder_loss=0.2533, over 29634.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1297, cr_loss=0.3725, attn_decoder_loss=0.2468, over 5798014.97 frames. 
], batch size: 86, lr: 4.64e-03, grad_scale: 8.0 2024-09-18 10:24:50,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=423700.0, ans=0.025 2024-09-18 10:24:56,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=423700.0, ans=0.05 2024-09-18 10:25:15,525 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.229e+01 8.631e+01 9.392e+01 8.263e+02, threshold=1.726e+02, percent-clipped=1.0 2024-09-18 10:25:20,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=423780.0, ans=0.125 2024-09-18 10:25:24,072 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.43 vs. limit=12.0 2024-09-18 10:25:28,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=423820.0, ans=0.125 2024-09-18 10:25:43,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=423860.0, ans=0.125 2024-09-18 10:25:43,797 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.12 vs. limit=15.0 2024-09-18 10:25:59,346 INFO [train.py:1198] (1/2) Epoch 24, batch 1900, loss[loss=0.2613, ctc_loss=0.1434, cr_loss=0.3924, attn_decoder_loss=0.2657, over 29727.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1301, cr_loss=0.3732, attn_decoder_loss=0.2475, over 5805191.85 frames. ], batch size: 89, lr: 4.64e-03, grad_scale: 8.0 2024-09-18 10:25:59,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=423900.0, ans=0.2 2024-09-18 10:26:02,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=423900.0, ans=0.0 2024-09-18 10:26:15,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=423940.0, ans=0.125 2024-09-18 10:26:27,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=423940.0, ans=0.0 2024-09-18 10:26:39,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=423980.0, ans=0.125 2024-09-18 10:26:41,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=423980.0, ans=0.125 2024-09-18 10:27:01,287 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.62 vs. limit=15.0 2024-09-18 10:27:15,265 INFO [train.py:1198] (1/2) Epoch 24, batch 1950, loss[loss=0.2405, ctc_loss=0.1225, cr_loss=0.3649, attn_decoder_loss=0.2455, over 29417.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1307, cr_loss=0.3745, attn_decoder_loss=0.2486, over 5819917.94 frames. 
2024-09-18 10:27:31,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=424140.0, ans=0.125
2024-09-18 10:27:37,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=424140.0, ans=0.125
2024-09-18 10:27:37,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=424140.0, ans=0.0
2024-09-18 10:27:48,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=424180.0, ans=0.0
2024-09-18 10:27:49,217 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.688e+01 8.703e+01 9.158e+01 9.577e+01 1.650e+02, threshold=1.832e+02, percent-clipped=0.0
2024-09-18 10:28:20,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=424260.0, ans=0.04949747468305833
2024-09-18 10:28:35,221 INFO [train.py:1198] (1/2) Epoch 24, batch 2000, loss[loss=0.2224, ctc_loss=0.1135, cr_loss=0.3322, attn_decoder_loss=0.2271, over 29337.00 frames. ], tot_loss[loss=0.2445, ctc_loss=0.131, cr_loss=0.375, attn_decoder_loss=0.2488, over 5795293.81 frames. ], batch size: 67, lr: 4.64e-03, grad_scale: 16.0
2024-09-18 10:29:18,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=424380.0, ans=0.125
2024-09-18 10:29:43,435 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0
2024-09-18 10:29:47,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=424460.0, ans=0.09899494936611666
2024-09-18 10:29:51,157 INFO [train.py:1198] (1/2) Epoch 24, batch 2050, loss[loss=0.2196, ctc_loss=0.1146, cr_loss=0.3615, attn_decoder_loss=0.2232, over 29457.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.13, cr_loss=0.3734, attn_decoder_loss=0.2476, over 5787631.79 frames. ], batch size: 70, lr: 4.63e-03, grad_scale: 8.0
2024-09-18 10:29:57,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=424500.0, ans=0.2
2024-09-18 10:30:06,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=424540.0, ans=0.0
2024-09-18 10:30:11,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=424540.0, ans=0.025
2024-09-18 10:30:24,690 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.469e+01 9.021e+01 9.794e+01 2.013e+02, threshold=1.804e+02, percent-clipped=1.0
2024-09-18 10:30:36,394 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.34 vs. limit=15.0
2024-09-18 10:30:45,020 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.44 vs. limit=15.0
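The Whitening lines from scaling.py:1024 report how far an activation's channel covariance is from a multiple of the identity; in the icefall implementation the module is the identity in the forward pass and only applies a corrective gradient when the metric exceeds the limit. One way to compute such a metric, in the spirit of the logged module rather than a copy of its code, is the eigenvalue ratio E[lambda^2]/E[lambda]^2 of the covariance, obtainable from traces alone (assuming num_groups=1):

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """x: (num_frames, num_channels). Returns a scalar >= 1 measuring how
        non-white the channel covariance is: 1.0 when the covariance is
        proportional to the identity, growing with the eigenvalue spread
        (cf. the `Whitening: ... metric=5.38 vs. limit=15.0` lines)."""
        x = x - x.mean(dim=0)              # remove per-channel means
        cov = (x.t() @ x) / x.shape[0]     # (C, C) covariance estimate
        c = cov.shape[0]
        # trace(cov @ cov) = sum(eig^2) and trace(cov) = sum(eig), so this is
        # mean(eig^2) / mean(eig)^2 without an eigendecomposition:
        return (cov @ cov).trace() * c / cov.trace() ** 2

    x = torch.randn(1000, 256)             # near-white input
    print(whitening_metric(x))             # close to 1.0, well under limit=15.0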
2024-09-18 10:31:07,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=424700.0, ans=0.07
2024-09-18 10:31:08,882 INFO [train.py:1198] (1/2) Epoch 24, batch 2100, loss[loss=0.2405, ctc_loss=0.1259, cr_loss=0.358, attn_decoder_loss=0.2453, over 29770.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1296, cr_loss=0.3726, attn_decoder_loss=0.2472, over 5800025.72 frames. ], batch size: 81, lr: 4.63e-03, grad_scale: 8.0
2024-09-18 10:31:16,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=424700.0, ans=0.125
2024-09-18 10:31:42,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=424780.0, ans=0.125
2024-09-18 10:31:55,098 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=6.74 vs. limit=15.0
2024-09-18 10:32:07,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=424820.0, ans=0.125
2024-09-18 10:32:07,649 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0
2024-09-18 10:32:24,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=424860.0, ans=0.125
2024-09-18 10:32:26,897 INFO [train.py:1198] (1/2) Epoch 24, batch 2150, loss[loss=0.2454, ctc_loss=0.1329, cr_loss=0.375, attn_decoder_loss=0.2495, over 29444.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1287, cr_loss=0.371, attn_decoder_loss=0.2466, over 5815404.79 frames. ], batch size: 78, lr: 4.63e-03, grad_scale: 8.0
2024-09-18 10:32:28,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=424900.0, ans=0.1
2024-09-18 10:32:42,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=424940.0, ans=0.125
2024-09-18 10:33:00,319 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.319e+01 8.375e+01 8.762e+01 9.510e+01 1.706e+02, threshold=1.752e+02, percent-clipped=0.0
2024-09-18 10:33:10,319 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.41 vs. limit=15.0
2024-09-18 10:33:13,334 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.18 vs. limit=15.0
2024-09-18 10:33:35,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=425060.0, ans=0.0
2024-09-18 10:33:35,990 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.78 vs. limit=15.0
2024-09-18 10:33:42,619 INFO [train.py:1198] (1/2) Epoch 24, batch 2200, loss[loss=0.2424, ctc_loss=0.1291, cr_loss=0.3726, attn_decoder_loss=0.2467, over 29626.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1289, cr_loss=0.3714, attn_decoder_loss=0.2467, over 5810433.60 frames. ], batch size: 86, lr: 4.63e-03, grad_scale: 8.0
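Each train.py:1198 line logs the current batch's loss components and a running tot_loss. The logged totals are consistent with the overall loss being a fixed weighted sum of the three components, with weights 0.1 (ctc_loss), 0.02 (cr_loss) and 0.9 (attn_decoder_loss) inferred from the numbers themselves; a worked check against the batch 2100 running averages above:

    # Weights inferred from the logged totals (ctc 0.1, cr 0.02, attn 0.9):
    ctc_scale, cr_scale, aed_scale = 0.1, 0.02, 0.9

    # tot_loss components from the Epoch 24, batch 2100 line:
    ctc_loss, cr_loss, attn_decoder_loss = 0.1296, 0.3726, 0.2472
    loss = ctc_scale * ctc_loss + cr_scale * cr_loss + aed_scale * attn_decoder_loss
    print(round(loss, 4))   # 0.2429, matching tot_loss[loss=0.2429, ...]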
2024-09-18 10:33:49,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=425100.0, ans=0.125
2024-09-18 10:33:50,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=425100.0, ans=0.05
2024-09-18 10:34:19,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=425180.0, ans=0.125
2024-09-18 10:34:23,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=425180.0, ans=0.125
2024-09-18 10:34:28,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=425220.0, ans=0.0
2024-09-18 10:34:44,312 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.00 vs. limit=12.0
2024-09-18 10:34:45,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=425260.0, ans=0.0
2024-09-18 10:34:50,313 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.22 vs. limit=12.0
2024-09-18 10:34:58,661 INFO [train.py:1198] (1/2) Epoch 24, batch 2250, loss[loss=0.2442, ctc_loss=0.1286, cr_loss=0.3822, attn_decoder_loss=0.2485, over 29714.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.129, cr_loss=0.3719, attn_decoder_loss=0.2469, over 5810122.47 frames. ], batch size: 82, lr: 4.63e-03, grad_scale: 8.0
2024-09-18 10:35:17,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=425340.0, ans=0.2
2024-09-18 10:35:31,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=425380.0, ans=0.125
2024-09-18 10:35:33,888 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.517e+01 9.003e+01 9.651e+01 2.176e+02, threshold=1.801e+02, percent-clipped=2.0
2024-09-18 10:35:47,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=425420.0, ans=0.0
2024-09-18 10:35:51,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=425420.0, ans=0.0
2024-09-18 10:36:18,253 INFO [train.py:1198] (1/2) Epoch 24, batch 2300, loss[loss=0.2231, ctc_loss=0.1151, cr_loss=0.3465, attn_decoder_loss=0.2274, over 29329.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1287, cr_loss=0.3704, attn_decoder_loss=0.246, over 5798521.93 frames. ], batch size: 71, lr: 4.63e-03, grad_scale: 8.0
2024-09-18 10:36:18,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=425500.0, ans=0.2
2024-09-18 10:37:06,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=425620.0, ans=0.09899494936611666
2024-09-18 10:37:09,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=425620.0, ans=0.0
2024-09-18 10:37:12,988 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.59 vs. limit=15.0
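The balancer entries (min_positive, max_positive, max_abs, prob) belong to modules that constrain per-channel activation statistics: with probability prob per batch, channels whose fraction of positive values or RMS magnitude drifts outside the scheduled bounds receive a small corrective gradient. The sketch below only computes the statistics being checked; the corrective custom backward pass of the real module is omitted:

    import torch

    def balancer_violations(x: torch.Tensor, min_positive=0.05,
                            max_positive=0.95, max_abs=10.0) -> torch.Tensor:
        """x: (num_frames, num_channels). Flags channels whose statistics
        fall outside the bounds that entries like `balancer2.min_positive
        ... ans=0.05` and `balancer1.max_abs ... ans=10.0` schedule."""
        frac_positive = (x > 0).float().mean(dim=0)   # per-channel P(x > 0)
        rms = x.pow(2).mean(dim=0).sqrt()             # per-channel RMS
        return ((frac_positive < min_positive)
                | (frac_positive > max_positive)
                | (rms > max_abs))                    # bool mask per channel

    x = torch.randn(500, 512)
    print(balancer_violations(x).sum().item())        # 0 for healthy activations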
2024-09-18 10:37:34,648 INFO [train.py:1198] (1/2) Epoch 24, batch 2350, loss[loss=0.2447, ctc_loss=0.128, cr_loss=0.3704, attn_decoder_loss=0.2494, over 29688.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1289, cr_loss=0.371, attn_decoder_loss=0.2463, over 5804547.36 frames. ], batch size: 83, lr: 4.63e-03, grad_scale: 8.0
2024-09-18 10:38:07,919 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.350e+01 8.476e+01 9.011e+01 9.684e+01 2.166e+02, threshold=1.802e+02, percent-clipped=1.0
2024-09-18 10:38:09,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=425780.0, ans=0.2
2024-09-18 10:38:26,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=425820.0, ans=0.125
2024-09-18 10:38:35,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=425860.0, ans=0.025
2024-09-18 10:38:50,535 INFO [train.py:1198] (1/2) Epoch 24, batch 2400, loss[loss=0.2289, ctc_loss=0.1172, cr_loss=0.3458, attn_decoder_loss=0.2336, over 29540.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1291, cr_loss=0.3715, attn_decoder_loss=0.2466, over 5807727.24 frames. ], batch size: 76, lr: 4.63e-03, grad_scale: 16.0
2024-09-18 10:38:57,827 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.63 vs. limit=15.0
2024-09-18 10:39:07,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=425940.0, ans=0.125
2024-09-18 10:39:15,721 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 10:40:09,946 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.70 vs. limit=15.0
2024-09-18 10:40:10,568 INFO [train.py:1198] (1/2) Epoch 24, batch 2450, loss[loss=0.2428, ctc_loss=0.1269, cr_loss=0.3869, attn_decoder_loss=0.2471, over 29730.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.13, cr_loss=0.3729, attn_decoder_loss=0.2476, over 5784836.66 frames. ], batch size: 82, lr: 4.63e-03, grad_scale: 8.0
2024-09-18 10:40:12,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=426100.0, ans=0.2
2024-09-18 10:40:33,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=426140.0, ans=0.95
2024-09-18 10:40:35,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=426140.0, ans=0.125
2024-09-18 10:40:37,196 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.22 vs. limit=12.0
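The grad_scale field in the train.py lines is the loss scale of mixed-precision training: it doubles periodically while optimizer steps stay finite (8.0 to 16.0 at batch 2400 above) and is halved when an inf/nan gradient is detected (back to 8.0 by batch 2450). A minimal sketch of that loop with torch.cuda.amp; the growth_interval value is an assumption:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=8.0,          # matches the grad_scale: 8.0 seen in the log
        growth_factor=2.0,       # doubles the scale ...
        backoff_factor=0.5,      # ... and halves it after an inf/nan step
        growth_interval=2000,    # assumed number of steps between growths
    )

    def train_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()   # backward on the scaled loss
        scaler.step(optimizer)          # skips the step if grads overflowed
        scaler.update()                 # grows or backs off the scale
        return scaler.get_scale()       # the value logged as grad_scale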
2024-09-18 10:40:45,374 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 8.992e+01 9.865e+01 1.103e+02 3.120e+02, threshold=1.973e+02, percent-clipped=1.0
2024-09-18 10:40:59,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=426220.0, ans=0.1
2024-09-18 10:41:02,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=426220.0, ans=0.1
2024-09-18 10:41:26,504 INFO [train.py:1198] (1/2) Epoch 24, batch 2500, loss[loss=0.2492, ctc_loss=0.1346, cr_loss=0.3784, attn_decoder_loss=0.2535, over 29645.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1298, cr_loss=0.3729, attn_decoder_loss=0.2474, over 5795113.19 frames. ], batch size: 86, lr: 4.62e-03, grad_scale: 8.0
2024-09-18 10:41:46,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=426340.0, ans=0.1
2024-09-18 10:41:54,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=426340.0, ans=0.0
2024-09-18 10:41:58,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=426380.0, ans=0.125
2024-09-18 10:42:11,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=426420.0, ans=0.0
2024-09-18 10:42:35,785 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 10:42:45,155 INFO [train.py:1198] (1/2) Epoch 24, batch 2550, loss[loss=0.2286, ctc_loss=0.1221, cr_loss=0.3697, attn_decoder_loss=0.2322, over 29334.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1297, cr_loss=0.3729, attn_decoder_loss=0.2474, over 5799323.89 frames. ], batch size: 67, lr: 4.62e-03, grad_scale: 8.0
2024-09-18 10:43:06,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=426540.0, ans=0.125
2024-09-18 10:43:19,600 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.669e+01 9.245e+01 9.655e+01 1.436e+02, threshold=1.849e+02, percent-clipped=0.0
2024-09-18 10:43:26,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=426580.0, ans=0.125
2024-09-18 10:44:03,063 INFO [train.py:1198] (1/2) Epoch 24, batch 2600, loss[loss=0.2345, ctc_loss=0.1272, cr_loss=0.3633, attn_decoder_loss=0.2384, over 29432.00 frames. ], tot_loss[loss=0.2436, ctc_loss=0.1301, cr_loss=0.374, attn_decoder_loss=0.2479, over 5795871.30 frames. ], batch size: 78, lr: 4.62e-03, grad_scale: 8.0
2024-09-18 10:44:06,859 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.55 vs. limit=22.5
2024-09-18 10:44:12,672 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.80 vs.
limit=15.0 2024-09-18 10:44:21,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=426740.0, ans=0.125 2024-09-18 10:44:21,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=426740.0, ans=0.125 2024-09-18 10:44:27,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=426740.0, ans=0.125 2024-09-18 10:44:31,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=426780.0, ans=0.025 2024-09-18 10:44:37,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=426780.0, ans=0.125 2024-09-18 10:45:11,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=426860.0, ans=15.0 2024-09-18 10:45:18,306 INFO [train.py:1198] (1/2) Epoch 24, batch 2650, loss[loss=0.2607, ctc_loss=0.1411, cr_loss=0.4122, attn_decoder_loss=0.2648, over 29301.00 frames. ], tot_loss[loss=0.2436, ctc_loss=0.1297, cr_loss=0.3736, attn_decoder_loss=0.2479, over 5801789.58 frames. ], batch size: 100, lr: 4.62e-03, grad_scale: 8.0 2024-09-18 10:45:53,134 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.692e+01 8.422e+01 8.884e+01 9.489e+01 2.051e+02, threshold=1.777e+02, percent-clipped=1.0 2024-09-18 10:46:05,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=427020.0, ans=0.125 2024-09-18 10:46:23,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=427060.0, ans=0.125 2024-09-18 10:46:35,896 INFO [train.py:1198] (1/2) Epoch 24, batch 2700, loss[loss=0.2424, ctc_loss=0.1243, cr_loss=0.3477, attn_decoder_loss=0.2478, over 29504.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1302, cr_loss=0.374, attn_decoder_loss=0.2479, over 5796766.64 frames. ], batch size: 87, lr: 4.62e-03, grad_scale: 8.0 2024-09-18 10:46:45,778 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.44 vs. 
limit=15.0 2024-09-18 10:46:46,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=427100.0, ans=0.125 2024-09-18 10:46:48,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=427100.0, ans=0.125 2024-09-18 10:46:58,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=427140.0, ans=0.0 2024-09-18 10:47:07,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=427180.0, ans=0.0 2024-09-18 10:47:07,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=427180.0, ans=0.125 2024-09-18 10:47:41,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=427260.0, ans=0.0 2024-09-18 10:47:48,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=427260.0, ans=0.2 2024-09-18 10:47:54,350 INFO [train.py:1198] (1/2) Epoch 24, batch 2750, loss[loss=0.2367, ctc_loss=0.1305, cr_loss=0.3819, attn_decoder_loss=0.24, over 29527.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.1298, cr_loss=0.373, attn_decoder_loss=0.247, over 5796191.12 frames. ], batch size: 75, lr: 4.62e-03, grad_scale: 8.0 2024-09-18 10:48:08,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=427340.0, ans=0.1 2024-09-18 10:48:23,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=427380.0, ans=0.025 2024-09-18 10:48:28,941 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.406e+01 8.612e+01 9.140e+01 9.786e+01 3.109e+02, threshold=1.828e+02, percent-clipped=1.0 2024-09-18 10:48:38,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=427420.0, ans=0.05 2024-09-18 10:48:44,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=427420.0, ans=0.95 2024-09-18 10:48:55,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=427460.0, ans=0.2 2024-09-18 10:49:00,504 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.94 vs. limit=12.0 2024-09-18 10:49:10,153 INFO [train.py:1198] (1/2) Epoch 24, batch 2800, loss[loss=0.2711, ctc_loss=0.1654, cr_loss=0.4066, attn_decoder_loss=0.2738, over 20653.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1303, cr_loss=0.3742, attn_decoder_loss=0.2474, over 5778468.14 frames. 
], batch size: 210, lr: 4.62e-03, grad_scale: 16.0 2024-09-18 10:49:11,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=427500.0, ans=0.04949747468305833 2024-09-18 10:49:39,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=427580.0, ans=0.025 2024-09-18 10:49:50,228 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=15.0 2024-09-18 10:49:54,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=427620.0, ans=0.07 2024-09-18 10:50:27,671 INFO [train.py:1198] (1/2) Epoch 24, batch 2850, loss[loss=0.2306, ctc_loss=0.12, cr_loss=0.3408, attn_decoder_loss=0.2353, over 29515.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1302, cr_loss=0.3736, attn_decoder_loss=0.2474, over 5761380.07 frames. ], batch size: 77, lr: 4.62e-03, grad_scale: 8.0 2024-09-18 10:50:29,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=427700.0, ans=0.125 2024-09-18 10:50:32,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=427700.0, ans=0.0 2024-09-18 10:50:43,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=427740.0, ans=0.2 2024-09-18 10:50:48,529 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.64 vs. limit=15.0 2024-09-18 10:51:04,123 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.609e+01 8.759e+01 9.407e+01 9.943e+01 3.710e+02, threshold=1.881e+02, percent-clipped=1.0 2024-09-18 10:51:10,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=427780.0, ans=0.1 2024-09-18 10:51:19,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=427820.0, ans=0.0 2024-09-18 10:51:23,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=427820.0, ans=0.0 2024-09-18 10:51:33,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=4.97 vs. limit=12.0 2024-09-18 10:51:44,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=427900.0, ans=0.0 2024-09-18 10:51:45,762 INFO [train.py:1198] (1/2) Epoch 24, batch 2900, loss[loss=0.2457, ctc_loss=0.134, cr_loss=0.395, attn_decoder_loss=0.2493, over 29441.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1314, cr_loss=0.3766, attn_decoder_loss=0.249, over 5787365.47 frames. 
], batch size: 79, lr: 4.62e-03, grad_scale: 8.0
2024-09-18 10:52:00,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=427940.0, ans=0.0
2024-09-18 10:52:07,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=427940.0, ans=0.1
2024-09-18 10:52:31,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=428020.0, ans=0.125
2024-09-18 10:52:36,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.27 vs. limit=15.0
2024-09-18 10:52:51,035 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.37 vs. limit=15.0
2024-09-18 10:52:51,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=428060.0, ans=0.025
2024-09-18 10:53:02,270 INFO [train.py:1198] (1/2) Epoch 24, batch 2950, loss[loss=0.2274, ctc_loss=0.115, cr_loss=0.3599, attn_decoder_loss=0.2319, over 29525.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1305, cr_loss=0.3744, attn_decoder_loss=0.2475, over 5781612.25 frames. ], batch size: 75, lr: 4.61e-03, grad_scale: 8.0
2024-09-18 10:53:28,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=428140.0, ans=0.0
2024-09-18 10:53:38,592 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.398e+01 8.942e+01 9.654e+01 3.446e+02, threshold=1.788e+02, percent-clipped=1.0
2024-09-18 10:54:06,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=428260.0, ans=0.125
2024-09-18 10:54:08,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=428260.0, ans=0.125
2024-09-18 10:54:20,523 INFO [train.py:1198] (1/2) Epoch 24, batch 3000, loss[loss=0.2414, ctc_loss=0.127, cr_loss=0.3707, attn_decoder_loss=0.2459, over 29757.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1302, cr_loss=0.3736, attn_decoder_loss=0.2473, over 5782653.75 frames. ], batch size: 81, lr: 4.61e-03, grad_scale: 8.0
2024-09-18 10:54:20,524 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-18 10:54:23,362 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.1916, 3.5015, 3.5165, 3.6745], device='cuda:1')
2024-09-18 10:54:24,927 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.4.self_attn_weights, attn_weights_entropy = tensor([2.6869, 2.2879, 1.9646, 2.2364, 2.2940, 1.4754, 1.9805, 2.0655], device='cuda:1')
2024-09-18 10:54:38,998 INFO [train.py:1230] (1/2) Epoch 24, validation: loss=0.2118, ctc_loss=0.03891, cr_loss=5.525e-15, attn_decoder_loss=0.231, over 944034.00 frames.
2024-09-18 10:54:38,998 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB
2024-09-18 10:54:55,297 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.71 vs. limit=22.5
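While computing validation loss, zipformer.py:1858 dumps one attention-entropy value per head of selected self_attn_weights modules; low entropy means sharply peaked attention, whereas uniform attention over T source positions would give log T. Note also cr_loss=5.525e-15 in the validation summary: with augmentation disabled in eval mode the two consistency-regularization views coincide, so the CR term vanishes to floating-point noise. A sketch of the entropy diagnostic:

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        """attn: (num_heads, tgt_len, src_len) attention weights whose rows
        sum to 1. Returns one entropy value per head, averaged over target
        positions, comparable to the `attn_weights_entropy = tensor([...])`
        lines above."""
        eps = 1e-20                                       # avoid log(0)
        ent = -(attn * (attn + eps).log()).sum(dim=-1)    # (num_heads, tgt_len)
        return ent.mean(dim=-1)                           # (num_heads,)

    attn = torch.softmax(torch.randn(4, 100, 100), dim=-1)
    print(attn_weights_entropy(attn))   # a bit below log(100) ~ 4.6: diffuse heads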
2024-09-18 10:54:58,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=428340.0, ans=0.125
2024-09-18 10:54:58,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=428340.0, ans=0.02
2024-09-18 10:55:02,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=428340.0, ans=0.1
2024-09-18 10:55:09,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=428380.0, ans=0.125
2024-09-18 10:55:20,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=428380.0, ans=0.125
2024-09-18 10:55:34,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=428420.0, ans=0.0
2024-09-18 10:55:49,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=428460.0, ans=0.2
2024-09-18 10:55:57,315 INFO [train.py:1198] (1/2) Epoch 24, batch 3050, loss[loss=0.2301, ctc_loss=0.116, cr_loss=0.3515, attn_decoder_loss=0.235, over 29510.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1303, cr_loss=0.3736, attn_decoder_loss=0.2477, over 5776421.50 frames. ], batch size: 76, lr: 4.61e-03, grad_scale: 8.0
2024-09-18 10:56:11,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=428540.0, ans=0.0
2024-09-18 10:56:15,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=428540.0, ans=0.125
2024-09-18 10:56:22,663 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.15 vs. limit=15.0
2024-09-18 10:56:30,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=428580.0, ans=0.2
2024-09-18 10:56:32,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=428580.0, ans=0.1
2024-09-18 10:56:33,589 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.271e+01 8.657e+01 9.220e+01 9.690e+01 1.587e+02, threshold=1.844e+02, percent-clipped=0.0
2024-09-18 10:56:46,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=428620.0, ans=0.125
2024-09-18 10:56:47,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=428620.0, ans=0.125
2024-09-18 10:56:53,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=428620.0, ans=0.0
2024-09-18 10:57:05,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=428660.0, ans=0.0
2024-09-18 10:57:10,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=428660.0, ans=0.125
2024-09-18 10:57:13,016 INFO [train.py:1198] (1/2) Epoch 24, batch 3100, loss[loss=0.2546, ctc_loss=0.1382, cr_loss=0.3933, attn_decoder_loss=0.2588, over 29251.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1296, cr_loss=0.3723, attn_decoder_loss=0.2473, over 5775578.72 frames.
], batch size: 100, lr: 4.61e-03, grad_scale: 8.0 2024-09-18 10:57:14,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=428700.0, ans=0.125 2024-09-18 10:57:42,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=428780.0, ans=0.125 2024-09-18 10:58:22,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=428860.0, ans=0.025 2024-09-18 10:58:31,274 INFO [train.py:1198] (1/2) Epoch 24, batch 3150, loss[loss=0.2594, ctc_loss=0.141, cr_loss=0.4038, attn_decoder_loss=0.2636, over 28756.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1295, cr_loss=0.3717, attn_decoder_loss=0.2474, over 5781780.04 frames. ], batch size: 104, lr: 4.61e-03, grad_scale: 8.0 2024-09-18 10:58:31,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=428900.0, ans=10.0 2024-09-18 10:58:36,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=428900.0, ans=0.2 2024-09-18 10:58:36,324 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:59:00,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=428980.0, ans=0.2 2024-09-18 10:59:07,547 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 8.611e+01 9.043e+01 9.612e+01 2.237e+02, threshold=1.809e+02, percent-clipped=2.0 2024-09-18 10:59:07,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=428980.0, ans=0.125 2024-09-18 10:59:18,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=429020.0, ans=0.125 2024-09-18 10:59:36,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.60 vs. limit=22.5 2024-09-18 10:59:49,146 INFO [train.py:1198] (1/2) Epoch 24, batch 3200, loss[loss=0.2431, ctc_loss=0.1299, cr_loss=0.3688, attn_decoder_loss=0.2475, over 29435.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1294, cr_loss=0.3712, attn_decoder_loss=0.2469, over 5791831.69 frames. 
], batch size: 79, lr: 4.61e-03, grad_scale: 16.0 2024-09-18 10:59:55,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=429100.0, ans=0.125 2024-09-18 11:00:01,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=429100.0, ans=0.125 2024-09-18 11:00:04,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=429140.0, ans=0.125 2024-09-18 11:00:18,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=429180.0, ans=0.0 2024-09-18 11:00:27,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=429180.0, ans=0.125 2024-09-18 11:00:28,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=429180.0, ans=0.04949747468305833 2024-09-18 11:00:30,986 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.92 vs. limit=15.0 2024-09-18 11:00:31,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=429180.0, ans=0.025 2024-09-18 11:00:36,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=429220.0, ans=0.0 2024-09-18 11:00:44,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=429220.0, ans=0.125 2024-09-18 11:00:44,499 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.02 vs. limit=10.0 2024-09-18 11:00:56,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=429260.0, ans=0.025 2024-09-18 11:00:59,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=429260.0, ans=0.0 2024-09-18 11:01:01,200 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.24 vs. limit=22.5 2024-09-18 11:01:04,859 INFO [train.py:1198] (1/2) Epoch 24, batch 3250, loss[loss=0.2552, ctc_loss=0.1434, cr_loss=0.4049, attn_decoder_loss=0.2586, over 29715.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1301, cr_loss=0.3727, attn_decoder_loss=0.2476, over 5797951.18 frames. ], batch size: 84, lr: 4.61e-03, grad_scale: 8.0 2024-09-18 11:01:26,880 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.73 vs. limit=15.0 2024-09-18 11:01:42,457 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.459e+01 8.528e+01 8.996e+01 9.575e+01 1.279e+02, threshold=1.799e+02, percent-clipped=0.0 2024-09-18 11:01:45,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=429380.0, ans=0.0 2024-09-18 11:01:56,534 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.45 vs. 
limit=15.0 2024-09-18 11:02:22,368 INFO [train.py:1198] (1/2) Epoch 24, batch 3300, loss[loss=0.2439, ctc_loss=0.1261, cr_loss=0.372, attn_decoder_loss=0.2487, over 28413.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1294, cr_loss=0.3716, attn_decoder_loss=0.2465, over 5795114.10 frames. ], batch size: 111, lr: 4.61e-03, grad_scale: 8.0 2024-09-18 11:02:29,553 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.36 vs. limit=6.0 2024-09-18 11:02:46,104 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.19 vs. limit=15.0 2024-09-18 11:02:50,876 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0 2024-09-18 11:02:53,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=429580.0, ans=0.025 2024-09-18 11:03:02,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=429580.0, ans=0.025 2024-09-18 11:03:03,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=429580.0, ans=0.1 2024-09-18 11:03:04,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=429580.0, ans=0.125 2024-09-18 11:03:22,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=429620.0, ans=0.2 2024-09-18 11:03:27,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=429660.0, ans=0.125 2024-09-18 11:03:40,494 INFO [train.py:1198] (1/2) Epoch 24, batch 3350, loss[loss=0.2654, ctc_loss=0.1415, cr_loss=0.3975, attn_decoder_loss=0.2703, over 28816.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1304, cr_loss=0.374, attn_decoder_loss=0.2476, over 5773207.39 frames. ], batch size: 104, lr: 4.61e-03, grad_scale: 8.0 2024-09-18 11:04:18,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=429780.0, ans=6.0 2024-09-18 11:04:18,810 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.747e+01 8.492e+01 9.207e+01 9.979e+01 1.773e+02, threshold=1.841e+02, percent-clipped=0.0 2024-09-18 11:04:19,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=429780.0, ans=0.125 2024-09-18 11:04:23,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=429780.0, ans=0.1 2024-09-18 11:04:37,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=429820.0, ans=0.0 2024-09-18 11:04:43,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=429860.0, ans=0.125 2024-09-18 11:04:56,630 INFO [train.py:1198] (1/2) Epoch 24, batch 3400, loss[loss=0.2237, ctc_loss=0.118, cr_loss=0.3483, attn_decoder_loss=0.2277, over 29373.00 frames. 
], tot_loss[loss=0.2433, ctc_loss=0.1305, cr_loss=0.3741, attn_decoder_loss=0.2475, over 5765806.24 frames. ], batch size: 67, lr: 4.61e-03, grad_scale: 8.0 2024-09-18 11:04:56,949 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:05:04,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=429900.0, ans=0.0 2024-09-18 11:05:10,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=429940.0, ans=0.125 2024-09-18 11:05:19,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=429940.0, ans=0.1 2024-09-18 11:05:23,166 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.99 vs. limit=15.0 2024-09-18 11:05:28,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=429980.0, ans=0.2 2024-09-18 11:05:35,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=429980.0, ans=0.125 2024-09-18 11:05:46,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=430020.0, ans=0.125 2024-09-18 11:05:46,611 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.52 vs. limit=15.0 2024-09-18 11:06:01,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=430060.0, ans=0.1 2024-09-18 11:06:04,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=430060.0, ans=0.09899494936611666 2024-09-18 11:06:14,374 INFO [train.py:1198] (1/2) Epoch 24, batch 3450, loss[loss=0.2492, ctc_loss=0.1246, cr_loss=0.3765, attn_decoder_loss=0.2547, over 28225.00 frames. ], tot_loss[loss=0.2436, ctc_loss=0.1305, cr_loss=0.3748, attn_decoder_loss=0.2478, over 5773975.42 frames. ], batch size: 111, lr: 4.60e-03, grad_scale: 8.0 2024-09-18 11:06:31,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=430140.0, ans=0.2 2024-09-18 11:06:43,945 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.49 vs. limit=22.5 2024-09-18 11:06:51,985 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.937e+01 8.449e+01 8.954e+01 9.468e+01 1.386e+02, threshold=1.791e+02, percent-clipped=0.0 2024-09-18 11:07:05,485 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-09-18 11:07:32,350 INFO [train.py:1198] (1/2) Epoch 24, batch 3500, loss[loss=0.219, ctc_loss=0.1138, cr_loss=0.3422, attn_decoder_loss=0.2231, over 29318.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1304, cr_loss=0.374, attn_decoder_loss=0.2473, over 5776149.67 frames. 
], batch size: 71, lr: 4.60e-03, grad_scale: 8.0 2024-09-18 11:07:35,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=430300.0, ans=0.125 2024-09-18 11:07:37,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=430300.0, ans=0.125 2024-09-18 11:07:43,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=430300.0, ans=0.2 2024-09-18 11:07:51,315 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.21 vs. limit=22.5 2024-09-18 11:07:52,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=430340.0, ans=0.125 2024-09-18 11:08:02,947 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-09-18 11:08:07,536 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.28 vs. limit=22.5 2024-09-18 11:08:19,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=430420.0, ans=0.025 2024-09-18 11:08:26,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=430420.0, ans=0.125 2024-09-18 11:08:37,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=430460.0, ans=0.125 2024-09-18 11:08:47,394 INFO [train.py:1198] (1/2) Epoch 24, batch 3550, loss[loss=0.2531, ctc_loss=0.1275, cr_loss=0.3726, attn_decoder_loss=0.2588, over 29698.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.1297, cr_loss=0.3726, attn_decoder_loss=0.247, over 5781032.38 frames. ], batch size: 89, lr: 4.60e-03, grad_scale: 8.0 2024-09-18 11:09:03,092 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.89 vs. limit=15.0 2024-09-18 11:09:06,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=430540.0, ans=0.0 2024-09-18 11:09:24,141 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.688e+01 8.484e+01 9.073e+01 9.801e+01 1.561e+02, threshold=1.815e+02, percent-clipped=0.0 2024-09-18 11:09:29,388 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.63 vs. limit=15.0 2024-09-18 11:09:45,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=430660.0, ans=0.0 2024-09-18 11:10:01,177 INFO [train.py:1198] (1/2) Epoch 24, batch 3600, loss[loss=0.2302, ctc_loss=0.1179, cr_loss=0.3473, attn_decoder_loss=0.235, over 29466.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.1293, cr_loss=0.3716, attn_decoder_loss=0.2471, over 5790921.72 frames. 
], batch size: 77, lr: 4.60e-03, grad_scale: 16.0 2024-09-18 11:10:01,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=430700.0, ans=0.05 2024-09-18 11:10:10,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=430700.0, ans=0.125 2024-09-18 11:10:23,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=430740.0, ans=0.0 2024-09-18 11:10:24,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=430740.0, ans=22.5 2024-09-18 11:10:33,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=430780.0, ans=0.2 2024-09-18 11:10:43,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=430780.0, ans=0.025 2024-09-18 11:10:47,604 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.50 vs. limit=5.0 2024-09-18 11:10:52,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=430820.0, ans=0.2 2024-09-18 11:10:54,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=430820.0, ans=0.125 2024-09-18 11:10:55,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=430820.0, ans=0.125 2024-09-18 11:11:01,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=430860.0, ans=0.125 2024-09-18 11:11:07,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=430860.0, ans=0.0 2024-09-18 11:11:11,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=430860.0, ans=0.0 2024-09-18 11:11:17,342 INFO [train.py:1198] (1/2) Epoch 24, batch 3650, loss[loss=0.2628, ctc_loss=0.1522, cr_loss=0.4154, attn_decoder_loss=0.2659, over 29480.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.1288, cr_loss=0.3705, attn_decoder_loss=0.2465, over 5793097.12 frames. ], batch size: 90, lr: 4.60e-03, grad_scale: 16.0 2024-09-18 11:11:22,725 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.11 vs. limit=22.5 2024-09-18 11:11:26,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=430900.0, ans=0.125 2024-09-18 11:11:43,335 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.15 vs. 
limit=15.0 2024-09-18 11:11:49,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=430980.0, ans=0.1 2024-09-18 11:11:56,435 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.409e+01 9.046e+01 9.842e+01 1.750e+02, threshold=1.809e+02, percent-clipped=0.0 2024-09-18 11:11:58,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=430980.0, ans=0.125 2024-09-18 11:12:32,072 INFO [train.py:1198] (1/2) Epoch 24, batch 3700, loss[loss=0.2459, ctc_loss=0.127, cr_loss=0.3674, attn_decoder_loss=0.251, over 29695.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1288, cr_loss=0.371, attn_decoder_loss=0.2466, over 5802942.00 frames. ], batch size: 84, lr: 4.60e-03, grad_scale: 8.0 2024-09-18 11:12:32,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=431100.0, ans=0.025 2024-09-18 11:13:07,148 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.55 vs. limit=15.0 2024-09-18 11:13:44,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=431260.0, ans=12.0 2024-09-18 11:13:48,724 INFO [train.py:1198] (1/2) Epoch 24, batch 3750, loss[loss=0.2196, ctc_loss=0.1141, cr_loss=0.3517, attn_decoder_loss=0.2235, over 29376.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.1288, cr_loss=0.3712, attn_decoder_loss=0.2465, over 5806875.66 frames. ], batch size: 67, lr: 4.60e-03, grad_scale: 8.0 2024-09-18 11:13:56,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=431300.0, ans=0.0 2024-09-18 11:14:03,270 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=7.97 vs. limit=22.5 2024-09-18 11:14:09,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=431340.0, ans=0.125 2024-09-18 11:14:09,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=431340.0, ans=0.125 2024-09-18 11:14:24,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=431380.0, ans=0.95 2024-09-18 11:14:27,338 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.117e+01 8.348e+01 8.770e+01 9.473e+01 2.105e+02, threshold=1.754e+02, percent-clipped=1.0 2024-09-18 11:14:29,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=431380.0, ans=0.0 2024-09-18 11:14:51,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=431460.0, ans=0.125 2024-09-18 11:15:03,202 INFO [train.py:1198] (1/2) Epoch 24, batch 3800, loss[loss=0.256, ctc_loss=0.1385, cr_loss=0.3968, attn_decoder_loss=0.2602, over 29625.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1284, cr_loss=0.3702, attn_decoder_loss=0.2461, over 5798659.79 frames. 
], batch size: 86, lr: 4.60e-03, grad_scale: 8.0 2024-09-18 11:15:26,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=431540.0, ans=0.125 2024-09-18 11:15:39,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=431580.0, ans=0.025 2024-09-18 11:15:39,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=431580.0, ans=0.0 2024-09-18 11:16:07,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=431660.0, ans=0.125 2024-09-18 11:16:19,239 INFO [train.py:1198] (1/2) Epoch 24, batch 3850, loss[loss=0.253, ctc_loss=0.1344, cr_loss=0.39, attn_decoder_loss=0.2575, over 29300.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1278, cr_loss=0.3696, attn_decoder_loss=0.2457, over 5811703.34 frames. ], batch size: 100, lr: 4.60e-03, grad_scale: 8.0 2024-09-18 11:16:20,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=431700.0, ans=0.5 2024-09-18 11:16:23,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=431700.0, ans=0.125 2024-09-18 11:16:30,294 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-09-18 11:16:31,190 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:16:50,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=431780.0, ans=0.1 2024-09-18 11:16:57,866 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 8.427e+01 9.024e+01 9.626e+01 1.408e+02, threshold=1.805e+02, percent-clipped=0.0 2024-09-18 11:16:58,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=431780.0, ans=0.09899494936611666 2024-09-18 11:17:18,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=431860.0, ans=0.125 2024-09-18 11:17:21,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=431860.0, ans=0.2 2024-09-18 11:17:33,515 INFO [train.py:1198] (1/2) Epoch 24, batch 3900, loss[loss=0.2595, ctc_loss=0.138, cr_loss=0.3959, attn_decoder_loss=0.2642, over 29626.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1282, cr_loss=0.3707, attn_decoder_loss=0.2461, over 5816358.76 frames. 
], batch size: 86, lr: 4.59e-03, grad_scale: 8.0 2024-09-18 11:17:45,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=431900.0, ans=0.0 2024-09-18 11:17:48,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=431940.0, ans=0.125 2024-09-18 11:17:54,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=431940.0, ans=0.125 2024-09-18 11:17:57,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=431940.0, ans=0.2 2024-09-18 11:18:03,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=431980.0, ans=0.125 2024-09-18 11:18:43,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=432060.0, ans=0.0 2024-09-18 11:18:55,357 INFO [train.py:1198] (1/2) Epoch 24, batch 3950, loss[loss=0.2511, ctc_loss=0.1317, cr_loss=0.384, attn_decoder_loss=0.2558, over 29449.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1275, cr_loss=0.3695, attn_decoder_loss=0.246, over 5835801.72 frames. ], batch size: 97, lr: 4.59e-03, grad_scale: 8.0 2024-09-18 11:18:56,098 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.69 vs. limit=15.0 2024-09-18 11:18:57,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=432100.0, ans=0.125 2024-09-18 11:19:00,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=432100.0, ans=0.2 2024-09-18 11:19:06,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=432100.0, ans=0.1 2024-09-18 11:19:30,679 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.43 vs. limit=15.0 2024-09-18 11:19:35,350 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 8.348e+01 8.902e+01 9.353e+01 3.258e+02, threshold=1.780e+02, percent-clipped=1.0 2024-09-18 11:19:37,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=432180.0, ans=0.125 2024-09-18 11:19:38,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=432180.0, ans=0.0 2024-09-18 11:19:47,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=432220.0, ans=0.2 2024-09-18 11:20:00,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=432260.0, ans=0.125 2024-09-18 11:20:09,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=432300.0, ans=0.0 2024-09-18 11:20:10,532 INFO [train.py:1198] (1/2) Epoch 24, batch 4000, loss[loss=0.22, ctc_loss=0.1086, cr_loss=0.3315, attn_decoder_loss=0.225, over 29520.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1278, cr_loss=0.37, attn_decoder_loss=0.2461, over 5813650.97 frames. 
], batch size: 74, lr: 4.59e-03, grad_scale: 16.0 2024-09-18 11:20:28,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=432340.0, ans=0.1 2024-09-18 11:20:34,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=432340.0, ans=0.125 2024-09-18 11:20:43,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=432380.0, ans=0.2 2024-09-18 11:20:43,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=432380.0, ans=0.125 2024-09-18 11:21:05,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=432420.0, ans=0.5 2024-09-18 11:21:10,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=432460.0, ans=0.125 2024-09-18 11:21:25,731 INFO [train.py:1198] (1/2) Epoch 24, batch 4050, loss[loss=0.265, ctc_loss=0.1635, cr_loss=0.3888, attn_decoder_loss=0.2676, over 19825.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1279, cr_loss=0.3693, attn_decoder_loss=0.2462, over 5797268.41 frames. ], batch size: 210, lr: 4.59e-03, grad_scale: 8.0 2024-09-18 11:21:26,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=432500.0, ans=0.0 2024-09-18 11:21:26,866 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.04 vs. limit=15.0 2024-09-18 11:21:38,514 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.27 vs. limit=15.0 2024-09-18 11:21:56,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=432580.0, ans=0.2 2024-09-18 11:22:05,480 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.397e+01 8.379e+01 9.029e+01 9.565e+01 1.787e+02, threshold=1.806e+02, percent-clipped=1.0 2024-09-18 11:22:26,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=432660.0, ans=0.1 2024-09-18 11:22:39,426 INFO [train.py:1198] (1/2) Epoch 24, batch 4100, loss[loss=0.2612, ctc_loss=0.1463, cr_loss=0.4071, attn_decoder_loss=0.2649, over 29515.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.1285, cr_loss=0.3703, attn_decoder_loss=0.2465, over 5792973.60 frames. 
], batch size: 90, lr: 4.59e-03, grad_scale: 8.0 2024-09-18 11:22:44,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=432700.0, ans=0.1 2024-09-18 11:22:45,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=432700.0, ans=0.125 2024-09-18 11:22:58,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=432740.0, ans=0.125 2024-09-18 11:23:10,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=432780.0, ans=0.125 2024-09-18 11:23:38,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=432860.0, ans=0.1 2024-09-18 11:23:54,710 INFO [train.py:1198] (1/2) Epoch 24, batch 4150, loss[loss=0.2363, ctc_loss=0.1317, cr_loss=0.3711, attn_decoder_loss=0.2397, over 29508.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1286, cr_loss=0.3703, attn_decoder_loss=0.2463, over 5798770.20 frames. ], batch size: 77, lr: 4.59e-03, grad_scale: 8.0 2024-09-18 11:24:28,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=432980.0, ans=0.1 2024-09-18 11:24:34,390 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.706e+01 9.310e+01 1.000e+02 1.548e+02, threshold=1.862e+02, percent-clipped=0.0 2024-09-18 11:24:37,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=433020.0, ans=0.0 2024-09-18 11:24:40,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=433020.0, ans=0.0 2024-09-18 11:24:58,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=433060.0, ans=0.125 2024-09-18 11:24:59,057 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.72 vs. limit=15.0 2024-09-18 11:25:05,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=433060.0, ans=0.0 2024-09-18 11:25:08,588 INFO [train.py:1198] (1/2) Epoch 24, batch 4200, loss[loss=0.2627, ctc_loss=0.1491, cr_loss=0.4055, attn_decoder_loss=0.2663, over 29537.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1289, cr_loss=0.3711, attn_decoder_loss=0.2468, over 5800216.01 frames. ], batch size: 90, lr: 4.59e-03, grad_scale: 8.0 2024-09-18 11:25:10,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=433100.0, ans=0.0 2024-09-18 11:25:32,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=433140.0, ans=0.125 2024-09-18 11:25:54,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=433220.0, ans=0.125 2024-09-18 11:26:10,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=433260.0, ans=0.125 2024-09-18 11:26:15,359 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.34 vs. 
limit=12.0 2024-09-18 11:26:19,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=433260.0, ans=0.125 2024-09-18 11:26:23,507 INFO [train.py:1198] (1/2) Epoch 24, batch 4250, loss[loss=0.2194, ctc_loss=0.1102, cr_loss=0.3378, attn_decoder_loss=0.224, over 29525.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1284, cr_loss=0.3706, attn_decoder_loss=0.2467, over 5806648.16 frames. ], batch size: 74, lr: 4.59e-03, grad_scale: 8.0 2024-09-18 11:26:45,000 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.72 vs. limit=6.0 2024-09-18 11:27:03,540 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 8.615e+01 9.136e+01 9.637e+01 1.647e+02, threshold=1.827e+02, percent-clipped=0.0 2024-09-18 11:27:05,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=433380.0, ans=0.0 2024-09-18 11:27:15,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=433420.0, ans=0.0 2024-09-18 11:27:18,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=433420.0, ans=0.125 2024-09-18 11:27:27,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=433460.0, ans=0.125 2024-09-18 11:27:34,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=433460.0, ans=0.125 2024-09-18 11:27:38,617 INFO [train.py:1198] (1/2) Epoch 24, batch 4300, loss[loss=0.2484, ctc_loss=0.1294, cr_loss=0.3641, attn_decoder_loss=0.2535, over 29521.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1282, cr_loss=0.3704, attn_decoder_loss=0.2469, over 5796486.45 frames. ], batch size: 87, lr: 4.59e-03, grad_scale: 8.0 2024-09-18 11:27:47,851 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:27:58,881 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.33 vs. limit=15.0 2024-09-18 11:28:01,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=433540.0, ans=0.0 2024-09-18 11:28:10,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=433580.0, ans=0.025 2024-09-18 11:28:10,744 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.55 vs. limit=15.0 2024-09-18 11:28:17,376 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.50 vs. 
limit=15.0 2024-09-18 11:28:21,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=433580.0, ans=0.125 2024-09-18 11:28:22,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=433620.0, ans=0.125 2024-09-18 11:28:25,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=433620.0, ans=10.0 2024-09-18 11:28:54,031 INFO [train.py:1198] (1/2) Epoch 24, batch 4350, loss[loss=0.2428, ctc_loss=0.1137, cr_loss=0.3334, attn_decoder_loss=0.2498, over 29469.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1313, cr_loss=0.3766, attn_decoder_loss=0.2502, over 5798407.31 frames. ], batch size: 97, lr: 4.59e-03, grad_scale: 8.0 2024-09-18 11:29:09,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=433740.0, ans=0.125 2024-09-18 11:29:17,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=433740.0, ans=0.125 2024-09-18 11:29:29,841 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.33 vs. limit=15.0 2024-09-18 11:29:33,324 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.959e+01 8.974e+01 9.434e+01 1.011e+02 1.996e+02, threshold=1.887e+02, percent-clipped=1.0 2024-09-18 11:30:01,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=433860.0, ans=0.2 2024-09-18 11:30:07,013 INFO [train.py:1198] (1/2) Epoch 24, batch 4400, loss[loss=0.2492, ctc_loss=0.1327, cr_loss=0.3959, attn_decoder_loss=0.2534, over 27559.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1328, cr_loss=0.38, attn_decoder_loss=0.2525, over 5768554.08 frames. ], batch size: 124, lr: 4.58e-03, grad_scale: 16.0 2024-09-18 11:30:13,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=433900.0, ans=0.025 2024-09-18 11:30:16,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=433900.0, ans=0.125 2024-09-18 11:30:32,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=433940.0, ans=0.2 2024-09-18 11:30:43,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=433980.0, ans=0.125 2024-09-18 11:30:43,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=433980.0, ans=0.0 2024-09-18 11:30:53,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=434020.0, ans=0.95 2024-09-18 11:30:59,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=434020.0, ans=0.1 2024-09-18 11:31:21,664 INFO [train.py:1198] (1/2) Epoch 24, batch 4450, loss[loss=0.263, ctc_loss=0.1554, cr_loss=0.3955, attn_decoder_loss=0.2661, over 20132.00 frames. ], tot_loss[loss=0.251, ctc_loss=0.1373, cr_loss=0.3848, attn_decoder_loss=0.2551, over 5574638.46 frames. 
], batch size: 210, lr: 4.58e-03, grad_scale: 8.0 2024-09-18 11:31:23,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=434100.0, ans=0.125 2024-09-18 11:31:35,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=434140.0, ans=0.1 2024-09-18 11:31:46,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=434140.0, ans=0.025 2024-09-18 11:31:46,706 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.94 vs. limit=22.5 2024-09-18 11:31:51,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=434180.0, ans=0.125 2024-09-18 11:31:52,392 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2024-09-18 11:31:53,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=434180.0, ans=0.1 2024-09-18 11:32:04,067 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 9.021e+01 9.778e+01 1.211e+02 1.854e+02, threshold=1.956e+02, percent-clipped=0.0 2024-09-18 11:32:04,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=434180.0, ans=0.0 2024-09-18 11:32:12,420 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.34 vs. limit=15.0 2024-09-18 11:32:20,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=434220.0, ans=0.125 2024-09-18 11:32:25,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=434260.0, ans=0.125 2024-09-18 11:32:34,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=434260.0, ans=0.125 2024-09-18 11:32:37,512 INFO [train.py:1198] (1/2) Epoch 24, batch 4500, loss[loss=0.2738, ctc_loss=0.1765, cr_loss=0.4277, attn_decoder_loss=0.2751, over 20221.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.1416, cr_loss=0.387, attn_decoder_loss=0.2573, over 5236157.26 frames. ], batch size: 209, lr: 4.58e-03, grad_scale: 8.0 2024-09-18 11:32:45,099 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:33:07,972 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.10 vs. limit=12.0 2024-09-18 11:34:00,538 INFO [train.py:1198] (1/2) Epoch 25, batch 0, loss[loss=0.2239, ctc_loss=0.1064, cr_loss=0.3412, attn_decoder_loss=0.2293, over 29615.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1064, cr_loss=0.3412, attn_decoder_loss=0.2293, over 29615.00 frames. ], batch size: 73, lr: 4.49e-03, grad_scale: 16.0 2024-09-18 11:34:00,538 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 11:34:18,957 INFO [train.py:1230] (1/2) Epoch 25, validation: loss=0.2119, ctc_loss=0.03765, cr_loss=5.538e-15, attn_decoder_loss=0.2313, over 944034.00 frames. 
2024-09-18 11:34:18,958 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-18 11:34:53,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=434480.0, ans=0.1 2024-09-18 11:34:57,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=434480.0, ans=0.125 2024-09-18 11:35:36,617 INFO [train.py:1198] (1/2) Epoch 25, batch 50, loss[loss=0.2203, ctc_loss=0.1107, cr_loss=0.332, attn_decoder_loss=0.2251, over 29407.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1304, cr_loss=0.3766, attn_decoder_loss=0.2477, over 1269232.14 frames. ], batch size: 70, lr: 4.49e-03, grad_scale: 8.0 2024-09-18 11:35:42,736 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.115e+01 8.952e+01 1.043e+02 1.177e+02 2.373e+02, threshold=2.086e+02, percent-clipped=2.0 2024-09-18 11:35:42,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=434600.0, ans=0.2 2024-09-18 11:36:02,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=434640.0, ans=0.125 2024-09-18 11:36:19,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=434680.0, ans=0.125 2024-09-18 11:36:53,504 INFO [train.py:1198] (1/2) Epoch 25, batch 100, loss[loss=0.2325, ctc_loss=0.124, cr_loss=0.36, attn_decoder_loss=0.2365, over 29543.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1319, cr_loss=0.3785, attn_decoder_loss=0.2501, over 2254033.48 frames. ], batch size: 76, lr: 4.48e-03, grad_scale: 8.0 2024-09-18 11:37:16,806 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.30 vs. limit=15.0 2024-09-18 11:37:23,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=434880.0, ans=0.0 2024-09-18 11:37:32,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=434880.0, ans=0.125 2024-09-18 11:37:43,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.10 vs. limit=22.5 2024-09-18 11:38:04,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=434960.0, ans=0.0 2024-09-18 11:38:05,798 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0 2024-09-18 11:38:08,148 INFO [train.py:1198] (1/2) Epoch 25, batch 150, loss[loss=0.2123, ctc_loss=0.1123, cr_loss=0.3391, attn_decoder_loss=0.2159, over 29420.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1297, cr_loss=0.3739, attn_decoder_loss=0.2473, over 3048048.12 frames. 
], batch size: 70, lr: 4.48e-03, grad_scale: 8.0 2024-09-18 11:38:08,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=435000.0, ans=0.125 2024-09-18 11:38:10,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=435000.0, ans=0.125 2024-09-18 11:38:11,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=435000.0, ans=0.0 2024-09-18 11:38:14,095 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.274e+01 8.648e+01 9.269e+01 9.917e+01 1.697e+02, threshold=1.854e+02, percent-clipped=0.0 2024-09-18 11:38:20,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=435000.0, ans=0.125 2024-09-18 11:38:23,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=435040.0, ans=0.2 2024-09-18 11:38:35,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=435040.0, ans=0.125 2024-09-18 11:38:43,776 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.86 vs. limit=15.0 2024-09-18 11:38:47,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=435080.0, ans=0.0 2024-09-18 11:38:48,416 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.57 vs. limit=15.0 2024-09-18 11:39:07,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=435160.0, ans=0.125 2024-09-18 11:39:22,454 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:39:22,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=435200.0, ans=0.125 2024-09-18 11:39:24,077 INFO [train.py:1198] (1/2) Epoch 25, batch 200, loss[loss=0.2609, ctc_loss=0.145, cr_loss=0.3935, attn_decoder_loss=0.2651, over 27369.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1287, cr_loss=0.372, attn_decoder_loss=0.2461, over 3658365.28 frames. ], batch size: 124, lr: 4.48e-03, grad_scale: 8.0 2024-09-18 11:39:24,840 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.72 vs. limit=15.0 2024-09-18 11:39:31,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=435200.0, ans=0.125 2024-09-18 11:39:34,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=435200.0, ans=0.0 2024-09-18 11:40:03,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=435280.0, ans=0.1 2024-09-18 11:40:04,701 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. 
limit=6.0 2024-09-18 11:40:07,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.25 vs. limit=10.0 2024-09-18 11:40:14,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=435320.0, ans=0.2 2024-09-18 11:40:25,551 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.69 vs. limit=15.0 2024-09-18 11:40:38,973 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:40:44,515 INFO [train.py:1198] (1/2) Epoch 25, batch 250, loss[loss=0.2588, ctc_loss=0.1406, cr_loss=0.3836, attn_decoder_loss=0.2634, over 29166.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1282, cr_loss=0.3718, attn_decoder_loss=0.2463, over 4141460.07 frames. ], batch size: 100, lr: 4.48e-03, grad_scale: 8.0 2024-09-18 11:40:50,552 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.660e+01 8.459e+01 8.857e+01 9.365e+01 1.077e+02, threshold=1.771e+02, percent-clipped=0.0 2024-09-18 11:40:55,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=435400.0, ans=0.0 2024-09-18 11:41:18,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=435480.0, ans=0.0 2024-09-18 11:41:26,398 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.40 vs. limit=22.5 2024-09-18 11:41:30,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=435520.0, ans=0.125 2024-09-18 11:41:51,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=435560.0, ans=0.2 2024-09-18 11:41:52,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=435560.0, ans=0.125 2024-09-18 11:41:59,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=435600.0, ans=0.125 2024-09-18 11:42:00,691 INFO [train.py:1198] (1/2) Epoch 25, batch 300, loss[loss=0.252, ctc_loss=0.133, cr_loss=0.3814, attn_decoder_loss=0.2567, over 29514.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.128, cr_loss=0.3717, attn_decoder_loss=0.2464, over 4507734.67 frames. ], batch size: 92, lr: 4.48e-03, grad_scale: 8.0 2024-09-18 11:42:18,136 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.27 vs. limit=10.0 2024-09-18 11:42:28,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=435640.0, ans=0.0 2024-09-18 11:42:40,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=435680.0, ans=0.125 2024-09-18 11:43:04,663 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.79 vs. limit=22.5 2024-09-18 11:43:15,963 INFO [train.py:1198] (1/2) Epoch 25, batch 350, loss[loss=0.2194, ctc_loss=0.1093, cr_loss=0.3332, attn_decoder_loss=0.2242, over 29320.00 frames. 
], tot_loss[loss=0.2426, ctc_loss=0.1286, cr_loss=0.3727, attn_decoder_loss=0.247, over 4793791.52 frames. ], batch size: 71, lr: 4.48e-03, grad_scale: 8.0 2024-09-18 11:43:18,479 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.89 vs. limit=22.5 2024-09-18 11:43:21,919 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.434e+01 8.932e+01 9.530e+01 2.745e+02, threshold=1.786e+02, percent-clipped=1.0 2024-09-18 11:43:22,189 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:43:33,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=435840.0, ans=0.2 2024-09-18 11:43:53,043 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.54 vs. limit=12.0 2024-09-18 11:44:03,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=435880.0, ans=0.1 2024-09-18 11:44:06,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=435920.0, ans=0.125 2024-09-18 11:44:16,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=435920.0, ans=0.2 2024-09-18 11:44:18,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=435920.0, ans=0.125 2024-09-18 11:44:36,423 INFO [train.py:1198] (1/2) Epoch 25, batch 400, loss[loss=0.2445, ctc_loss=0.1279, cr_loss=0.3747, attn_decoder_loss=0.2491, over 29733.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1279, cr_loss=0.3713, attn_decoder_loss=0.2464, over 5023274.57 frames. ], batch size: 82, lr: 4.48e-03, grad_scale: 16.0 2024-09-18 11:45:02,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=436040.0, ans=0.2 2024-09-18 11:45:31,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=436120.0, ans=0.0 2024-09-18 11:45:40,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=436160.0, ans=0.2 2024-09-18 11:45:43,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=436160.0, ans=0.125 2024-09-18 11:45:52,029 INFO [train.py:1198] (1/2) Epoch 25, batch 450, loss[loss=0.254, ctc_loss=0.1386, cr_loss=0.3975, attn_decoder_loss=0.258, over 29691.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.128, cr_loss=0.3704, attn_decoder_loss=0.2465, over 5185224.35 frames. 
], batch size: 83, lr: 4.48e-03, grad_scale: 8.0 2024-09-18 11:45:52,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=436200.0, ans=0.2 2024-09-18 11:45:59,433 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.181e+01 8.625e+01 9.050e+01 9.660e+01 1.722e+02, threshold=1.810e+02, percent-clipped=0.0 2024-09-18 11:46:02,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=436200.0, ans=0.1 2024-09-18 11:46:47,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=436320.0, ans=0.2 2024-09-18 11:46:51,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=436360.0, ans=0.125 2024-09-18 11:47:04,052 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.85 vs. limit=15.0 2024-09-18 11:47:07,834 INFO [train.py:1198] (1/2) Epoch 25, batch 500, loss[loss=0.2615, ctc_loss=0.1459, cr_loss=0.415, attn_decoder_loss=0.2651, over 29439.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1276, cr_loss=0.3697, attn_decoder_loss=0.2457, over 5328690.67 frames. ], batch size: 94, lr: 4.48e-03, grad_scale: 8.0 2024-09-18 11:47:08,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=436400.0, ans=0.125 2024-09-18 11:47:39,092 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.54 vs. limit=15.0 2024-09-18 11:47:40,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=436440.0, ans=0.125 2024-09-18 11:47:41,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=436480.0, ans=0.2 2024-09-18 11:47:49,770 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.36 vs. limit=12.0 2024-09-18 11:48:07,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=436520.0, ans=0.125 2024-09-18 11:48:08,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=436520.0, ans=0.2 2024-09-18 11:48:19,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=436560.0, ans=0.125 2024-09-18 11:48:23,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=436560.0, ans=0.1 2024-09-18 11:48:28,244 INFO [train.py:1198] (1/2) Epoch 25, batch 550, loss[loss=0.2478, ctc_loss=0.1327, cr_loss=0.3779, attn_decoder_loss=0.2522, over 28894.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1273, cr_loss=0.369, attn_decoder_loss=0.2456, over 5421714.56 frames. 
], batch size: 104, lr: 4.48e-03, grad_scale: 8.0 2024-09-18 11:48:35,875 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.495e+01 8.562e+01 9.108e+01 9.510e+01 4.336e+02, threshold=1.822e+02, percent-clipped=3.0 2024-09-18 11:48:39,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=436600.0, ans=0.0 2024-09-18 11:48:54,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=436640.0, ans=0.2 2024-09-18 11:49:06,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=436680.0, ans=0.125 2024-09-18 11:49:06,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=436680.0, ans=0.025 2024-09-18 11:49:14,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=436720.0, ans=0.125 2024-09-18 11:49:26,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=436720.0, ans=0.0 2024-09-18 11:49:26,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=436720.0, ans=0.0 2024-09-18 11:49:45,354 INFO [train.py:1198] (1/2) Epoch 25, batch 600, loss[loss=0.2616, ctc_loss=0.1476, cr_loss=0.403, attn_decoder_loss=0.2653, over 29247.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1275, cr_loss=0.3702, attn_decoder_loss=0.2461, over 5509417.54 frames. ], batch size: 100, lr: 4.47e-03, grad_scale: 8.0 2024-09-18 11:49:48,630 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:49:54,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=436800.0, ans=0.1 2024-09-18 11:49:57,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=436800.0, ans=0.2 2024-09-18 11:49:59,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=436840.0, ans=0.0 2024-09-18 11:50:16,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=436880.0, ans=0.125 2024-09-18 11:50:18,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=436880.0, ans=0.1 2024-09-18 11:50:21,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=436880.0, ans=0.2 2024-09-18 11:50:36,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=436920.0, ans=0.05 2024-09-18 11:50:37,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=436920.0, ans=0.125 2024-09-18 11:50:45,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=436960.0, ans=0.1 2024-09-18 11:50:48,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=436960.0, ans=0.025 2024-09-18 11:50:56,366 INFO [scaling.py:214] (1/2) 
ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=436960.0, ans=0.1 2024-09-18 11:51:00,506 INFO [train.py:1198] (1/2) Epoch 25, batch 650, loss[loss=0.2439, ctc_loss=0.1325, cr_loss=0.3875, attn_decoder_loss=0.2477, over 29772.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1271, cr_loss=0.3696, attn_decoder_loss=0.2455, over 5586628.85 frames. ], batch size: 81, lr: 4.47e-03, grad_scale: 8.0 2024-09-18 11:51:08,135 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.416e+01 8.904e+01 9.509e+01 2.097e+02, threshold=1.781e+02, percent-clipped=1.0 2024-09-18 11:51:09,237 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=12.0 2024-09-18 11:51:09,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=437000.0, ans=0.125 2024-09-18 11:51:20,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=437040.0, ans=0.125 2024-09-18 11:51:45,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=437080.0, ans=0.1 2024-09-18 11:52:21,117 INFO [train.py:1198] (1/2) Epoch 25, batch 700, loss[loss=0.232, ctc_loss=0.1263, cr_loss=0.3659, attn_decoder_loss=0.2356, over 29532.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1277, cr_loss=0.3704, attn_decoder_loss=0.2459, over 5638237.31 frames. ], batch size: 76, lr: 4.47e-03, grad_scale: 8.0 2024-09-18 11:52:21,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=437200.0, ans=0.0 2024-09-18 11:52:27,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=437200.0, ans=0.0 2024-09-18 11:53:00,354 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.34 vs. limit=15.0 2024-09-18 11:53:21,624 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.71 vs. limit=15.0 2024-09-18 11:53:23,161 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.80 vs. limit=15.0 2024-09-18 11:53:28,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=437360.0, ans=0.125 2024-09-18 11:53:36,276 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:53:37,366 INFO [train.py:1198] (1/2) Epoch 25, batch 750, loss[loss=0.2497, ctc_loss=0.1337, cr_loss=0.3833, attn_decoder_loss=0.2541, over 29717.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1277, cr_loss=0.3703, attn_decoder_loss=0.2456, over 5676319.17 frames. 
], batch size: 82, lr: 4.47e-03, grad_scale: 8.0 2024-09-18 11:53:37,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=437400.0, ans=0.125 2024-09-18 11:53:44,706 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 8.436e+01 8.901e+01 9.527e+01 2.571e+02, threshold=1.780e+02, percent-clipped=1.0 2024-09-18 11:54:06,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=437480.0, ans=0.125 2024-09-18 11:54:09,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=437480.0, ans=0.125 2024-09-18 11:54:13,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=437480.0, ans=0.125 2024-09-18 11:54:17,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=437480.0, ans=0.125 2024-09-18 11:54:31,071 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.76 vs. limit=15.0 2024-09-18 11:54:45,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=437560.0, ans=0.125 2024-09-18 11:54:53,469 INFO [train.py:1198] (1/2) Epoch 25, batch 800, loss[loss=0.2175, ctc_loss=0.1042, cr_loss=0.3152, attn_decoder_loss=0.2231, over 29584.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1278, cr_loss=0.3705, attn_decoder_loss=0.2456, over 5707140.16 frames. ], batch size: 73, lr: 4.47e-03, grad_scale: 16.0 2024-09-18 11:54:53,902 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:55:10,531 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:55:11,935 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:55:15,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=437640.0, ans=0.2 2024-09-18 11:55:16,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=437640.0, ans=0.2 2024-09-18 11:55:21,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=437640.0, ans=0.125 2024-09-18 11:55:25,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=437680.0, ans=0.0 2024-09-18 11:55:47,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=437720.0, ans=0.125 2024-09-18 11:55:48,614 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:56:06,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=437760.0, ans=0.0 2024-09-18 11:56:13,698 INFO [train.py:1198] (1/2) Epoch 25, batch 850, loss[loss=0.256, ctc_loss=0.1321, cr_loss=0.3847, attn_decoder_loss=0.2612, over 29726.00 frames. 
], tot_loss[loss=0.241, ctc_loss=0.1273, cr_loss=0.3701, attn_decoder_loss=0.2455, over 5736012.44 frames. ], batch size: 89, lr: 4.47e-03, grad_scale: 8.0 2024-09-18 11:56:17,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=437800.0, ans=0.025 2024-09-18 11:56:22,488 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.420e+01 8.934e+01 9.567e+01 3.952e+02, threshold=1.787e+02, percent-clipped=1.0 2024-09-18 11:56:25,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=437800.0, ans=0.0 2024-09-18 11:56:39,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=437840.0, ans=0.5 2024-09-18 11:56:59,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=437920.0, ans=0.125 2024-09-18 11:57:08,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=437920.0, ans=0.0 2024-09-18 11:57:20,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=437960.0, ans=0.0 2024-09-18 11:57:23,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=437960.0, ans=0.1 2024-09-18 11:57:29,253 INFO [train.py:1198] (1/2) Epoch 25, batch 900, loss[loss=0.2141, ctc_loss=0.1036, cr_loss=0.3285, attn_decoder_loss=0.2191, over 29610.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1277, cr_loss=0.3707, attn_decoder_loss=0.2458, over 5739192.88 frames. ], batch size: 73, lr: 4.47e-03, grad_scale: 8.0 2024-09-18 11:57:41,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=438000.0, ans=0.125 2024-09-18 11:57:53,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=438040.0, ans=0.125 2024-09-18 11:58:10,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=438080.0, ans=0.125 2024-09-18 11:58:17,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=438120.0, ans=0.2 2024-09-18 11:58:21,425 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.94 vs. limit=12.0 2024-09-18 11:58:38,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=438160.0, ans=0.125 2024-09-18 11:58:40,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=438160.0, ans=0.025 2024-09-18 11:58:44,263 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.52 vs. limit=15.0 2024-09-18 11:58:44,558 INFO [train.py:1198] (1/2) Epoch 25, batch 950, loss[loss=0.2283, ctc_loss=0.1148, cr_loss=0.3464, attn_decoder_loss=0.2333, over 29500.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1278, cr_loss=0.37, attn_decoder_loss=0.246, over 5740247.79 frames. 
], batch size: 74, lr: 4.47e-03, grad_scale: 8.0 2024-09-18 11:58:53,516 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.515e+01 8.540e+01 9.168e+01 9.959e+01 1.680e+02, threshold=1.834e+02, percent-clipped=0.0 2024-09-18 11:59:13,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=438280.0, ans=0.07 2024-09-18 11:59:18,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=438280.0, ans=0.2 2024-09-18 11:59:59,507 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.63 vs. limit=15.0 2024-09-18 12:00:04,896 INFO [train.py:1198] (1/2) Epoch 25, batch 1000, loss[loss=0.2447, ctc_loss=0.1345, cr_loss=0.3837, attn_decoder_loss=0.2484, over 29499.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1289, cr_loss=0.372, attn_decoder_loss=0.2469, over 5733262.52 frames. ], batch size: 77, lr: 4.47e-03, grad_scale: 8.0 2024-09-18 12:00:08,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=438400.0, ans=0.09899494936611666 2024-09-18 12:00:18,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=438440.0, ans=0.125 2024-09-18 12:00:37,240 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:00:52,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=438520.0, ans=0.0 2024-09-18 12:00:56,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=438520.0, ans=0.125 2024-09-18 12:01:02,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=438520.0, ans=0.0 2024-09-18 12:01:05,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=438560.0, ans=0.0 2024-09-18 12:01:07,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=438560.0, ans=0.125 2024-09-18 12:01:20,734 INFO [train.py:1198] (1/2) Epoch 25, batch 1050, loss[loss=0.2534, ctc_loss=0.1381, cr_loss=0.4028, attn_decoder_loss=0.2573, over 29684.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1283, cr_loss=0.3712, attn_decoder_loss=0.2461, over 5742235.47 frames. ], batch size: 85, lr: 4.47e-03, grad_scale: 8.0 2024-09-18 12:01:24,843 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-09-18 12:01:25,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=438600.0, ans=0.125 2024-09-18 12:01:29,762 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.550e+01 9.112e+01 9.812e+01 2.455e+02, threshold=1.822e+02, percent-clipped=1.0 2024-09-18 12:01:37,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=438640.0, ans=0.035 2024-09-18 12:02:14,630 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.55 vs. 
limit=15.0 2024-09-18 12:02:15,707 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:02:21,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=438760.0, ans=0.125 2024-09-18 12:02:24,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=438760.0, ans=0.025 2024-09-18 12:02:24,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=438760.0, ans=0.125 2024-09-18 12:02:36,545 INFO [train.py:1198] (1/2) Epoch 25, batch 1100, loss[loss=0.2428, ctc_loss=0.1197, cr_loss=0.3684, attn_decoder_loss=0.2483, over 29452.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1276, cr_loss=0.3696, attn_decoder_loss=0.2455, over 5755622.99 frames. ], batch size: 78, lr: 4.46e-03, grad_scale: 8.0 2024-09-18 12:02:51,180 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.42 vs. limit=10.0 2024-09-18 12:02:53,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=438840.0, ans=0.0 2024-09-18 12:02:57,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=438840.0, ans=0.04949747468305833 2024-09-18 12:02:59,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=438840.0, ans=0.125 2024-09-18 12:03:02,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=438840.0, ans=0.125 2024-09-18 12:03:04,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=438840.0, ans=0.125 2024-09-18 12:03:15,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=438880.0, ans=0.0 2024-09-18 12:03:19,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=438880.0, ans=0.125 2024-09-18 12:03:44,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=438960.0, ans=0.125 2024-09-18 12:03:48,509 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.89 vs. limit=10.0 2024-09-18 12:03:55,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=439000.0, ans=0.125 2024-09-18 12:03:56,687 INFO [train.py:1198] (1/2) Epoch 25, batch 1150, loss[loss=0.2349, ctc_loss=0.1204, cr_loss=0.3499, attn_decoder_loss=0.2398, over 29457.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1278, cr_loss=0.3697, attn_decoder_loss=0.2458, over 5754249.69 frames. 
], batch size: 78, lr: 4.46e-03, grad_scale: 8.0 2024-09-18 12:04:04,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=439000.0, ans=0.125 2024-09-18 12:04:05,928 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.230e+01 8.574e+01 9.064e+01 9.855e+01 2.778e+02, threshold=1.813e+02, percent-clipped=2.0 2024-09-18 12:04:33,909 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:04:33,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=439080.0, ans=0.2 2024-09-18 12:05:13,552 INFO [train.py:1198] (1/2) Epoch 25, batch 1200, loss[loss=0.244, ctc_loss=0.1219, cr_loss=0.3481, attn_decoder_loss=0.2498, over 29691.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1288, cr_loss=0.3716, attn_decoder_loss=0.2466, over 5747590.72 frames. ], batch size: 85, lr: 4.46e-03, grad_scale: 16.0 2024-09-18 12:05:15,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=439200.0, ans=0.125 2024-09-18 12:05:22,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=439200.0, ans=0.0 2024-09-18 12:05:36,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=439240.0, ans=0.0 2024-09-18 12:06:16,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=439360.0, ans=0.125 2024-09-18 12:06:17,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=439360.0, ans=0.125 2024-09-18 12:06:25,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=439360.0, ans=0.125 2024-09-18 12:06:29,680 INFO [train.py:1198] (1/2) Epoch 25, batch 1250, loss[loss=0.2531, ctc_loss=0.1322, cr_loss=0.3846, attn_decoder_loss=0.258, over 29533.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.1289, cr_loss=0.3723, attn_decoder_loss=0.2471, over 5775062.60 frames. ], batch size: 92, lr: 4.46e-03, grad_scale: 8.0 2024-09-18 12:06:33,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=439400.0, ans=0.2 2024-09-18 12:06:40,446 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.589e+01 8.697e+01 9.266e+01 9.820e+01 4.128e+02, threshold=1.853e+02, percent-clipped=2.0 2024-09-18 12:06:45,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=439440.0, ans=0.125 2024-09-18 12:07:04,979 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.97 vs. limit=15.0 2024-09-18 12:07:29,900 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.59 vs. limit=15.0 2024-09-18 12:07:49,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=439600.0, ans=0.2 2024-09-18 12:07:50,379 INFO [train.py:1198] (1/2) Epoch 25, batch 1300, loss[loss=0.2485, ctc_loss=0.1319, cr_loss=0.3636, attn_decoder_loss=0.2534, over 28075.00 frames. 
2024-09-18 12:08:23,204 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.97 vs. limit=10.0 2024-09-18 12:08:29,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=439680.0, ans=0.125 2024-09-18 12:08:39,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=439720.0, ans=0.125 2024-09-18 12:08:40,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=439720.0, ans=0.125 2024-09-18 12:09:03,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=439760.0, ans=0.2 2024-09-18 12:09:06,245 INFO [train.py:1198] (1/2) Epoch 25, batch 1350, loss[loss=0.2457, ctc_loss=0.1374, cr_loss=0.3771, attn_decoder_loss=0.2493, over 29746.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1274, cr_loss=0.3695, attn_decoder_loss=0.2456, over 5796859.96 frames. ], batch size: 81, lr: 4.46e-03, grad_scale: 8.0 2024-09-18 12:09:08,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=439800.0, ans=0.2 2024-09-18 12:09:16,806 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 8.561e+01 9.293e+01 1.003e+02 2.081e+02, threshold=1.859e+02, percent-clipped=1.0 2024-09-18 12:09:28,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=439840.0, ans=0.125 2024-09-18 12:09:48,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=439880.0, ans=0.1 2024-09-18 12:10:04,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=439960.0, ans=0.2 2024-09-18 12:10:21,578 INFO [train.py:1198] (1/2) Epoch 25, batch 1400, loss[loss=0.2092, ctc_loss=0.09743, cr_loss=0.2977, attn_decoder_loss=0.215, over 29612.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1274, cr_loss=0.3695, attn_decoder_loss=0.2455, over 5807620.00 frames. ], batch size: 69, lr: 4.46e-03, grad_scale: 8.0 2024-09-18 12:10:28,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=440000.0, ans=0.025 2024-09-18 12:10:47,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=440040.0, ans=0.0 2024-09-18 12:10:53,062 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.08 vs.
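limit=12.0

The `Whitening` lines compare a per-module whiteness metric of activations against a limit; presumably the module only intervenes (e.g. via a gradient penalty) when the metric exceeds its limit, which is an assumption here. One natural metric with the logged behaviour, equal to 1.0 for perfectly "white" features and growing as the channel covariance becomes dominated by a few directions, is sketched below; it is illustrative and not claimed to be the exact scaling.py formula:

```python
# Sketch of a whiteness metric like the ones logged above: compares the
# channel covariance of a batch of features against a scaled identity.
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels)
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]  # channel covariance
    c = cov.shape[0]
    # trace(cov @ cov) * C / trace(cov)**2 == 1 iff all eigenvalues are
    # equal (white), and grows as the spectrum becomes more lopsided.
    return (cov @ cov).diagonal().sum() * c / cov.diagonal().sum() ** 2

x = torch.randn(1000, 384)       # roughly white -> metric close to 1
print(float(whitening_metric(x)))
x[:, 0] *= 20.0                  # one dominant channel -> metric grows
print(float(whitening_metric(x)))
```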
2024-09-18 12:10:56,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=440080.0, ans=0.125 2024-09-18 12:11:00,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=440080.0, ans=0.125 2024-09-18 12:11:15,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=440120.0, ans=0.0 2024-09-18 12:11:41,713 INFO [train.py:1198] (1/2) Epoch 25, batch 1450, loss[loss=0.2669, ctc_loss=0.1448, cr_loss=0.4193, attn_decoder_loss=0.2711, over 29429.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1279, cr_loss=0.3705, attn_decoder_loss=0.2462, over 5805167.73 frames. ], batch size: 94, lr: 4.46e-03, grad_scale: 8.0 2024-09-18 12:11:51,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=440200.0, ans=0.0 2024-09-18 12:11:52,219 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.790e+01 8.701e+01 9.305e+01 9.884e+01 1.753e+02, threshold=1.861e+02, percent-clipped=0.0 2024-09-18 12:11:53,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=440200.0, ans=0.0 2024-09-18 12:12:11,253 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.62 vs. limit=22.5 2024-09-18 12:12:21,903 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.92 vs. limit=12.0 2024-09-18 12:12:46,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=440360.0, ans=0.5 2024-09-18 12:12:57,332 INFO [train.py:1198] (1/2) Epoch 25, batch 1500, loss[loss=0.2509, ctc_loss=0.1402, cr_loss=0.4052, attn_decoder_loss=0.2542, over 29628.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1284, cr_loss=0.3714, attn_decoder_loss=0.2467, over 5806811.32 frames.
], batch size: 86, lr: 4.46e-03, grad_scale: 8.0 2024-09-18 12:13:17,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=440440.0, ans=0.125 2024-09-18 12:13:24,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=440440.0, ans=0.2 2024-09-18 12:13:27,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=440480.0, ans=0.2 2024-09-18 12:13:32,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=440480.0, ans=0.025 2024-09-18 12:13:33,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=440480.0, ans=0.125 2024-09-18 12:13:37,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=440480.0, ans=0.0 2024-09-18 12:13:47,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=440520.0, ans=0.2 2024-09-18 12:13:52,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=440520.0, ans=0.025 2024-09-18 12:13:53,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=440520.0, ans=0.1 2024-09-18 12:13:54,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.15 vs. limit=22.5 2024-09-18 12:13:55,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=440520.0, ans=0.0 2024-09-18 12:14:13,209 INFO [train.py:1198] (1/2) Epoch 25, batch 1550, loss[loss=0.2608, ctc_loss=0.1427, cr_loss=0.4113, attn_decoder_loss=0.2647, over 29499.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1287, cr_loss=0.3718, attn_decoder_loss=0.2467, over 5783024.13 frames. ], batch size: 90, lr: 4.46e-03, grad_scale: 8.0 2024-09-18 12:14:14,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=440600.0, ans=0.1 2024-09-18 12:14:23,709 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 8.497e+01 9.186e+01 9.794e+01 2.835e+02, threshold=1.837e+02, percent-clipped=2.0 2024-09-18 12:14:29,119 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.53 vs. limit=15.0 2024-09-18 12:14:36,069 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:14:47,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=440680.0, ans=0.2 2024-09-18 12:15:03,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=440720.0, ans=0.0 2024-09-18 12:15:25,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=440760.0, ans=0.125 2024-09-18 12:15:33,644 INFO [train.py:1198] (1/2) Epoch 25, batch 1600, loss[loss=0.2372, ctc_loss=0.1173, cr_loss=0.3529, attn_decoder_loss=0.2427, over 29661.00 frames. 
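], tot_loss[loss=0.2421, ctc_loss=0.1285, cr_loss=0.3711, attn_decoder_loss=0.2465, over 5764793.43 frames. ], batch size: 85, lr: 4.45e-03, grad_scale: 16.0

Note `grad_scale` flipping between 8.0 and 16.0 across the batch summaries (16.0 at batches 1200 and 1600 here, back to 8.0 in between). That oscillation is characteristic of dynamic loss scaling under mixed precision: the scale doubles after a fixed run of overflow-free steps and is halved again when an overflow is detected. A sketch using torch's stock `GradScaler` follows; the hyperparameters are illustrative, not the run's actual settings:

```python
# Sketch of dynamic fp16 loss scaling consistent with the oscillating
# grad_scale values logged above.
import torch

model = torch.nn.Linear(80, 500).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.045)
scaler = torch.cuda.amp.GradScaler(init_scale=8.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=400)

for step in range(1000):
    x = torch.randn(16, 80, device="cuda")
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(x).pow(2).mean()
    opt.zero_grad()
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(opt)               # skips the update if grads overflowed
    scaler.update()                # grow after growth_interval, back off on inf
```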
2024-09-18 12:15:45,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=440800.0, ans=0.0 2024-09-18 12:15:56,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=440840.0, ans=0.2 2024-09-18 12:16:02,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=440880.0, ans=0.07 2024-09-18 12:16:17,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=440920.0, ans=0.1 2024-09-18 12:16:20,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=440920.0, ans=0.0 2024-09-18 12:16:28,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=440920.0, ans=0.0 2024-09-18 12:16:28,574 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:16:31,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=440920.0, ans=0.0 2024-09-18 12:16:49,306 INFO [train.py:1198] (1/2) Epoch 25, batch 1650, loss[loss=0.2556, ctc_loss=0.134, cr_loss=0.402, attn_decoder_loss=0.2602, over 29699.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.1285, cr_loss=0.3706, attn_decoder_loss=0.2465, over 5759199.50 frames. ], batch size: 89, lr: 4.45e-03, grad_scale: 8.0 2024-09-18 12:17:01,297 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 8.636e+01 9.380e+01 1.005e+02 4.034e+02, threshold=1.876e+02, percent-clipped=3.0 2024-09-18 12:17:22,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=441080.0, ans=0.0 2024-09-18 12:17:30,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=441080.0, ans=0.0 2024-09-18 12:17:33,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=441120.0, ans=0.125 2024-09-18 12:17:49,321 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.89 vs. limit=15.0 2024-09-18 12:18:04,966 INFO [train.py:1198] (1/2) Epoch 25, batch 1700, loss[loss=0.2224, ctc_loss=0.1154, cr_loss=0.3456, attn_decoder_loss=0.2266, over 29582.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1282, cr_loss=0.3701, attn_decoder_loss=0.2464, over 5781448.75 frames. ], batch size: 69, lr: 4.45e-03, grad_scale: 8.0 2024-09-18 12:18:25,615 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.86 vs.
limit=15.0 2024-09-18 12:18:29,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=441240.0, ans=0.09899494936611666 2024-09-18 12:18:42,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=441280.0, ans=0.125 2024-09-18 12:18:45,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=441280.0, ans=0.05 2024-09-18 12:18:51,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=441320.0, ans=0.0 2024-09-18 12:19:14,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=441360.0, ans=0.125 2024-09-18 12:19:25,317 INFO [train.py:1198] (1/2) Epoch 25, batch 1750, loss[loss=0.2182, ctc_loss=0.1095, cr_loss=0.3307, attn_decoder_loss=0.2229, over 29301.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1276, cr_loss=0.3691, attn_decoder_loss=0.2458, over 5790219.41 frames. ], batch size: 67, lr: 4.45e-03, grad_scale: 8.0 2024-09-18 12:19:37,503 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.136e+01 8.435e+01 8.870e+01 9.715e+01 1.342e+02, threshold=1.774e+02, percent-clipped=0.0 2024-09-18 12:19:41,349 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.96 vs. limit=6.0 2024-09-18 12:19:42,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=441440.0, ans=0.125 2024-09-18 12:20:38,770 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=15.0 2024-09-18 12:20:41,486 INFO [train.py:1198] (1/2) Epoch 25, batch 1800, loss[loss=0.2548, ctc_loss=0.1408, cr_loss=0.3832, attn_decoder_loss=0.2589, over 29693.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1277, cr_loss=0.3694, attn_decoder_loss=0.2459, over 5792026.91 frames. ], batch size: 83, lr: 4.45e-03, grad_scale: 8.0 2024-09-18 12:20:53,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=441600.0, ans=0.1 2024-09-18 12:21:07,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=441640.0, ans=0.0 2024-09-18 12:21:29,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=441720.0, ans=0.04949747468305833 2024-09-18 12:21:32,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=441720.0, ans=0.2 2024-09-18 12:21:57,722 INFO [train.py:1198] (1/2) Epoch 25, batch 1850, loss[loss=0.2509, ctc_loss=0.1243, cr_loss=0.3524, attn_decoder_loss=0.2572, over 29608.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1275, cr_loss=0.3695, attn_decoder_loss=0.2458, over 5796513.90 frames. 
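], batch size: 86, lr: 4.45e-03, grad_scale: 8.0

The recurring `WithLoss` entries (e.g. immediately below) report the accumulated value of an auxiliary penalty attached to a module's self-attention weights; `loss-sum=0.000e+00` means the penalty contributed nothing over the logging interval. The sketch below shows one way such a tracked auxiliary loss could be wired up; the penalty chosen (an L2 pull toward uniform attention) and the class name are assumptions for illustration, not taken from scaling.py:

```python
# Sketch: accumulate an auxiliary penalty on attention weights and report its
# running sum periodically, in the spirit of the WithLoss log lines.
import torch

class AttentionWithLoss(torch.nn.Module):
    def __init__(self, name: str, scale: float = 0.0):
        super().__init__()
        self.name, self.scale = name, scale
        self.loss_sum = 0.0

    def forward(self, attn_weights: torch.Tensor):
        aux = torch.zeros((), device=attn_weights.device)
        if self.training and self.scale > 0.0:
            uniform = 1.0 / attn_weights.shape[-1]
            aux = self.scale * (attn_weights - uniform).pow(2).mean()
            self.loss_sum += aux.item()  # bookkeeping for the log line
        return attn_weights, aux         # aux is added to the training loss

    def log(self):
        print(f"WithLoss: name={self.name}, loss-sum={self.loss_sum:.3e}")
        self.loss_sum = 0.0
```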
2024-09-18 12:22:05,656 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:22:09,697 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.990e+01 8.488e+01 8.939e+01 9.551e+01 1.184e+02, threshold=1.788e+02, percent-clipped=0.0 2024-09-18 12:22:15,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=441840.0, ans=0.125 2024-09-18 12:22:19,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=441840.0, ans=0.1 2024-09-18 12:22:19,160 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:23:11,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=441960.0, ans=0.125 2024-09-18 12:23:15,214 INFO [train.py:1198] (1/2) Epoch 25, batch 1900, loss[loss=0.2625, ctc_loss=0.1466, cr_loss=0.4076, attn_decoder_loss=0.2663, over 29710.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1282, cr_loss=0.3714, attn_decoder_loss=0.2466, over 5804424.74 frames. ], batch size: 89, lr: 4.45e-03, grad_scale: 8.0 2024-09-18 12:23:28,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=442000.0, ans=0.1 2024-09-18 12:23:36,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=442040.0, ans=0.125 2024-09-18 12:23:43,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=442040.0, ans=0.0 2024-09-18 12:23:48,776 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.13 vs. limit=15.0 2024-09-18 12:23:59,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=442080.0, ans=0.1 2024-09-18 12:24:24,114 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=15.0 2024-09-18 12:24:24,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=442160.0, ans=0.2 2024-09-18 12:24:33,475 INFO [train.py:1198] (1/2) Epoch 25, batch 1950, loss[loss=0.249, ctc_loss=0.1391, cr_loss=0.4224, attn_decoder_loss=0.2518, over 29436.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1291, cr_loss=0.3737, attn_decoder_loss=0.2478, over 5819268.93 frames.
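], batch size: 78, lr: 4.45e-03, grad_scale: 8.0

Each batch summary reports a combined `loss` alongside its `ctc_loss`, `cr_loss` and `attn_decoder_loss` components. Arithmetic on the logged numbers is consistent with a fixed weighted sum of roughly 0.1·ctc + 0.02·cr + 0.9·attn_decoder: for the batch 1950 totals just above, that combination reproduces loss=0.2434. A sketch under that assumption follows; the weights are inferred from the log, not read from the training code:

```python
# Sketch: combine the three logged loss components with fixed weights that
# reproduce the logged totals. Weight values are inferred, not authoritative.
ctc_scale, cr_scale, attn_scale = 0.1, 0.02, 0.9

def combined_loss(ctc_loss, cr_loss, attn_decoder_loss):
    return (ctc_scale * ctc_loss
            + cr_scale * cr_loss
            + attn_scale * attn_decoder_loss)

# tot_loss components for Epoch 25, batch 1950 from the log above:
print(round(combined_loss(0.1291, 0.3737, 0.2478), 4))  # -> 0.2434
```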
2024-09-18 12:24:45,598 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.674e+01 8.609e+01 9.254e+01 9.710e+01 4.424e+02, threshold=1.851e+02, percent-clipped=1.0 2024-09-18 12:24:45,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=442200.0, ans=0.0 2024-09-18 12:24:47,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=442240.0, ans=0.05 2024-09-18 12:24:47,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=442240.0, ans=0.125 2024-09-18 12:24:58,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=442240.0, ans=0.0 2024-09-18 12:24:59,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=442240.0, ans=0.125 2024-09-18 12:25:39,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=442360.0, ans=0.04949747468305833 2024-09-18 12:25:49,608 INFO [train.py:1198] (1/2) Epoch 25, batch 2000, loss[loss=0.2204, ctc_loss=0.1153, cr_loss=0.3403, attn_decoder_loss=0.2246, over 29359.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1295, cr_loss=0.3741, attn_decoder_loss=0.2483, over 5798153.46 frames. ], batch size: 67, lr: 4.45e-03, grad_scale: 16.0 2024-09-18 12:25:49,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=442400.0, ans=0.05 2024-09-18 12:25:58,303 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.51 vs. limit=12.0 2024-09-18 12:26:00,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=442400.0, ans=0.1 2024-09-18 12:26:12,288 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0 2024-09-18 12:26:34,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=442480.0, ans=0.0 2024-09-18 12:26:37,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=442520.0, ans=0.125 2024-09-18 12:27:03,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=442560.0, ans=0.125 2024-09-18 12:27:07,724 INFO [train.py:1198] (1/2) Epoch 25, batch 2050, loss[loss=0.2199, ctc_loss=0.1175, cr_loss=0.3413, attn_decoder_loss=0.2237, over 29439.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.1289, cr_loss=0.3726, attn_decoder_loss=0.2471, over 5788729.66 frames. ], batch size: 70, lr: 4.45e-03, grad_scale: 8.0 2024-09-18 12:27:17,237 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.77 vs.
limit=15.0 2024-09-18 12:27:23,554 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.313e+01 8.436e+01 8.905e+01 9.396e+01 1.982e+02, threshold=1.781e+02, percent-clipped=1.0 2024-09-18 12:27:23,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=442640.0, ans=0.0 2024-09-18 12:27:25,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=442640.0, ans=0.125 2024-09-18 12:27:34,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=442640.0, ans=0.0 2024-09-18 12:27:57,631 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.19 vs. limit=15.0 2024-09-18 12:28:24,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=442800.0, ans=0.2 2024-09-18 12:28:25,821 INFO [train.py:1198] (1/2) Epoch 25, batch 2100, loss[loss=0.2338, ctc_loss=0.1136, cr_loss=0.3459, attn_decoder_loss=0.2395, over 29750.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1281, cr_loss=0.3709, attn_decoder_loss=0.2464, over 5800945.29 frames. ], batch size: 81, lr: 4.44e-03, grad_scale: 8.0 2024-09-18 12:28:39,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=442840.0, ans=0.1 2024-09-18 12:28:56,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=442880.0, ans=0.025 2024-09-18 12:29:05,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=442880.0, ans=0.125 2024-09-18 12:29:12,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=442920.0, ans=0.0 2024-09-18 12:29:17,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=442920.0, ans=0.0 2024-09-18 12:29:27,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=442960.0, ans=0.0 2024-09-18 12:29:41,142 INFO [train.py:1198] (1/2) Epoch 25, batch 2150, loss[loss=0.2337, ctc_loss=0.1195, cr_loss=0.3757, attn_decoder_loss=0.2381, over 29443.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1274, cr_loss=0.3697, attn_decoder_loss=0.2459, over 5815827.38 frames. ], batch size: 78, lr: 4.44e-03, grad_scale: 8.0 2024-09-18 12:29:42,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=443000.0, ans=0.04949747468305833 2024-09-18 12:29:52,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=443000.0, ans=0.0 2024-09-18 12:29:54,895 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 8.437e+01 8.935e+01 9.622e+01 1.303e+02, threshold=1.787e+02, percent-clipped=0.0 2024-09-18 12:29:55,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=443040.0, ans=0.125 2024-09-18 12:30:10,862 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.65 vs. 
limit=15.0 2024-09-18 12:30:21,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=443080.0, ans=0.2 2024-09-18 12:30:22,006 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=6.86 vs. limit=15.0 2024-09-18 12:30:22,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=443080.0, ans=0.125 2024-09-18 12:30:30,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=443120.0, ans=0.1 2024-09-18 12:30:37,296 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.39 vs. limit=22.5 2024-09-18 12:30:59,553 INFO [train.py:1198] (1/2) Epoch 25, batch 2200, loss[loss=0.2385, ctc_loss=0.1272, cr_loss=0.3605, attn_decoder_loss=0.2428, over 29624.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1274, cr_loss=0.3699, attn_decoder_loss=0.2457, over 5813657.21 frames. ], batch size: 86, lr: 4.44e-03, grad_scale: 8.0 2024-09-18 12:31:10,971 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.27 vs. limit=15.0 2024-09-18 12:31:11,148 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.09 vs. limit=15.0 2024-09-18 12:31:15,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=443240.0, ans=0.05 2024-09-18 12:31:20,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=443240.0, ans=0.0 2024-09-18 12:32:05,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=443360.0, ans=0.0 2024-09-18 12:32:07,252 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:32:11,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=443360.0, ans=0.2 2024-09-18 12:32:17,477 INFO [train.py:1198] (1/2) Epoch 25, batch 2250, loss[loss=0.2493, ctc_loss=0.1282, cr_loss=0.3874, attn_decoder_loss=0.2541, over 29715.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1272, cr_loss=0.3695, attn_decoder_loss=0.2456, over 5812335.81 frames. ], batch size: 82, lr: 4.44e-03, grad_scale: 8.0 2024-09-18 12:32:25,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=443400.0, ans=0.0 2024-09-18 12:32:31,058 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.266e+01 8.316e+01 8.883e+01 9.424e+01 4.658e+02, threshold=1.777e+02, percent-clipped=2.0 2024-09-18 12:32:46,901 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.67 vs. 
limit=15.0 2024-09-18 12:33:01,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=443520.0, ans=0.025 2024-09-18 12:33:22,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=443560.0, ans=0.125 2024-09-18 12:33:24,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=443560.0, ans=0.125 2024-09-18 12:33:33,023 INFO [train.py:1198] (1/2) Epoch 25, batch 2300, loss[loss=0.2235, ctc_loss=0.1155, cr_loss=0.3487, attn_decoder_loss=0.2278, over 29307.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1266, cr_loss=0.3681, attn_decoder_loss=0.2447, over 5800365.13 frames. ], batch size: 71, lr: 4.44e-03, grad_scale: 8.0 2024-09-18 12:33:38,808 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.20 vs. limit=12.0 2024-09-18 12:33:40,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=443600.0, ans=0.125 2024-09-18 12:33:48,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=443640.0, ans=0.125 2024-09-18 12:34:08,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=443680.0, ans=0.125 2024-09-18 12:34:13,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=443680.0, ans=0.07 2024-09-18 12:34:50,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=443800.0, ans=0.125 2024-09-18 12:34:51,416 INFO [train.py:1198] (1/2) Epoch 25, batch 2350, loss[loss=0.2533, ctc_loss=0.131, cr_loss=0.3636, attn_decoder_loss=0.2588, over 29708.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1271, cr_loss=0.3693, attn_decoder_loss=0.245, over 5804680.41 frames. ], batch size: 83, lr: 4.44e-03, grad_scale: 8.0 2024-09-18 12:35:04,930 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.160e+01 8.571e+01 9.088e+01 9.554e+01 1.522e+02, threshold=1.818e+02, percent-clipped=0.0 2024-09-18 12:35:05,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=443840.0, ans=0.125 2024-09-18 12:35:09,841 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:35:38,246 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:35:44,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=443920.0, ans=0.125 2024-09-18 12:35:51,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=443920.0, ans=0.0 2024-09-18 12:36:10,420 INFO [train.py:1198] (1/2) Epoch 25, batch 2400, loss[loss=0.2308, ctc_loss=0.1185, cr_loss=0.3588, attn_decoder_loss=0.2353, over 29552.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1278, cr_loss=0.3705, attn_decoder_loss=0.2456, over 5808458.00 frames. 
], batch size: 76, lr: 4.44e-03, grad_scale: 16.0 2024-09-18 12:36:15,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=444000.0, ans=0.2 2024-09-18 12:36:15,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=444000.0, ans=0.125 2024-09-18 12:36:21,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=444000.0, ans=0.125 2024-09-18 12:36:47,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=444080.0, ans=0.0 2024-09-18 12:37:06,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=444120.0, ans=0.125 2024-09-18 12:37:09,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=444160.0, ans=0.2 2024-09-18 12:37:26,063 INFO [train.py:1198] (1/2) Epoch 25, batch 2450, loss[loss=0.2493, ctc_loss=0.1383, cr_loss=0.3895, attn_decoder_loss=0.2529, over 29733.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1283, cr_loss=0.3712, attn_decoder_loss=0.2464, over 5783697.38 frames. ], batch size: 82, lr: 4.44e-03, grad_scale: 8.0 2024-09-18 12:37:38,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=444200.0, ans=0.1 2024-09-18 12:37:40,935 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.902e+01 9.594e+01 1.053e+02 2.320e+02, threshold=1.919e+02, percent-clipped=3.0 2024-09-18 12:37:51,030 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.12 vs. limit=22.5 2024-09-18 12:38:10,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=444280.0, ans=0.125 2024-09-18 12:38:41,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=444360.0, ans=0.2 2024-09-18 12:38:43,740 INFO [train.py:1198] (1/2) Epoch 25, batch 2500, loss[loss=0.2481, ctc_loss=0.1295, cr_loss=0.3685, attn_decoder_loss=0.2531, over 29626.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1282, cr_loss=0.3714, attn_decoder_loss=0.2463, over 5794113.86 frames. ], batch size: 86, lr: 4.44e-03, grad_scale: 8.0 2024-09-18 12:39:00,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=444440.0, ans=0.025 2024-09-18 12:39:30,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=444520.0, ans=0.1 2024-09-18 12:39:31,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=444520.0, ans=0.2 2024-09-18 12:39:35,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=444520.0, ans=0.125 2024-09-18 12:39:39,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=444520.0, ans=0.125 2024-09-18 12:39:41,857 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.27 vs. 
limit=22.5 2024-09-18 12:39:44,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=444520.0, ans=0.125 2024-09-18 12:40:02,043 INFO [train.py:1198] (1/2) Epoch 25, batch 2550, loss[loss=0.2259, ctc_loss=0.1211, cr_loss=0.3713, attn_decoder_loss=0.2293, over 29335.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1284, cr_loss=0.372, attn_decoder_loss=0.2466, over 5796708.31 frames. ], batch size: 67, lr: 4.44e-03, grad_scale: 8.0 2024-09-18 12:40:03,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=444600.0, ans=0.05 2024-09-18 12:40:17,176 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.300e+01 8.248e+01 8.745e+01 9.244e+01 1.627e+02, threshold=1.749e+02, percent-clipped=0.0 2024-09-18 12:40:34,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=444680.0, ans=0.125 2024-09-18 12:40:40,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=444680.0, ans=0.5 2024-09-18 12:40:55,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=444720.0, ans=0.1 2024-09-18 12:41:18,555 INFO [train.py:1198] (1/2) Epoch 25, batch 2600, loss[loss=0.2336, ctc_loss=0.1204, cr_loss=0.3532, attn_decoder_loss=0.2383, over 29449.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1284, cr_loss=0.3719, attn_decoder_loss=0.2468, over 5792551.67 frames. ], batch size: 78, lr: 4.43e-03, grad_scale: 8.0 2024-09-18 12:41:28,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=444800.0, ans=0.1 2024-09-18 12:41:44,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=444840.0, ans=0.0 2024-09-18 12:41:49,292 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.26 vs. limit=15.0 2024-09-18 12:42:24,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=444960.0, ans=0.1 2024-09-18 12:42:36,073 INFO [train.py:1198] (1/2) Epoch 25, batch 2650, loss[loss=0.2542, ctc_loss=0.1376, cr_loss=0.373, attn_decoder_loss=0.2589, over 29227.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.1286, cr_loss=0.3724, attn_decoder_loss=0.2471, over 5799304.18 frames. 
], batch size: 100, lr: 4.43e-03, grad_scale: 8.0 2024-09-18 12:42:51,109 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.400e+01 8.486e+01 8.913e+01 9.474e+01 1.768e+02, threshold=1.783e+02, percent-clipped=1.0 2024-09-18 12:43:02,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=445040.0, ans=0.0 2024-09-18 12:43:20,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=445080.0, ans=0.125 2024-09-18 12:43:30,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=445120.0, ans=0.2 2024-09-18 12:43:49,137 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:43:53,473 INFO [train.py:1198] (1/2) Epoch 25, batch 2700, loss[loss=0.245, ctc_loss=0.1315, cr_loss=0.3797, attn_decoder_loss=0.2491, over 29519.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1293, cr_loss=0.3732, attn_decoder_loss=0.2475, over 5793995.42 frames. ], batch size: 87, lr: 4.43e-03, grad_scale: 8.0 2024-09-18 12:44:01,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=445200.0, ans=0.0 2024-09-18 12:44:14,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=445240.0, ans=0.0 2024-09-18 12:44:16,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=445240.0, ans=0.0 2024-09-18 12:44:28,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=445280.0, ans=0.025 2024-09-18 12:44:34,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=445280.0, ans=0.025 2024-09-18 12:44:39,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=445320.0, ans=0.125 2024-09-18 12:44:51,911 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.64 vs. limit=15.0 2024-09-18 12:45:09,582 INFO [train.py:1198] (1/2) Epoch 25, batch 2750, loss[loss=0.2398, ctc_loss=0.1343, cr_loss=0.3793, attn_decoder_loss=0.2431, over 29519.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1285, cr_loss=0.3715, attn_decoder_loss=0.2462, over 5794106.93 frames. 
], batch size: 75, lr: 4.43e-03, grad_scale: 8.0 2024-09-18 12:45:16,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=445400.0, ans=0.0 2024-09-18 12:45:24,757 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.441e+01 8.540e+01 9.041e+01 9.626e+01 3.086e+02, threshold=1.808e+02, percent-clipped=2.0 2024-09-18 12:45:29,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=445440.0, ans=0.125 2024-09-18 12:45:48,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=445480.0, ans=0.025 2024-09-18 12:45:57,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=445520.0, ans=0.1 2024-09-18 12:46:28,560 INFO [train.py:1198] (1/2) Epoch 25, batch 2800, loss[loss=0.2712, ctc_loss=0.1676, cr_loss=0.3983, attn_decoder_loss=0.2738, over 20165.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1288, cr_loss=0.3723, attn_decoder_loss=0.2466, over 5774040.58 frames. ], batch size: 210, lr: 4.43e-03, grad_scale: 16.0 2024-09-18 12:46:47,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=445640.0, ans=0.125 2024-09-18 12:47:43,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=445760.0, ans=0.025 2024-09-18 12:47:46,146 INFO [train.py:1198] (1/2) Epoch 25, batch 2850, loss[loss=0.232, ctc_loss=0.1162, cr_loss=0.3462, attn_decoder_loss=0.2372, over 29521.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1292, cr_loss=0.3732, attn_decoder_loss=0.2468, over 5759592.33 frames. ], batch size: 77, lr: 4.43e-03, grad_scale: 8.0 2024-09-18 12:47:58,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=445800.0, ans=0.125 2024-09-18 12:48:02,746 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.484e+01 8.566e+01 9.294e+01 9.797e+01 4.897e+02, threshold=1.859e+02, percent-clipped=4.0 2024-09-18 12:49:01,842 INFO [train.py:1198] (1/2) Epoch 25, batch 2900, loss[loss=0.2305, ctc_loss=0.1163, cr_loss=0.3486, attn_decoder_loss=0.2354, over 29421.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1297, cr_loss=0.3748, attn_decoder_loss=0.2477, over 5784697.43 frames. ], batch size: 79, lr: 4.43e-03, grad_scale: 8.0 2024-09-18 12:49:02,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=446000.0, ans=0.0 2024-09-18 12:49:09,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=446000.0, ans=0.0 2024-09-18 12:49:11,552 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.47 vs. 
limit=22.5 2024-09-18 12:49:30,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=446040.0, ans=0.2 2024-09-18 12:49:36,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=446080.0, ans=0.125 2024-09-18 12:49:49,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=446120.0, ans=0.0 2024-09-18 12:49:51,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=446120.0, ans=0.07 2024-09-18 12:50:01,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=446120.0, ans=0.125 2024-09-18 12:50:09,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=446160.0, ans=0.1 2024-09-18 12:50:19,428 INFO [train.py:1198] (1/2) Epoch 25, batch 2950, loss[loss=0.2357, ctc_loss=0.1218, cr_loss=0.3566, attn_decoder_loss=0.2405, over 29497.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1285, cr_loss=0.3724, attn_decoder_loss=0.2466, over 5780630.42 frames. ], batch size: 75, lr: 4.43e-03, grad_scale: 8.0 2024-09-18 12:50:25,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=446200.0, ans=0.2 2024-09-18 12:50:28,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=446200.0, ans=0.125 2024-09-18 12:50:28,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=446200.0, ans=0.125 2024-09-18 12:50:36,056 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.376e+01 8.898e+01 9.637e+01 1.288e+02, threshold=1.780e+02, percent-clipped=0.0 2024-09-18 12:50:40,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=446240.0, ans=0.125 2024-09-18 12:50:45,923 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.66 vs. limit=10.0 2024-09-18 12:50:51,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=446280.0, ans=0.125 2024-09-18 12:50:57,088 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.43 vs. limit=15.0 2024-09-18 12:51:01,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=446280.0, ans=0.0 2024-09-18 12:51:10,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=446320.0, ans=0.1 2024-09-18 12:51:13,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=446320.0, ans=0.2 2024-09-18 12:51:38,071 INFO [train.py:1198] (1/2) Epoch 25, batch 3000, loss[loss=0.2362, ctc_loss=0.1248, cr_loss=0.3586, attn_decoder_loss=0.2406, over 29761.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1281, cr_loss=0.3716, attn_decoder_loss=0.2461, over 5780669.11 frames. 
], batch size: 81, lr: 4.43e-03, grad_scale: 8.0 2024-09-18 12:51:38,072 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 12:51:56,649 INFO [train.py:1230] (1/2) Epoch 25, validation: loss=0.2113, ctc_loss=0.03809, cr_loss=5.582e-15, attn_decoder_loss=0.2305, over 944034.00 frames. 2024-09-18 12:51:56,649 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-18 12:52:04,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=446400.0, ans=0.125 2024-09-18 12:52:05,043 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.47 vs. limit=12.0 2024-09-18 12:52:13,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=446440.0, ans=0.125 2024-09-18 12:52:15,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=446440.0, ans=0.025 2024-09-18 12:52:19,114 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.97 vs. limit=15.0 2024-09-18 12:52:30,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=446480.0, ans=0.2 2024-09-18 12:52:35,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=446480.0, ans=0.1 2024-09-18 12:52:41,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=446520.0, ans=0.0 2024-09-18 12:53:11,417 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:53:12,534 INFO [train.py:1198] (1/2) Epoch 25, batch 3050, loss[loss=0.24, ctc_loss=0.1317, cr_loss=0.391, attn_decoder_loss=0.2433, over 29545.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1284, cr_loss=0.3717, attn_decoder_loss=0.2469, over 5775150.35 frames. ], batch size: 76, lr: 4.43e-03, grad_scale: 8.0 2024-09-18 12:53:26,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=446600.0, ans=0.0 2024-09-18 12:53:31,784 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 8.617e+01 9.221e+01 9.973e+01 3.035e+02, threshold=1.844e+02, percent-clipped=2.0 2024-09-18 12:53:33,595 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:53:48,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=446680.0, ans=0.1 2024-09-18 12:54:01,310 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.66 vs. limit=15.0 2024-09-18 12:54:11,723 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.65 vs. 
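limit=6.0

At batch 3000 above, the loop pauses to compute a validation loss and then reports the peak GPU memory ("Maximum memory allocated so far is 52672MB"). Note that `cr_loss=5.582e-15` on the validation pass, i.e. effectively zero, which is consistent with the consistency-regularization term only being active on augmented training batches (an inference from the log, not a confirmed detail). A sketch of such a periodic validation-and-memory report, with `compute_loss` as a hypothetical stand-in for the real loss function:

```python
# Sketch: periodic validation pass plus the peak-memory report seen above.
import torch

def validate(model, valid_loader, compute_loss, device="cuda"):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = compute_loss(model, batch)
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
    model.train()
    mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation: loss={tot_loss / tot_frames:.4f}, "
          f"Maximum memory allocated so far is {mem_mb}MB")
```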
2024-09-18 12:54:12,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=446720.0, ans=0.1 2024-09-18 12:54:12,984 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.40 vs. limit=15.0 2024-09-18 12:54:15,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=446760.0, ans=0.0 2024-09-18 12:54:17,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=446760.0, ans=0.125 2024-09-18 12:54:30,186 INFO [train.py:1198] (1/2) Epoch 25, batch 3100, loss[loss=0.251, ctc_loss=0.1349, cr_loss=0.3932, attn_decoder_loss=0.2551, over 29239.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1282, cr_loss=0.3709, attn_decoder_loss=0.2467, over 5775844.60 frames. ], batch size: 100, lr: 4.42e-03, grad_scale: 8.0 2024-09-18 12:54:32,812 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=15.0 2024-09-18 12:55:04,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=446880.0, ans=0.1 2024-09-18 12:55:04,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=446880.0, ans=0.1 2024-09-18 12:55:24,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=446920.0, ans=0.125 2024-09-18 12:55:28,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=446920.0, ans=0.1 2024-09-18 12:55:29,606 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.71 vs. limit=22.5 2024-09-18 12:55:31,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=446960.0, ans=0.0 2024-09-18 12:55:32,202 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.38 vs. limit=15.0 2024-09-18 12:55:36,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=446960.0, ans=0.125 2024-09-18 12:55:48,326 INFO [train.py:1198] (1/2) Epoch 25, batch 3150, loss[loss=0.2537, ctc_loss=0.1275, cr_loss=0.3817, attn_decoder_loss=0.2593, over 28859.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.128, cr_loss=0.3706, attn_decoder_loss=0.2466, over 5781162.40 frames.
], batch size: 104, lr: 4.42e-03, grad_scale: 8.0 2024-09-18 12:56:05,069 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.834e+01 8.618e+01 9.043e+01 9.824e+01 1.542e+02, threshold=1.809e+02, percent-clipped=0.0 2024-09-18 12:56:05,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=447040.0, ans=0.025 2024-09-18 12:56:11,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=447040.0, ans=0.1 2024-09-18 12:56:21,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=447080.0, ans=0.025 2024-09-18 12:56:53,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=447160.0, ans=0.04949747468305833 2024-09-18 12:56:56,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=447160.0, ans=0.0 2024-09-18 12:57:01,524 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0 2024-09-18 12:57:02,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=447200.0, ans=0.125 2024-09-18 12:57:04,420 INFO [train.py:1198] (1/2) Epoch 25, batch 3200, loss[loss=0.2409, ctc_loss=0.1257, cr_loss=0.3542, attn_decoder_loss=0.2458, over 29428.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1275, cr_loss=0.3696, attn_decoder_loss=0.2462, over 5791470.18 frames. ], batch size: 79, lr: 4.42e-03, grad_scale: 16.0 2024-09-18 12:57:14,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=447200.0, ans=0.125 2024-09-18 12:57:34,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=447240.0, ans=0.0 2024-09-18 12:57:44,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=447280.0, ans=0.125 2024-09-18 12:58:08,344 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.52 vs. limit=12.0 2024-09-18 12:58:22,733 INFO [train.py:1198] (1/2) Epoch 25, batch 3250, loss[loss=0.2526, ctc_loss=0.138, cr_loss=0.391, attn_decoder_loss=0.2566, over 29707.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1277, cr_loss=0.3704, attn_decoder_loss=0.2464, over 5799256.55 frames. 
], batch size: 84, lr: 4.42e-03, grad_scale: 8.0 2024-09-18 12:58:27,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=447400.0, ans=0.2 2024-09-18 12:58:40,939 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.602e+01 9.212e+01 9.778e+01 1.600e+02, threshold=1.842e+02, percent-clipped=0.0 2024-09-18 12:58:42,805 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:59:13,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=447520.0, ans=0.1 2024-09-18 12:59:15,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=447520.0, ans=0.0 2024-09-18 12:59:27,804 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0 2024-09-18 12:59:32,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer_na.min_abs, batch_count=447560.0, ans=0.02 2024-09-18 12:59:38,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=447560.0, ans=0.0 2024-09-18 12:59:40,960 INFO [train.py:1198] (1/2) Epoch 25, batch 3300, loss[loss=0.2572, ctc_loss=0.1298, cr_loss=0.3601, attn_decoder_loss=0.2633, over 28252.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.127, cr_loss=0.3689, attn_decoder_loss=0.2454, over 5797135.82 frames. ], batch size: 111, lr: 4.42e-03, grad_scale: 8.0 2024-09-18 12:59:41,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=447600.0, ans=0.0 2024-09-18 12:59:49,580 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.35 vs. limit=15.0 2024-09-18 12:59:50,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=447600.0, ans=0.025 2024-09-18 12:59:53,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=447600.0, ans=0.0 2024-09-18 12:59:57,165 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.53 vs. limit=22.5 2024-09-18 13:00:07,253 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:00:10,597 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.41 vs. limit=12.0 2024-09-18 13:00:19,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=447680.0, ans=0.125 2024-09-18 13:00:19,703 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.82 vs. limit=15.0 2024-09-18 13:00:46,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=447760.0, ans=0.1 2024-09-18 13:00:58,871 INFO [train.py:1198] (1/2) Epoch 25, batch 3350, loss[loss=0.2573, ctc_loss=0.1414, cr_loss=0.3891, attn_decoder_loss=0.2616, over 28896.00 frames. 
2024-09-18 13:01:00,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=447800.0, ans=0.025
2024-09-18 13:01:17,274 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 8.840e+01 9.298e+01 1.002e+02 3.178e+02, threshold=1.860e+02, percent-clipped=4.0
2024-09-18 13:01:18,256 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.11 vs. limit=15.0
2024-09-18 13:01:22,929 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.72 vs. limit=15.0
2024-09-18 13:01:25,557 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.08 vs. limit=15.0
2024-09-18 13:01:25,761 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.37 vs. limit=15.0
2024-09-18 13:01:26,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=447840.0, ans=0.125
2024-09-18 13:01:40,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=447880.0, ans=0.09899494936611666
2024-09-18 13:01:44,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=447920.0, ans=0.07
2024-09-18 13:01:53,413 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.05 vs. limit=10.0
2024-09-18 13:02:07,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=447960.0, ans=0.125
2024-09-18 13:02:12,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=447960.0, ans=0.2
2024-09-18 13:02:22,690 INFO [train.py:1198] (1/2) Epoch 25, batch 3400, loss[loss=0.2161, ctc_loss=0.1218, cr_loss=0.3638, attn_decoder_loss=0.2185, over 29358.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1283, cr_loss=0.3719, attn_decoder_loss=0.2466, over 5766616.52 frames. ], batch size: 67, lr: 4.42e-03, grad_scale: 8.0
2024-09-18 13:02:46,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0
2024-09-18 13:02:49,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=448040.0, ans=0.125
2024-09-18 13:02:55,596 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-18 13:03:13,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=448120.0, ans=0.125
2024-09-18 13:03:21,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=448120.0, ans=0.05
2024-09-18 13:03:38,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=448160.0, ans=0.125
2024-09-18 13:03:40,800 INFO [train.py:1198] (1/2) Epoch 25, batch 3450, loss[loss=0.2508, ctc_loss=0.1291, cr_loss=0.3668, attn_decoder_loss=0.2561, over 28346.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1282, cr_loss=0.3717, attn_decoder_loss=0.2466, over 5775259.49 frames. ], batch size: 111, lr: 4.42e-03, grad_scale: 8.0
2024-09-18 13:03:49,423 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.10 vs. limit=10.0
2024-09-18 13:03:58,830 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.900e+01 8.450e+01 9.075e+01 9.587e+01 1.383e+02, threshold=1.815e+02, percent-clipped=0.0
2024-09-18 13:04:12,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=448280.0, ans=0.125
2024-09-18 13:04:17,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=448280.0, ans=0.07
2024-09-18 13:04:40,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=448360.0, ans=0.02
2024-09-18 13:04:58,610 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.61 vs. limit=22.5
2024-09-18 13:04:58,914 INFO [train.py:1198] (1/2) Epoch 25, batch 3500, loss[loss=0.2197, ctc_loss=0.1084, cr_loss=0.3317, attn_decoder_loss=0.2247, over 29309.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1283, cr_loss=0.3719, attn_decoder_loss=0.2463, over 5777337.69 frames. ], batch size: 71, lr: 4.42e-03, grad_scale: 8.0
2024-09-18 13:05:00,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=448400.0, ans=0.125
2024-09-18 13:05:08,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=448400.0, ans=0.125
2024-09-18 13:05:23,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=448440.0, ans=0.125
2024-09-18 13:05:25,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=448440.0, ans=0.07
2024-09-18 13:05:32,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=448480.0, ans=0.1
2024-09-18 13:05:40,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=448480.0, ans=0.025
2024-09-18 13:05:41,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=448480.0, ans=0.125
2024-09-18 13:05:57,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=448560.0, ans=0.2
2024-09-18 13:06:00,057 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.45 vs. limit=15.0
2024-09-18 13:06:05,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=448560.0, ans=0.2
2024-09-18 13:06:13,959 INFO [train.py:1198] (1/2) Epoch 25, batch 3550, loss[loss=0.2607, ctc_loss=0.14, cr_loss=0.4004, attn_decoder_loss=0.2652, over 29707.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1278, cr_loss=0.371, attn_decoder_loss=0.2461, over 5783590.36 frames. ], batch size: 89, lr: 4.42e-03, grad_scale: 8.0
2024-09-18 13:06:17,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=448600.0, ans=0.5
2024-09-18 13:06:27,777 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.59 vs. limit=12.0
2024-09-18 13:06:31,448 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.716e+01 8.657e+01 9.167e+01 9.744e+01 2.782e+02, threshold=1.833e+02, percent-clipped=1.0
2024-09-18 13:06:33,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=448640.0, ans=0.2
2024-09-18 13:06:46,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=448680.0, ans=0.025
2024-09-18 13:06:48,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=448680.0, ans=0.125
2024-09-18 13:06:48,768 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.62 vs. limit=15.0
2024-09-18 13:07:28,556 INFO [train.py:1198] (1/2) Epoch 25, batch 3600, loss[loss=0.2393, ctc_loss=0.1295, cr_loss=0.3715, attn_decoder_loss=0.2433, over 29499.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1279, cr_loss=0.3714, attn_decoder_loss=0.2462, over 5792523.34 frames. ], batch size: 77, lr: 4.41e-03, grad_scale: 16.0
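The ScheduledFloat lines that dominate this log track hyperparameters (dropout probabilities, skip rates, attention rates) whose values are scheduled on batch_count rather than fixed; each entry prints the current value as ans=. A sketch of a piecewise-linear schedule of this kind, which is how these entries read; the breakpoints below are made up for illustration and are not taken from the recipe:

```python
# Sketch of a piecewise-linear scheduled hyperparameter keyed on batch_count.
# Breakpoints are made up; each logged name has its own schedule in scaling.py.
class ScheduledFloatSketch:
    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs

    def value_at(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return pts[-1][1]

dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value_at(447040.0))  # well past the last breakpoint -> 0.1
```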
2024-09-18 13:07:54,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=448840.0, ans=0.05
2024-09-18 13:08:14,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=448920.0, ans=0.125
2024-09-18 13:08:17,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=448920.0, ans=0.0
2024-09-18 13:08:18,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=448920.0, ans=0.125
2024-09-18 13:08:44,785 INFO [train.py:1198] (1/2) Epoch 25, batch 3650, loss[loss=0.2543, ctc_loss=0.1353, cr_loss=0.4022, attn_decoder_loss=0.2585, over 29489.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1275, cr_loss=0.3709, attn_decoder_loss=0.2459, over 5794062.23 frames. ], batch size: 90, lr: 4.41e-03, grad_scale: 16.0
2024-09-18 13:08:46,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=449000.0, ans=0.2
2024-09-18 13:08:52,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=449000.0, ans=0.2
2024-09-18 13:08:58,730 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.68 vs. limit=15.0
2024-09-18 13:09:02,519 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.539e+01 8.955e+01 9.424e+01 1.447e+02, threshold=1.791e+02, percent-clipped=0.0
2024-09-18 13:09:02,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=449040.0, ans=0.07
2024-09-18 13:09:04,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=449040.0, ans=0.2
2024-09-18 13:09:06,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0
2024-09-18 13:09:11,092 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.07 vs. limit=15.0
2024-09-18 13:09:21,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=449080.0, ans=0.0
2024-09-18 13:09:26,424 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.62 vs. limit=8.0
2024-09-18 13:09:58,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=449200.0, ans=0.025
2024-09-18 13:09:59,591 INFO [train.py:1198] (1/2) Epoch 25, batch 3700, loss[loss=0.248, ctc_loss=0.1395, cr_loss=0.3928, attn_decoder_loss=0.2513, over 29697.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1279, cr_loss=0.3719, attn_decoder_loss=0.2464, over 5804408.29 frames. ], batch size: 84, lr: 4.41e-03, grad_scale: 8.0
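grad_scale in the loss entries flips between 8.0 and 16.0 through this stretch (16.0 at batch 3650 above, back to 8.0 by batch 3700), which is the usual dynamic loss-scaling pattern under fp16 AMP: the scale is doubled after a run of overflow-free steps and halved when an overflow is detected. A minimal sketch with PyTorch's stock GradScaler; the hyperparameter values are illustrative, and the recipe's own scaler handling may differ:

```python
import torch

# Illustrative values; GradScaler's defaults differ and the recipe's handling may too.
scaler = torch.cuda.amp.GradScaler(
    init_scale=8.0,        # starting loss scale
    growth_factor=2.0,     # double the scale ...
    growth_interval=2000,  # ... after this many consecutive overflow-free steps
    backoff_factor=0.5,    # halve it when an overflow (inf/nan grad) is found
)
# Typical AMP step: scale the loss, unscale before any grad-norm work, then step.
# scaler.scale(loss).backward()
# scaler.unscale_(optimizer)
# scaler.step(optimizer)
# scaler.update()
```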
2024-09-18 13:09:59,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=449200.0, ans=0.125
2024-09-18 13:10:06,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=449200.0, ans=0.1
2024-09-18 13:10:13,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=449240.0, ans=0.125
2024-09-18 13:10:31,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=449280.0, ans=0.2
2024-09-18 13:10:31,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=449280.0, ans=0.125
2024-09-18 13:10:32,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=449280.0, ans=0.1
2024-09-18 13:10:32,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=449280.0, ans=0.0
2024-09-18 13:10:35,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=449280.0, ans=0.1
2024-09-18 13:10:47,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=449320.0, ans=0.0
2024-09-18 13:10:59,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=449360.0, ans=0.0
2024-09-18 13:11:01,618 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.54 vs. limit=12.0
2024-09-18 13:11:01,813 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.64 vs. limit=15.0
2024-09-18 13:11:07,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=449360.0, ans=0.1
2024-09-18 13:11:13,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=449360.0, ans=0.2
2024-09-18 13:11:16,099 INFO [train.py:1198] (1/2) Epoch 25, batch 3750, loss[loss=0.2189, ctc_loss=0.1145, cr_loss=0.3291, attn_decoder_loss=0.2232, over 29373.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1277, cr_loss=0.3711, attn_decoder_loss=0.2459, over 5808015.06 frames. ], batch size: 67, lr: 4.41e-03, grad_scale: 8.0
2024-09-18 13:11:19,566 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 13:11:31,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=449440.0, ans=0.125
2024-09-18 13:11:35,579 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 8.392e+01 8.983e+01 9.467e+01 5.174e+02, threshold=1.797e+02, percent-clipped=1.0
2024-09-18 13:11:36,664 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.78 vs. limit=15.0
2024-09-18 13:11:40,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=449440.0, ans=0.125
2024-09-18 13:11:41,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=449440.0, ans=0.0
2024-09-18 13:12:29,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=449600.0, ans=0.125
2024-09-18 13:12:31,194 INFO [train.py:1198] (1/2) Epoch 25, batch 3800, loss[loss=0.2569, ctc_loss=0.1364, cr_loss=0.3645, attn_decoder_loss=0.2621, over 29650.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1274, cr_loss=0.3704, attn_decoder_loss=0.2455, over 5797762.81 frames. ], batch size: 86, lr: 4.41e-03, grad_scale: 8.0
2024-09-18 13:12:37,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=449600.0, ans=0.125
2024-09-18 13:12:44,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=449640.0, ans=0.0
2024-09-18 13:12:53,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=449640.0, ans=0.125
2024-09-18 13:13:05,155 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.43 vs. limit=15.0
2024-09-18 13:13:47,360 INFO [train.py:1198] (1/2) Epoch 25, batch 3850, loss[loss=0.2581, ctc_loss=0.1351, cr_loss=0.388, attn_decoder_loss=0.2631, over 29226.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1274, cr_loss=0.3704, attn_decoder_loss=0.2455, over 5811411.39 frames. ], batch size: 100, lr: 4.41e-03, grad_scale: 8.0
2024-09-18 13:14:06,410 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 8.692e+01 9.184e+01 9.971e+01 1.957e+02, threshold=1.837e+02, percent-clipped=1.0
2024-09-18 13:14:25,663 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0
2024-09-18 13:14:42,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=449920.0, ans=0.1
2024-09-18 13:14:48,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=449960.0, ans=0.1
2024-09-18 13:15:02,133 INFO [train.py:1198] (1/2) Epoch 25, batch 3900, loss[loss=0.2481, ctc_loss=0.1239, cr_loss=0.366, attn_decoder_loss=0.2537, over 29654.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1276, cr_loss=0.3702, attn_decoder_loss=0.2457, over 5815723.19 frames. ], batch size: 86, lr: 4.41e-03, grad_scale: 8.0
2024-09-18 13:15:28,219 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.65 vs. limit=15.0
2024-09-18 13:15:41,664 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0
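The Whitening lines compare a per-module whiteness metric of the activations (or attention keys) against a limit; nothing is being flagged as wrong here, the message simply records metric vs. limit, and a corrective term would only kick in once the metric exceeds the limit. One plausible metric, assumed for illustration rather than taken from scaling.py: the ratio of the mean squared eigenvalue of the feature covariance to the squared mean eigenvalue, which equals 1.0 for perfectly white (isotropic) features and grows as a few directions dominate:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """Assumed whiteness measure for features x of shape (num_frames, num_channels):
    1.0 when the covariance is isotropic, larger when a few directions dominate."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)  # real eigenvalues, ascending
    return (eigs ** 2).mean() / eigs.mean() ** 2

x = torch.randn(1000, 256) * torch.linspace(0.5, 2.0, 256)  # mildly non-white features
print(float(whitening_metric(x)), "vs. limit=15.0")
```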
2024-09-18 13:15:43,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=450080.0, ans=0.0
2024-09-18 13:15:47,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=450120.0, ans=0.125
2024-09-18 13:16:09,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=450160.0, ans=0.125
2024-09-18 13:16:14,070 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.80 vs. limit=10.0
2024-09-18 13:16:16,560 INFO [train.py:1198] (1/2) Epoch 25, batch 3950, loss[loss=0.2566, ctc_loss=0.1407, cr_loss=0.3944, attn_decoder_loss=0.2608, over 29542.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1273, cr_loss=0.3702, attn_decoder_loss=0.2458, over 5835357.47 frames. ], batch size: 97, lr: 4.41e-03, grad_scale: 8.0
2024-09-18 13:16:20,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=450200.0, ans=0.0
2024-09-18 13:16:37,517 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.544e+01 9.055e+01 9.627e+01 1.387e+02, threshold=1.811e+02, percent-clipped=0.0
2024-09-18 13:17:01,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=450320.0, ans=0.0
2024-09-18 13:17:06,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.96 vs. limit=22.5
2024-09-18 13:17:11,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=450320.0, ans=0.125
2024-09-18 13:17:11,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=450320.0, ans=0.2
2024-09-18 13:17:23,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=450360.0, ans=0.125
2024-09-18 13:17:26,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=450360.0, ans=0.0
2024-09-18 13:17:32,502 INFO [train.py:1198] (1/2) Epoch 25, batch 4000, loss[loss=0.2278, ctc_loss=0.1197, cr_loss=0.3491, attn_decoder_loss=0.232, over 29488.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1278, cr_loss=0.3705, attn_decoder_loss=0.2458, over 5812345.89 frames. ], batch size: 74, lr: 4.41e-03, grad_scale: 16.0
2024-09-18 13:17:59,828 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.21 vs. limit=15.0
2024-09-18 13:18:18,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=450520.0, ans=0.025
2024-09-18 13:18:33,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=450560.0, ans=0.2
2024-09-18 13:18:45,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=450560.0, ans=0.2
2024-09-18 13:18:48,056 INFO [train.py:1198] (1/2) Epoch 25, batch 4050, loss[loss=0.2645, ctc_loss=0.1609, cr_loss=0.3943, attn_decoder_loss=0.2672, over 19985.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1276, cr_loss=0.3701, attn_decoder_loss=0.2458, over 5796157.98 frames. ], batch size: 209, lr: 4.41e-03, grad_scale: 8.0
2024-09-18 13:19:08,364 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.463e+01 8.851e+01 9.697e+01 1.095e+02 3.076e+02, threshold=1.939e+02, percent-clipped=2.0
2024-09-18 13:19:22,650 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.31 vs. limit=12.0
2024-09-18 13:19:37,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=450720.0, ans=0.025
2024-09-18 13:19:49,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=450760.0, ans=0.125
2024-09-18 13:19:49,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=450760.0, ans=0.0
2024-09-18 13:19:55,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=450760.0, ans=0.0
2024-09-18 13:19:55,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=450760.0, ans=0.125
2024-09-18 13:20:00,539 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.44 vs. limit=15.0
2024-09-18 13:20:01,363 INFO [train.py:1198] (1/2) Epoch 25, batch 4100, loss[loss=0.2498, ctc_loss=0.1383, cr_loss=0.3961, attn_decoder_loss=0.2534, over 29536.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1276, cr_loss=0.3702, attn_decoder_loss=0.2459, over 5791896.57 frames. ], batch size: 90, lr: 4.40e-03, grad_scale: 8.0
2024-09-18 13:20:13,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=450800.0, ans=0.1
2024-09-18 13:20:14,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=450840.0, ans=0.125
2024-09-18 13:20:19,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=450840.0, ans=0.125
2024-09-18 13:20:22,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=450840.0, ans=0.0
2024-09-18 13:20:28,777 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.17 vs. limit=15.0
2024-09-18 13:20:33,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=450880.0, ans=0.025
2024-09-18 13:20:37,268 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0
2024-09-18 13:20:38,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=450880.0, ans=0.125
2024-09-18 13:20:51,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=450920.0, ans=0.125
2024-09-18 13:20:51,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=450920.0, ans=22.5
2024-09-18 13:21:16,046 INFO [train.py:1198] (1/2) Epoch 25, batch 4150, loss[loss=0.2302, ctc_loss=0.1189, cr_loss=0.354, attn_decoder_loss=0.2347, over 29537.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1274, cr_loss=0.3698, attn_decoder_loss=0.2457, over 5798040.81 frames. ], batch size: 77, lr: 4.40e-03, grad_scale: 8.0
2024-09-18 13:21:19,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=451000.0, ans=0.125
2024-09-18 13:21:23,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=451000.0, ans=0.0
2024-09-18 13:21:32,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=451040.0, ans=0.125
2024-09-18 13:21:36,739 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.849e+01 8.352e+01 8.963e+01 9.819e+01 3.617e+02, threshold=1.793e+02, percent-clipped=2.0
2024-09-18 13:21:41,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=451040.0, ans=0.2
2024-09-18 13:22:03,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=451120.0, ans=0.0
2024-09-18 13:22:07,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=451120.0, ans=0.0
2024-09-18 13:22:18,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=451160.0, ans=0.125
2024-09-18 13:22:30,266 INFO [train.py:1198] (1/2) Epoch 25, batch 4200, loss[loss=0.263, ctc_loss=0.1458, cr_loss=0.3952, attn_decoder_loss=0.2672, over 29470.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1278, cr_loss=0.3709, attn_decoder_loss=0.2461, over 5799972.56 frames. ], batch size: 90, lr: 4.40e-03, grad_scale: 8.0
2024-09-18 13:22:53,265 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.52 vs. limit=15.0
2024-09-18 13:23:22,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=451320.0, ans=0.125
2024-09-18 13:23:41,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=451360.0, ans=0.1
2024-09-18 13:23:42,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=451360.0, ans=0.125
2024-09-18 13:23:45,271 INFO [train.py:1198] (1/2) Epoch 25, batch 4250, loss[loss=0.231, ctc_loss=0.1168, cr_loss=0.3481, attn_decoder_loss=0.236, over 29489.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1276, cr_loss=0.3704, attn_decoder_loss=0.2464, over 5806422.65 frames. ], batch size: 74, lr: 4.40e-03, grad_scale: 8.0
2024-09-18 13:23:47,891 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.90 vs. limit=15.0
2024-09-18 13:24:05,762 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 8.410e+01 8.848e+01 9.485e+01 3.555e+02, threshold=1.770e+02, percent-clipped=1.0
2024-09-18 13:24:18,519 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.34 vs. limit=15.0
2024-09-18 13:24:39,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=451520.0, ans=0.0
2024-09-18 13:24:47,522 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.54 vs. limit=22.5
2024-09-18 13:24:48,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=451560.0, ans=0.125
2024-09-18 13:24:51,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=451560.0, ans=0.125
2024-09-18 13:24:59,840 INFO [train.py:1198] (1/2) Epoch 25, batch 4300, loss[loss=0.261, ctc_loss=0.1421, cr_loss=0.4006, attn_decoder_loss=0.2653, over 29516.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1271, cr_loss=0.3691, attn_decoder_loss=0.2465, over 5795780.32 frames. ], batch size: 87, lr: 4.40e-03, grad_scale: 8.0
2024-09-18 13:25:28,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=451680.0, ans=0.0
2024-09-18 13:25:30,715 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.28 vs. limit=15.0
2024-09-18 13:26:05,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=451760.0, ans=0.0
2024-09-18 13:26:07,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=451760.0, ans=0.0
2024-09-18 13:26:14,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=451800.0, ans=0.0
2024-09-18 13:26:15,307 INFO [train.py:1198] (1/2) Epoch 25, batch 4350, loss[loss=0.2542, ctc_loss=0.1339, cr_loss=0.3924, attn_decoder_loss=0.2588, over 29476.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1299, cr_loss=0.3754, attn_decoder_loss=0.2498, over 5797801.06 frames. ], batch size: 97, lr: 4.40e-03, grad_scale: 8.0
2024-09-18 13:26:34,070 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.38 vs. limit=15.0
2024-09-18 13:26:36,114 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.060e+01 8.772e+01 9.206e+01 9.719e+01 3.076e+02, threshold=1.841e+02, percent-clipped=2.0
2024-09-18 13:26:41,143 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.14 vs. limit=15.0
2024-09-18 13:26:42,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=451840.0, ans=0.125
2024-09-18 13:26:46,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=451880.0, ans=0.025
2024-09-18 13:26:51,662 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0
2024-09-18 13:27:05,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=451920.0, ans=0.1
2024-09-18 13:27:07,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=451920.0, ans=0.0
2024-09-18 13:27:08,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=451920.0, ans=0.2
2024-09-18 13:27:08,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=451920.0, ans=0.125
2024-09-18 13:27:16,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=451960.0, ans=0.0
2024-09-18 13:27:23,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=451960.0, ans=0.1
2024-09-18 13:27:24,181 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0
2024-09-18 13:27:29,899 INFO [train.py:1198] (1/2) Epoch 25, batch 4400, loss[loss=0.2488, ctc_loss=0.134, cr_loss=0.3881, attn_decoder_loss=0.2529, over 27346.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1314, cr_loss=0.3778, attn_decoder_loss=0.2518, over 5765789.45 frames. ], batch size: 124, lr: 4.40e-03, grad_scale: 16.0
2024-09-18 13:27:56,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=452040.0, ans=0.2
2024-09-18 13:28:00,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=452080.0, ans=0.1
2024-09-18 13:28:12,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=452120.0, ans=10.0
2024-09-18 13:28:18,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=452120.0, ans=0.2
2024-09-18 13:28:22,025 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.31 vs. limit=15.0
2024-09-18 13:28:24,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=452120.0, ans=0.2
2024-09-18 13:28:30,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=452160.0, ans=0.125
2024-09-18 13:28:44,345 INFO [train.py:1198] (1/2) Epoch 25, batch 4450, loss[loss=0.2657, ctc_loss=0.1607, cr_loss=0.4128, attn_decoder_loss=0.2682, over 20996.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1356, cr_loss=0.3832, attn_decoder_loss=0.2541, over 5572957.68 frames. ], batch size: 209, lr: 4.40e-03, grad_scale: 8.0
2024-09-18 13:28:53,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=452200.0, ans=0.0
2024-09-18 13:28:58,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=452240.0, ans=0.1
2024-09-18 13:29:07,250 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.138e+01 9.104e+01 9.870e+01 1.187e+02 3.111e+02, threshold=1.974e+02, percent-clipped=3.0
2024-09-18 13:29:23,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.83 vs. limit=8.0
2024-09-18 13:29:25,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=452280.0, ans=0.1
2024-09-18 13:29:28,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=452320.0, ans=0.0
2024-09-18 13:29:33,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=452320.0, ans=0.125
2024-09-18 13:29:46,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=452360.0, ans=0.125
2024-09-18 13:30:00,169 INFO [train.py:1198] (1/2) Epoch 25, batch 4500, loss[loss=0.2606, ctc_loss=0.1562, cr_loss=0.3974, attn_decoder_loss=0.2634, over 19792.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1396, cr_loss=0.3863, attn_decoder_loss=0.2564, over 5231045.83 frames. ], batch size: 210, lr: 4.40e-03, grad_scale: 8.0
2024-09-18 13:30:01,340 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.90 vs. limit=12.0
2024-09-18 13:30:18,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=452440.0, ans=0.025
2024-09-18 13:30:23,923 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. limit=6.0
2024-09-18 13:31:31,900 INFO [train.py:1198] (1/2) Epoch 26, batch 0, loss[loss=0.2164, ctc_loss=0.1106, cr_loss=0.3274, attn_decoder_loss=0.2209, over 29632.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1106, cr_loss=0.3274, attn_decoder_loss=0.2209, over 29632.00 frames. ], batch size: 73, lr: 4.31e-03, grad_scale: 16.0
2024-09-18 13:31:31,901 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-18 13:31:52,390 INFO [train.py:1230] (1/2) Epoch 26, validation: loss=0.2126, ctc_loss=0.03779, cr_loss=5.994e-15, attn_decoder_loss=0.232, over 944034.00 frames.
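The validation entry above reports cr_loss=5.994e-15, i.e. numerically zero, while training batches run around 0.37. That is the expected behavior for a consistency-regularization term that penalizes disagreement between CTC posteriors computed on two differently time-masked copies of each utterance: with augmentation off at validation time the two copies coincide and the term vanishes. A sketch of such a symmetric-KL consistency loss, written from the method's description rather than copied from the recipe:

```python
import torch
import torch.nn.functional as F

def cr_loss(log_probs_a: torch.Tensor, log_probs_b: torch.Tensor) -> torch.Tensor:
    """Symmetric KL between frame-level CTC log-posteriors of two augmented
    views, each of shape (num_frames, vocab_size). Zero when the views agree."""
    kl_ab = F.kl_div(log_probs_a, log_probs_b, log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(log_probs_b, log_probs_a, log_target=True, reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

lp = torch.log_softmax(torch.randn(100, 500), dim=-1)
print(float(cr_loss(lp, lp)))  # identical views -> ~0, as in the validation entry
```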
2024-09-18 13:31:52,391 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB
2024-09-18 13:31:58,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=452500.0, ans=0.0
2024-09-18 13:32:06,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=452540.0, ans=0.1
2024-09-18 13:32:40,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=452620.0, ans=0.0
2024-09-18 13:32:40,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=452620.0, ans=0.1
2024-09-18 13:32:51,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=452660.0, ans=0.0
2024-09-18 13:32:52,740 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 9.299e+01 1.068e+02 1.174e+02 2.339e+02, threshold=2.135e+02, percent-clipped=1.0
2024-09-18 13:32:53,692 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.48 vs. limit=15.0
2024-09-18 13:33:07,811 INFO [train.py:1198] (1/2) Epoch 26, batch 50, loss[loss=0.2145, ctc_loss=0.1051, cr_loss=0.3168, attn_decoder_loss=0.2196, over 29442.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1305, cr_loss=0.3752, attn_decoder_loss=0.2474, over 1267583.64 frames. ], batch size: 70, lr: 4.31e-03, grad_scale: 16.0
2024-09-18 13:33:11,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=452700.0, ans=0.0
2024-09-18 13:33:21,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=452740.0, ans=0.125
2024-09-18 13:33:26,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=452740.0, ans=0.0
2024-09-18 13:33:52,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=452820.0, ans=0.0
2024-09-18 13:34:03,650 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.57 vs. limit=15.0
2024-09-18 13:34:07,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=452860.0, ans=0.125
2024-09-18 13:34:21,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=452860.0, ans=0.0
2024-09-18 13:34:24,116 INFO [train.py:1198] (1/2) Epoch 26, batch 100, loss[loss=0.2258, ctc_loss=0.1187, cr_loss=0.3465, attn_decoder_loss=0.23, over 29524.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.132, cr_loss=0.3794, attn_decoder_loss=0.2498, over 2252577.39 frames. ], batch size: 76, lr: 4.31e-03, grad_scale: 8.0
2024-09-18 13:34:27,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=452900.0, ans=0.1
2024-09-18 13:34:48,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=452940.0, ans=0.125
2024-09-18 13:34:55,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=452980.0, ans=0.2
2024-09-18 13:34:58,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=452980.0, ans=0.2
2024-09-18 13:35:00,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=452980.0, ans=0.125
2024-09-18 13:35:01,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=452980.0, ans=0.1
2024-09-18 13:35:25,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=453060.0, ans=0.2
2024-09-18 13:35:27,636 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.544e+01 8.982e+01 9.348e+01 1.241e+02, threshold=1.796e+02, percent-clipped=0.0
2024-09-18 13:35:36,948 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.62 vs. limit=15.0
2024-09-18 13:35:40,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=453060.0, ans=0.125
2024-09-18 13:35:43,441 INFO [train.py:1198] (1/2) Epoch 26, batch 150, loss[loss=0.2075, ctc_loss=0.09893, cr_loss=0.3165, attn_decoder_loss=0.2125, over 29456.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1296, cr_loss=0.376, attn_decoder_loss=0.2474, over 3046642.94 frames. ], batch size: 70, lr: 4.31e-03, grad_scale: 8.0
2024-09-18 13:35:44,571 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.56 vs. limit=10.0
2024-09-18 13:35:57,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=453140.0, ans=0.125
2024-09-18 13:36:29,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=453220.0, ans=0.0
2024-09-18 13:36:47,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=453260.0, ans=0.125
2024-09-18 13:36:58,963 INFO [train.py:1198] (1/2) Epoch 26, batch 200, loss[loss=0.2513, ctc_loss=0.1345, cr_loss=0.3688, attn_decoder_loss=0.256, over 27384.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1292, cr_loss=0.3752, attn_decoder_loss=0.2465, over 3657986.46 frames. ], batch size: 125, lr: 4.31e-03, grad_scale: 8.0
2024-09-18 13:37:16,479 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.29 vs. limit=15.0
2024-09-18 13:37:27,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=453380.0, ans=0.0
2024-09-18 13:37:30,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=453380.0, ans=0.07
2024-09-18 13:37:42,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=453420.0, ans=0.125
2024-09-18 13:38:00,826 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.772e+01 8.385e+01 8.934e+01 9.482e+01 1.708e+02, threshold=1.787e+02, percent-clipped=0.0
2024-09-18 13:38:02,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_positive, batch_count=453460.0, ans=0.05
2024-09-18 13:38:14,316 INFO [train.py:1198] (1/2) Epoch 26, batch 250, loss[loss=0.2558, ctc_loss=0.1321, cr_loss=0.379, attn_decoder_loss=0.2611, over 29287.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1282, cr_loss=0.3733, attn_decoder_loss=0.2462, over 4140979.26 frames. ], batch size: 100, lr: 4.30e-03, grad_scale: 8.0
2024-09-18 13:38:47,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=453580.0, ans=0.1
2024-09-18 13:38:57,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=453580.0, ans=0.125
2024-09-18 13:39:05,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=453620.0, ans=0.125
2024-09-18 13:39:09,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=453620.0, ans=0.09899494936611666
2024-09-18 13:39:18,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=453660.0, ans=0.025
2024-09-18 13:39:34,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=453700.0, ans=0.125
2024-09-18 13:39:35,290 INFO [train.py:1198] (1/2) Epoch 26, batch 300, loss[loss=0.2509, ctc_loss=0.1299, cr_loss=0.3904, attn_decoder_loss=0.2556, over 29587.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1277, cr_loss=0.3719, attn_decoder_loss=0.246, over 4509074.61 frames. ], batch size: 92, lr: 4.30e-03, grad_scale: 8.0
2024-09-18 13:40:28,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=453820.0, ans=0.025
2024-09-18 13:40:37,318 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.478e+01 8.450e+01 8.925e+01 9.500e+01 1.325e+02, threshold=1.785e+02, percent-clipped=0.0
2024-09-18 13:40:50,742 INFO [train.py:1198] (1/2) Epoch 26, batch 350, loss[loss=0.212, ctc_loss=0.1126, cr_loss=0.3531, attn_decoder_loss=0.2152, over 29327.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1278, cr_loss=0.3716, attn_decoder_loss=0.2463, over 4793470.11 frames. ], batch size: 71, lr: 4.30e-03, grad_scale: 8.0
2024-09-18 13:42:05,714 INFO [train.py:1198] (1/2) Epoch 26, batch 400, loss[loss=0.2484, ctc_loss=0.1341, cr_loss=0.3927, attn_decoder_loss=0.2524, over 29718.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.127, cr_loss=0.3706, attn_decoder_loss=0.2459, over 5023879.84 frames. ], batch size: 82, lr: 4.30e-03, grad_scale: 16.0
2024-09-18 13:42:19,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=454140.0, ans=0.025
2024-09-18 13:42:53,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=454220.0, ans=0.125
2024-09-18 13:43:03,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=454220.0, ans=0.1
2024-09-18 13:43:07,925 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.062e+01 8.396e+01 8.968e+01 9.786e+01 1.327e+02, threshold=1.794e+02, percent-clipped=0.0
2024-09-18 13:43:16,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=454260.0, ans=0.125
2024-09-18 13:43:26,144 INFO [train.py:1198] (1/2) Epoch 26, batch 450, loss[loss=0.2394, ctc_loss=0.1263, cr_loss=0.3765, attn_decoder_loss=0.2436, over 29685.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1271, cr_loss=0.3701, attn_decoder_loss=0.2458, over 5185330.31 frames. ], batch size: 83, lr: 4.30e-03, grad_scale: 16.0
2024-09-18 13:43:27,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=454300.0, ans=0.025
2024-09-18 13:43:38,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=454300.0, ans=0.0
2024-09-18 13:44:13,357 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.47 vs. limit=5.0
2024-09-18 13:44:14,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=454420.0, ans=0.0
2024-09-18 13:44:15,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=454420.0, ans=0.125
2024-09-18 13:44:24,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=454420.0, ans=0.125
2024-09-18 13:44:27,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=454460.0, ans=0.125
2024-09-18 13:44:35,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=454460.0, ans=0.125
2024-09-18 13:44:35,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=454460.0, ans=0.125
2024-09-18 13:44:42,488 INFO [train.py:1198] (1/2) Epoch 26, batch 500, loss[loss=0.2564, ctc_loss=0.1434, cr_loss=0.4162, attn_decoder_loss=0.2597, over 29478.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1264, cr_loss=0.3693, attn_decoder_loss=0.2449, over 5328940.65 frames. ], batch size: 94, lr: 4.30e-03, grad_scale: 8.0
2024-09-18 13:44:44,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=454500.0, ans=0.1
2024-09-18 13:44:50,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=454500.0, ans=0.1
2024-09-18 13:44:59,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=454540.0, ans=0.125
2024-09-18 13:45:21,776 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.68 vs. limit=15.0
2024-09-18 13:45:26,099 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.77 vs. limit=15.0
2024-09-18 13:45:28,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=454620.0, ans=0.09899494936611666
2024-09-18 13:45:35,261 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.48 vs. limit=15.0
2024-09-18 13:45:45,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=454660.0, ans=0.0
2024-09-18 13:45:46,385 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.542e+01 8.427e+01 8.869e+01 9.503e+01 2.659e+02, threshold=1.774e+02, percent-clipped=2.0
2024-09-18 13:45:52,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=454660.0, ans=0.025
2024-09-18 13:45:58,558 INFO [train.py:1198] (1/2) Epoch 26, batch 550, loss[loss=0.2525, ctc_loss=0.1381, cr_loss=0.3899, attn_decoder_loss=0.2565, over 28714.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1263, cr_loss=0.3691, attn_decoder_loss=0.245, over 5420840.42 frames. ], batch size: 104, lr: 4.30e-03, grad_scale: 8.0
2024-09-18 13:46:01,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=454700.0, ans=0.2
2024-09-18 13:46:14,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=454740.0, ans=0.125
2024-09-18 13:46:31,283 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.65 vs. limit=15.0
2024-09-18 13:46:34,553 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.00 vs. limit=15.0
2024-09-18 13:47:16,785 INFO [train.py:1198] (1/2) Epoch 26, batch 600, loss[loss=0.2541, ctc_loss=0.1347, cr_loss=0.3979, attn_decoder_loss=0.2585, over 29286.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1263, cr_loss=0.3691, attn_decoder_loss=0.2451, over 5508757.64 frames. ], batch size: 100, lr: 4.30e-03, grad_scale: 8.0
2024-09-18 13:47:44,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=454940.0, ans=0.2
2024-09-18 13:48:22,372 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.100e+01 8.526e+01 8.982e+01 9.575e+01 5.252e+02, threshold=1.796e+02, percent-clipped=1.0
2024-09-18 13:48:31,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=455060.0, ans=0.125
2024-09-18 13:48:34,596 INFO [train.py:1198] (1/2) Epoch 26, batch 650, loss[loss=0.244, ctc_loss=0.1291, cr_loss=0.3713, attn_decoder_loss=0.2485, over 29777.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1257, cr_loss=0.3676, attn_decoder_loss=0.2445, over 5585849.47 frames. ], batch size: 81, lr: 4.30e-03, grad_scale: 8.0
2024-09-18 13:48:35,568 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.20 vs. limit=15.0
2024-09-18 13:48:37,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=455100.0, ans=0.1
2024-09-18 13:48:39,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=455100.0, ans=0.0
2024-09-18 13:48:59,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=455140.0, ans=0.0
2024-09-18 13:49:03,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=455180.0, ans=0.2
2024-09-18 13:49:37,626 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 13:49:46,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=455260.0, ans=0.2
2024-09-18 13:49:49,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=455300.0, ans=0.125
2024-09-18 13:49:50,941 INFO [train.py:1198] (1/2) Epoch 26, batch 700, loss[loss=0.229, ctc_loss=0.1145, cr_loss=0.3331, attn_decoder_loss=0.2343, over 29565.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1263, cr_loss=0.3687, attn_decoder_loss=0.2452, over 5634597.71 frames. ], batch size: 76, lr: 4.30e-03, grad_scale: 8.0
2024-09-18 13:49:56,152 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.77 vs. limit=10.0
2024-09-18 13:50:01,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=455300.0, ans=0.125
2024-09-18 13:50:09,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=455340.0, ans=0.125
2024-09-18 13:50:15,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=455340.0, ans=0.125
2024-09-18 13:50:18,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=455340.0, ans=0.0
2024-09-18 13:50:31,425 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.92 vs. limit=6.0
limit=6.0 2024-09-18 13:50:36,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=455420.0, ans=0.025 2024-09-18 13:50:40,373 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.46 vs. limit=15.0 2024-09-18 13:50:47,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=455420.0, ans=0.125 2024-09-18 13:50:53,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=455460.0, ans=0.125 2024-09-18 13:50:54,634 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.356e+01 8.785e+01 9.330e+01 1.328e+02, threshold=1.757e+02, percent-clipped=0.0 2024-09-18 13:50:55,085 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:51:00,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=455460.0, ans=0.1 2024-09-18 13:51:04,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=455460.0, ans=0.0 2024-09-18 13:51:06,765 INFO [train.py:1198] (1/2) Epoch 26, batch 750, loss[loss=0.2427, ctc_loss=0.1251, cr_loss=0.3597, attn_decoder_loss=0.2478, over 29725.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.126, cr_loss=0.3685, attn_decoder_loss=0.2449, over 5674455.42 frames. ], batch size: 82, lr: 4.30e-03, grad_scale: 8.0 2024-09-18 13:51:07,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=455500.0, ans=0.125 2024-09-18 13:51:39,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=455580.0, ans=0.1 2024-09-18 13:52:00,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=455620.0, ans=0.0 2024-09-18 13:52:14,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=455660.0, ans=0.1 2024-09-18 13:52:25,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=455700.0, ans=0.0 2024-09-18 13:52:26,854 INFO [train.py:1198] (1/2) Epoch 26, batch 800, loss[loss=0.2197, ctc_loss=0.1177, cr_loss=0.3493, attn_decoder_loss=0.2233, over 29589.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1261, cr_loss=0.3687, attn_decoder_loss=0.2449, over 5705936.46 frames. 
], batch size: 73, lr: 4.29e-03, grad_scale: 16.0 2024-09-18 13:52:30,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=455700.0, ans=0.2 2024-09-18 13:52:31,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=455700.0, ans=0.0 2024-09-18 13:53:12,422 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:53:24,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=455820.0, ans=0.125 2024-09-18 13:53:30,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=455860.0, ans=0.0 2024-09-18 13:53:31,731 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.399e+01 8.438e+01 9.008e+01 9.520e+01 4.430e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-18 13:53:42,296 INFO [train.py:1198] (1/2) Epoch 26, batch 850, loss[loss=0.2559, ctc_loss=0.1311, cr_loss=0.3675, attn_decoder_loss=0.2616, over 29732.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1262, cr_loss=0.3687, attn_decoder_loss=0.2449, over 5736299.98 frames. ], batch size: 89, lr: 4.29e-03, grad_scale: 8.0 2024-09-18 13:54:31,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=456020.0, ans=0.0 2024-09-18 13:54:39,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=456020.0, ans=0.125 2024-09-18 13:54:55,939 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:54:58,545 INFO [train.py:1198] (1/2) Epoch 26, batch 900, loss[loss=0.2314, ctc_loss=0.1166, cr_loss=0.3672, attn_decoder_loss=0.236, over 29638.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1265, cr_loss=0.3692, attn_decoder_loss=0.2451, over 5739524.24 frames. ], batch size: 73, lr: 4.29e-03, grad_scale: 8.0 2024-09-18 13:55:03,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=456100.0, ans=0.125 2024-09-18 13:55:47,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=456220.0, ans=0.125 2024-09-18 13:55:53,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=456220.0, ans=0.2 2024-09-18 13:55:56,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=456220.0, ans=0.0 2024-09-18 13:56:05,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=456260.0, ans=0.0 2024-09-18 13:56:07,881 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.077e+01 8.650e+01 9.071e+01 9.568e+01 1.657e+02, threshold=1.814e+02, percent-clipped=0.0 2024-09-18 13:56:11,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=456260.0, ans=0.0 2024-09-18 13:56:18,480 INFO [train.py:1198] (1/2) Epoch 26, batch 950, loss[loss=0.2232, ctc_loss=0.1091, cr_loss=0.3182, attn_decoder_loss=0.2288, over 29486.00 frames. 
], tot_loss[loss=0.2406, ctc_loss=0.1264, cr_loss=0.3685, attn_decoder_loss=0.2451, over 5741178.81 frames. ], batch size: 74, lr: 4.29e-03, grad_scale: 8.0 2024-09-18 13:56:42,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=456340.0, ans=0.125 2024-09-18 13:56:55,778 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.70 vs. limit=10.0 2024-09-18 13:57:00,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=456380.0, ans=0.125 2024-09-18 13:57:02,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=456420.0, ans=0.07 2024-09-18 13:57:06,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=456420.0, ans=0.125 2024-09-18 13:57:33,408 INFO [train.py:1198] (1/2) Epoch 26, batch 1000, loss[loss=0.2234, ctc_loss=0.1072, cr_loss=0.326, attn_decoder_loss=0.2291, over 29502.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1274, cr_loss=0.3705, attn_decoder_loss=0.2459, over 5735840.49 frames. ], batch size: 77, lr: 4.29e-03, grad_scale: 8.0 2024-09-18 13:57:35,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=456500.0, ans=0.125 2024-09-18 13:57:44,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=456500.0, ans=0.2 2024-09-18 13:58:05,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=456580.0, ans=0.0 2024-09-18 13:58:18,604 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.24 vs. limit=15.0 2024-09-18 13:58:38,768 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.192e+01 8.379e+01 9.044e+01 9.595e+01 2.964e+02, threshold=1.809e+02, percent-clipped=3.0 2024-09-18 13:58:40,602 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:58:43,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=456660.0, ans=0.125 2024-09-18 13:58:49,367 INFO [train.py:1198] (1/2) Epoch 26, batch 1050, loss[loss=0.2557, ctc_loss=0.1292, cr_loss=0.3653, attn_decoder_loss=0.2616, over 29704.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1268, cr_loss=0.3696, attn_decoder_loss=0.2453, over 5744590.59 frames. ], batch size: 85, lr: 4.29e-03, grad_scale: 8.0 2024-09-18 13:58:49,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=456700.0, ans=0.09899494936611666 2024-09-18 13:58:55,592 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.11 vs. 
limit=15.0 2024-09-18 13:58:56,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=456700.0, ans=0.125 2024-09-18 13:58:56,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=456700.0, ans=0.125 2024-09-18 13:58:59,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=456700.0, ans=0.04949747468305833 2024-09-18 13:59:10,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=456740.0, ans=0.125 2024-09-18 13:59:56,085 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.68 vs. limit=10.0 2024-09-18 14:00:10,565 INFO [train.py:1198] (1/2) Epoch 26, batch 1100, loss[loss=0.2329, ctc_loss=0.1286, cr_loss=0.3874, attn_decoder_loss=0.2359, over 29457.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1267, cr_loss=0.3691, attn_decoder_loss=0.2453, over 5756399.94 frames. ], batch size: 78, lr: 4.29e-03, grad_scale: 8.0 2024-09-18 14:00:32,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=456940.0, ans=0.125 2024-09-18 14:00:38,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=456940.0, ans=0.0 2024-09-18 14:00:45,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=456980.0, ans=0.125 2024-09-18 14:00:56,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=457020.0, ans=0.125 2024-09-18 14:01:07,553 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.50 vs. limit=15.0 2024-09-18 14:01:15,842 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.599e+01 9.010e+01 9.619e+01 1.920e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-18 14:01:26,680 INFO [train.py:1198] (1/2) Epoch 26, batch 1150, loss[loss=0.2272, ctc_loss=0.1157, cr_loss=0.3536, attn_decoder_loss=0.2318, over 29449.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1268, cr_loss=0.3698, attn_decoder_loss=0.2454, over 5756156.56 frames. ], batch size: 78, lr: 4.29e-03, grad_scale: 8.0 2024-09-18 14:01:28,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=457100.0, ans=0.05 2024-09-18 14:01:41,447 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.67 vs. limit=15.0 2024-09-18 14:01:53,439 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.19 vs. limit=22.5 2024-09-18 14:01:54,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=457140.0, ans=0.025 2024-09-18 14:02:07,496 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.76 vs. 
limit=22.5 2024-09-18 14:02:11,982 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.15 vs. limit=22.5 2024-09-18 14:02:28,134 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=6.26 vs. limit=15.0 2024-09-18 14:02:44,782 INFO [train.py:1198] (1/2) Epoch 26, batch 1200, loss[loss=0.246, ctc_loss=0.123, cr_loss=0.353, attn_decoder_loss=0.2519, over 29667.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1274, cr_loss=0.37, attn_decoder_loss=0.2464, over 5747629.28 frames. ], batch size: 85, lr: 4.29e-03, grad_scale: 16.0 2024-09-18 14:03:01,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_positive, batch_count=457340.0, ans=0.05 2024-09-18 14:03:10,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=457340.0, ans=0.125 2024-09-18 14:03:19,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=457380.0, ans=0.1 2024-09-18 14:03:20,060 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.10 vs. limit=15.0 2024-09-18 14:03:40,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=457420.0, ans=0.0 2024-09-18 14:03:43,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=457420.0, ans=0.125 2024-09-18 14:03:44,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=457420.0, ans=0.125 2024-09-18 14:03:46,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=457460.0, ans=0.125 2024-09-18 14:03:53,663 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 8.704e+01 9.142e+01 9.758e+01 1.993e+02, threshold=1.828e+02, percent-clipped=1.0 2024-09-18 14:04:02,760 INFO [train.py:1198] (1/2) Epoch 26, batch 1250, loss[loss=0.2574, ctc_loss=0.1422, cr_loss=0.411, attn_decoder_loss=0.2611, over 29547.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.128, cr_loss=0.3711, attn_decoder_loss=0.2468, over 5773961.71 frames. ], batch size: 92, lr: 4.29e-03, grad_scale: 8.0 2024-09-18 14:04:03,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=457500.0, ans=0.125 2024-09-18 14:04:16,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=457540.0, ans=0.95 2024-09-18 14:04:35,718 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.79 vs. 
limit=10.0 2024-09-18 14:04:39,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=457580.0, ans=0.1 2024-09-18 14:04:41,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=457580.0, ans=0.1 2024-09-18 14:05:19,421 INFO [train.py:1198] (1/2) Epoch 26, batch 1300, loss[loss=0.2549, ctc_loss=0.144, cr_loss=0.4133, attn_decoder_loss=0.258, over 28331.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1277, cr_loss=0.371, attn_decoder_loss=0.2462, over 5777037.24 frames. ], batch size: 111, lr: 4.29e-03, grad_scale: 8.0 2024-09-18 14:05:19,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=457700.0, ans=0.125 2024-09-18 14:05:30,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=457700.0, ans=0.0 2024-09-18 14:05:31,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=457700.0, ans=0.0 2024-09-18 14:05:46,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=457740.0, ans=0.5 2024-09-18 14:05:48,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=457780.0, ans=0.2 2024-09-18 14:05:54,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=457780.0, ans=0.125 2024-09-18 14:06:08,624 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.86 vs. limit=15.0 2024-09-18 14:06:17,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=457820.0, ans=0.025 2024-09-18 14:06:25,985 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.546e+01 8.454e+01 9.061e+01 9.465e+01 1.475e+02, threshold=1.812e+02, percent-clipped=0.0 2024-09-18 14:06:33,360 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.98 vs. limit=15.0 2024-09-18 14:06:35,249 INFO [train.py:1198] (1/2) Epoch 26, batch 1350, loss[loss=0.2351, ctc_loss=0.1184, cr_loss=0.3407, attn_decoder_loss=0.2405, over 29749.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.127, cr_loss=0.3698, attn_decoder_loss=0.2458, over 5793844.55 frames. ], batch size: 81, lr: 4.28e-03, grad_scale: 8.0 2024-09-18 14:06:47,377 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.10 vs. limit=15.0 2024-09-18 14:06:54,091 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 14:07:03,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=457940.0, ans=0.2 2024-09-18 14:07:17,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=457980.0, ans=0.125 2024-09-18 14:07:23,725 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.75 vs. 
limit=15.0 2024-09-18 14:07:28,607 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.07 vs. limit=12.0 2024-09-18 14:07:31,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2024-09-18 14:07:44,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=458060.0, ans=0.0 2024-09-18 14:07:55,051 INFO [train.py:1198] (1/2) Epoch 26, batch 1400, loss[loss=0.2222, ctc_loss=0.1113, cr_loss=0.331, attn_decoder_loss=0.2271, over 29580.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1271, cr_loss=0.3703, attn_decoder_loss=0.2459, over 5805656.10 frames. ], batch size: 69, lr: 4.28e-03, grad_scale: 8.0 2024-09-18 14:07:55,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=458100.0, ans=0.125 2024-09-18 14:08:04,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=458100.0, ans=0.0 2024-09-18 14:08:25,911 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.04 vs. limit=15.0 2024-09-18 14:08:55,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=458260.0, ans=0.1 2024-09-18 14:09:01,515 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.404e+01 8.959e+01 9.350e+01 1.926e+02, threshold=1.792e+02, percent-clipped=1.0 2024-09-18 14:09:01,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=458260.0, ans=0.0 2024-09-18 14:09:10,658 INFO [train.py:1198] (1/2) Epoch 26, batch 1450, loss[loss=0.2536, ctc_loss=0.1365, cr_loss=0.3882, attn_decoder_loss=0.258, over 29418.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1276, cr_loss=0.3712, attn_decoder_loss=0.2464, over 5802225.00 frames. ], batch size: 94, lr: 4.28e-03, grad_scale: 8.0 2024-09-18 14:09:30,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=458340.0, ans=0.125 2024-09-18 14:09:36,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=458340.0, ans=0.0 2024-09-18 14:09:55,997 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.70 vs. limit=22.5 2024-09-18 14:10:09,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=458420.0, ans=0.95 2024-09-18 14:10:12,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=458460.0, ans=0.125 2024-09-18 14:10:27,093 INFO [train.py:1198] (1/2) Epoch 26, batch 1500, loss[loss=0.2514, ctc_loss=0.1335, cr_loss=0.4025, attn_decoder_loss=0.2556, over 29612.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.1275, cr_loss=0.3712, attn_decoder_loss=0.2466, over 5804009.26 frames. 
], batch size: 86, lr: 4.28e-03, grad_scale: 8.0 2024-09-18 14:10:37,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=458500.0, ans=0.035 2024-09-18 14:10:46,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=458540.0, ans=0.1 2024-09-18 14:10:54,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=458540.0, ans=0.125 2024-09-18 14:10:55,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=458540.0, ans=0.0 2024-09-18 14:11:01,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=458580.0, ans=0.2 2024-09-18 14:11:05,709 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.41 vs. limit=22.5 2024-09-18 14:11:38,954 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.523e+01 8.611e+01 9.025e+01 9.915e+01 2.823e+02, threshold=1.805e+02, percent-clipped=2.0 2024-09-18 14:11:48,169 INFO [train.py:1198] (1/2) Epoch 26, batch 1550, loss[loss=0.2573, ctc_loss=0.1411, cr_loss=0.4051, attn_decoder_loss=0.2612, over 29533.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1276, cr_loss=0.371, attn_decoder_loss=0.2465, over 5780472.20 frames. ], batch size: 90, lr: 4.28e-03, grad_scale: 8.0 2024-09-18 14:12:00,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=458700.0, ans=0.05 2024-09-18 14:12:38,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=458820.0, ans=0.1 2024-09-18 14:12:49,514 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.23 vs. limit=15.0 2024-09-18 14:13:01,523 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.45 vs. limit=15.0 2024-09-18 14:13:03,840 INFO [train.py:1198] (1/2) Epoch 26, batch 1600, loss[loss=0.256, ctc_loss=0.1278, cr_loss=0.3636, attn_decoder_loss=0.2622, over 29687.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1277, cr_loss=0.3709, attn_decoder_loss=0.2462, over 5762824.57 frames. 
], batch size: 85, lr: 4.28e-03, grad_scale: 16.0 2024-09-18 14:13:16,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=458900.0, ans=0.025 2024-09-18 14:13:28,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=458940.0, ans=0.125 2024-09-18 14:13:49,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=459020.0, ans=0.05 2024-09-18 14:13:54,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=459020.0, ans=0.2 2024-09-18 14:13:54,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=459020.0, ans=0.125 2024-09-18 14:13:58,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=459020.0, ans=0.035 2024-09-18 14:13:58,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=459020.0, ans=0.0 2024-09-18 14:14:03,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=459060.0, ans=0.0 2024-09-18 14:14:04,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=459060.0, ans=0.2 2024-09-18 14:14:04,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=459060.0, ans=0.0 2024-09-18 14:14:09,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=459060.0, ans=0.5 2024-09-18 14:14:09,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=459060.0, ans=0.0 2024-09-18 14:14:12,106 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.510e+01 8.327e+01 8.927e+01 9.503e+01 2.372e+02, threshold=1.785e+02, percent-clipped=2.0 2024-09-18 14:14:21,599 INFO [train.py:1198] (1/2) Epoch 26, batch 1650, loss[loss=0.2489, ctc_loss=0.132, cr_loss=0.3761, attn_decoder_loss=0.2535, over 29714.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1275, cr_loss=0.3705, attn_decoder_loss=0.2461, over 5756886.16 frames. ], batch size: 89, lr: 4.28e-03, grad_scale: 8.0 2024-09-18 14:14:23,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=459100.0, ans=0.125 2024-09-18 14:14:45,435 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-09-18 14:15:34,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=459260.0, ans=0.125 2024-09-18 14:15:37,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=459260.0, ans=0.0 2024-09-18 14:15:39,933 INFO [train.py:1198] (1/2) Epoch 26, batch 1700, loss[loss=0.2127, ctc_loss=0.1042, cr_loss=0.3153, attn_decoder_loss=0.2177, over 29597.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1269, cr_loss=0.3698, attn_decoder_loss=0.2458, over 5779022.16 frames. 
], batch size: 69, lr: 4.28e-03, grad_scale: 8.0 2024-09-18 14:15:40,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=459300.0, ans=0.125 2024-09-18 14:15:41,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=459300.0, ans=0.1 2024-09-18 14:15:56,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=459340.0, ans=0.07 2024-09-18 14:16:01,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=459340.0, ans=0.04949747468305833 2024-09-18 14:16:08,972 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 14:16:10,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=459380.0, ans=0.125 2024-09-18 14:16:14,981 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 14:16:20,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=459380.0, ans=0.0 2024-09-18 14:16:40,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=459460.0, ans=0.125 2024-09-18 14:16:45,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=459460.0, ans=0.125 2024-09-18 14:16:48,019 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 8.426e+01 8.833e+01 9.428e+01 1.268e+02, threshold=1.767e+02, percent-clipped=0.0 2024-09-18 14:16:55,730 INFO [train.py:1198] (1/2) Epoch 26, batch 1750, loss[loss=0.222, ctc_loss=0.1129, cr_loss=0.3385, attn_decoder_loss=0.2266, over 29329.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1264, cr_loss=0.3686, attn_decoder_loss=0.2453, over 5788553.66 frames. ], batch size: 67, lr: 4.28e-03, grad_scale: 8.0 2024-09-18 14:17:03,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=459500.0, ans=0.0 2024-09-18 14:17:13,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=459540.0, ans=0.125 2024-09-18 14:17:14,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=459540.0, ans=0.025 2024-09-18 14:17:20,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=459540.0, ans=0.125 2024-09-18 14:17:23,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=459540.0, ans=0.125 2024-09-18 14:17:25,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=459580.0, ans=0.125 2024-09-18 14:18:01,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=459660.0, ans=0.125 2024-09-18 14:18:11,489 INFO [train.py:1198] (1/2) Epoch 26, batch 1800, loss[loss=0.2443, ctc_loss=0.1288, cr_loss=0.3744, attn_decoder_loss=0.2488, over 29694.00 frames. 
], tot_loss[loss=0.241, ctc_loss=0.1263, cr_loss=0.3687, attn_decoder_loss=0.2456, over 5791346.14 frames. ], batch size: 83, lr: 4.28e-03, grad_scale: 8.0 2024-09-18 14:18:45,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=459780.0, ans=0.125 2024-09-18 14:18:57,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=459820.0, ans=0.1 2024-09-18 14:19:13,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=459820.0, ans=0.125 2024-09-18 14:19:15,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=459860.0, ans=0.125 2024-09-18 14:19:23,997 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.402e+01 8.569e+01 8.931e+01 9.347e+01 1.247e+02, threshold=1.786e+02, percent-clipped=0.0 2024-09-18 14:19:31,609 INFO [train.py:1198] (1/2) Epoch 26, batch 1850, loss[loss=0.2517, ctc_loss=0.1355, cr_loss=0.413, attn_decoder_loss=0.2554, over 29627.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1264, cr_loss=0.369, attn_decoder_loss=0.2455, over 5798372.87 frames. ], batch size: 86, lr: 4.27e-03, grad_scale: 8.0 2024-09-18 14:19:49,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=459940.0, ans=0.1 2024-09-18 14:19:51,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=459940.0, ans=0.125 2024-09-18 14:19:51,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=459940.0, ans=0.0 2024-09-18 14:19:54,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=459940.0, ans=0.125 2024-09-18 14:20:21,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=460020.0, ans=15.0 2024-09-18 14:20:47,475 INFO [train.py:1198] (1/2) Epoch 26, batch 1900, loss[loss=0.2562, ctc_loss=0.1306, cr_loss=0.3824, attn_decoder_loss=0.2616, over 29714.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1265, cr_loss=0.3695, attn_decoder_loss=0.246, over 5805388.00 frames. ], batch size: 89, lr: 4.27e-03, grad_scale: 8.0 2024-09-18 14:20:52,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=460100.0, ans=0.2 2024-09-18 14:20:59,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=460100.0, ans=0.2 2024-09-18 14:21:32,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=460220.0, ans=0.125 2024-09-18 14:21:38,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=460220.0, ans=0.125 2024-09-18 14:21:54,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.80 vs. 
limit=22.5 2024-09-18 14:21:56,075 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.154e+01 8.718e+01 9.273e+01 9.664e+01 1.625e+02, threshold=1.855e+02, percent-clipped=0.0 2024-09-18 14:22:00,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=460260.0, ans=0.09899494936611666 2024-09-18 14:22:03,709 INFO [train.py:1198] (1/2) Epoch 26, batch 1950, loss[loss=0.2502, ctc_loss=0.1364, cr_loss=0.3999, attn_decoder_loss=0.254, over 29438.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1273, cr_loss=0.3713, attn_decoder_loss=0.2471, over 5820002.42 frames. ], batch size: 78, lr: 4.27e-03, grad_scale: 8.0 2024-09-18 14:22:42,934 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.92 vs. limit=15.0 2024-09-18 14:22:49,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=460420.0, ans=0.025 2024-09-18 14:23:03,046 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.71 vs. limit=15.0 2024-09-18 14:23:05,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=460420.0, ans=0.0 2024-09-18 14:23:23,259 INFO [train.py:1198] (1/2) Epoch 26, batch 2000, loss[loss=0.2173, ctc_loss=0.1073, cr_loss=0.3262, attn_decoder_loss=0.2223, over 29340.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1277, cr_loss=0.3718, attn_decoder_loss=0.2475, over 5796740.22 frames. ], batch size: 67, lr: 4.27e-03, grad_scale: 16.0 2024-09-18 14:23:25,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=460500.0, ans=0.1 2024-09-18 14:23:38,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=460540.0, ans=0.0 2024-09-18 14:23:48,250 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=15.0 2024-09-18 14:23:49,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=460540.0, ans=0.025 2024-09-18 14:24:04,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=460580.0, ans=0.125 2024-09-18 14:24:21,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=460620.0, ans=0.1 2024-09-18 14:24:33,237 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 8.701e+01 9.104e+01 9.478e+01 2.564e+02, threshold=1.821e+02, percent-clipped=1.0 2024-09-18 14:24:39,264 INFO [train.py:1198] (1/2) Epoch 26, batch 2050, loss[loss=0.2182, ctc_loss=0.1089, cr_loss=0.3312, attn_decoder_loss=0.223, over 29411.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.1274, cr_loss=0.3708, attn_decoder_loss=0.2466, over 5787789.93 frames. 
], batch size: 70, lr: 4.27e-03, grad_scale: 8.0 2024-09-18 14:24:44,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=460700.0, ans=0.125 2024-09-18 14:25:06,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=460740.0, ans=0.025 2024-09-18 14:25:37,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=460820.0, ans=0.0 2024-09-18 14:25:55,178 INFO [train.py:1198] (1/2) Epoch 26, batch 2100, loss[loss=0.2436, ctc_loss=0.1215, cr_loss=0.3521, attn_decoder_loss=0.2494, over 29758.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1266, cr_loss=0.3694, attn_decoder_loss=0.2458, over 5800291.75 frames. ], batch size: 81, lr: 4.27e-03, grad_scale: 8.0 2024-09-18 14:25:59,852 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.69 vs. limit=15.0 2024-09-18 14:26:29,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=460980.0, ans=0.0 2024-09-18 14:26:47,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=461020.0, ans=0.125 2024-09-18 14:26:49,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=461020.0, ans=0.0 2024-09-18 14:26:54,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=461020.0, ans=0.125 2024-09-18 14:27:08,387 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.606e+01 8.265e+01 8.897e+01 9.459e+01 1.093e+02, threshold=1.779e+02, percent-clipped=0.0 2024-09-18 14:27:08,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=461060.0, ans=0.1 2024-09-18 14:27:14,496 INFO [train.py:1198] (1/2) Epoch 26, batch 2150, loss[loss=0.2474, ctc_loss=0.1353, cr_loss=0.3873, attn_decoder_loss=0.2512, over 29430.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1258, cr_loss=0.3682, attn_decoder_loss=0.2452, over 5815491.42 frames. 
], batch size: 78, lr: 4.27e-03, grad_scale: 8.0 2024-09-18 14:27:22,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=461100.0, ans=0.125 2024-09-18 14:27:24,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=461100.0, ans=0.1 2024-09-18 14:27:52,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=461180.0, ans=0.125 2024-09-18 14:28:03,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=461220.0, ans=0.125 2024-09-18 14:28:13,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=461220.0, ans=22.5 2024-09-18 14:28:15,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=461260.0, ans=0.125 2024-09-18 14:28:19,048 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=15.0 2024-09-18 14:28:26,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=461260.0, ans=0.0 2024-09-18 14:28:29,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=461300.0, ans=0.125 2024-09-18 14:28:30,683 INFO [train.py:1198] (1/2) Epoch 26, batch 2200, loss[loss=0.2513, ctc_loss=0.1347, cr_loss=0.4038, attn_decoder_loss=0.2553, over 29633.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.126, cr_loss=0.3682, attn_decoder_loss=0.2452, over 5812762.09 frames. ], batch size: 86, lr: 4.27e-03, grad_scale: 8.0 2024-09-18 14:28:48,016 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.74 vs. limit=15.0 2024-09-18 14:28:50,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=461340.0, ans=0.0 2024-09-18 14:28:55,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=461340.0, ans=0.2 2024-09-18 14:29:32,228 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=10.19 vs. limit=12.0 2024-09-18 14:29:40,320 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.263e+01 8.743e+01 9.086e+01 9.862e+01 3.457e+02, threshold=1.817e+02, percent-clipped=3.0 2024-09-18 14:29:43,847 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 14:29:45,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=461500.0, ans=0.0 2024-09-18 14:29:46,510 INFO [train.py:1198] (1/2) Epoch 26, batch 2250, loss[loss=0.2402, ctc_loss=0.1229, cr_loss=0.3584, attn_decoder_loss=0.2453, over 29686.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1257, cr_loss=0.3674, attn_decoder_loss=0.2451, over 5811632.87 frames. 
], batch size: 82, lr: 4.27e-03, grad_scale: 8.0 2024-09-18 14:29:48,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=461500.0, ans=0.1 2024-09-18 14:29:50,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=461500.0, ans=6.0 2024-09-18 14:30:11,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=461540.0, ans=0.0 2024-09-18 14:30:11,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=461540.0, ans=0.125 2024-09-18 14:30:19,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=461580.0, ans=0.125 2024-09-18 14:30:37,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=461620.0, ans=0.2 2024-09-18 14:30:42,962 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.08 vs. limit=12.0 2024-09-18 14:30:54,099 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.15 vs. limit=22.5 2024-09-18 14:31:06,202 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.19 vs. limit=22.5 2024-09-18 14:31:07,027 INFO [train.py:1198] (1/2) Epoch 26, batch 2300, loss[loss=0.214, ctc_loss=0.1036, cr_loss=0.3153, attn_decoder_loss=0.2192, over 29334.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1251, cr_loss=0.3658, attn_decoder_loss=0.2442, over 5800864.24 frames. ], batch size: 71, lr: 4.27e-03, grad_scale: 8.0 2024-09-18 14:31:08,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=461700.0, ans=0.125 2024-09-18 14:31:11,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=461700.0, ans=0.0 2024-09-18 14:31:12,623 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.63 vs. limit=15.0 2024-09-18 14:31:13,761 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.79 vs. limit=15.0 2024-09-18 14:31:17,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=461700.0, ans=0.1 2024-09-18 14:31:22,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=461740.0, ans=0.125 2024-09-18 14:31:50,379 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.09 vs. 
limit=10.0 2024-09-18 14:31:51,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=461820.0, ans=0.0 2024-09-18 14:32:14,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=461860.0, ans=0.1 2024-09-18 14:32:16,744 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.100e+01 8.397e+01 8.981e+01 9.856e+01 3.624e+02, threshold=1.796e+02, percent-clipped=1.0 2024-09-18 14:32:20,402 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.17 vs. limit=15.0 2024-09-18 14:32:22,730 INFO [train.py:1198] (1/2) Epoch 26, batch 2350, loss[loss=0.255, ctc_loss=0.1327, cr_loss=0.3833, attn_decoder_loss=0.2601, over 29702.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1253, cr_loss=0.3662, attn_decoder_loss=0.2447, over 5805401.43 frames. ], batch size: 83, lr: 4.27e-03, grad_scale: 8.0 2024-09-18 14:32:22,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=461900.0, ans=0.125 2024-09-18 14:32:28,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=461900.0, ans=0.125 2024-09-18 14:32:36,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=461940.0, ans=0.125 2024-09-18 14:32:58,335 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.80 vs. limit=6.0 2024-09-18 14:33:00,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=461980.0, ans=0.125 2024-09-18 14:33:14,895 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.62 vs. limit=15.0 2024-09-18 14:33:27,384 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.97 vs. limit=15.0 2024-09-18 14:33:28,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=462060.0, ans=0.04949747468305833 2024-09-18 14:33:29,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=462060.0, ans=0.0 2024-09-18 14:33:38,605 INFO [train.py:1198] (1/2) Epoch 26, batch 2400, loss[loss=0.2357, ctc_loss=0.1159, cr_loss=0.3401, attn_decoder_loss=0.2414, over 29539.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1262, cr_loss=0.368, attn_decoder_loss=0.2452, over 5808333.32 frames. ], batch size: 76, lr: 4.26e-03, grad_scale: 16.0 2024-09-18 14:34:07,373 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=15.0 2024-09-18 14:34:28,560 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.66 vs. 
limit=15.0 2024-09-18 14:34:29,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=462220.0, ans=0.125 2024-09-18 14:34:36,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=462220.0, ans=0.1 2024-09-18 14:34:36,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=462220.0, ans=0.0 2024-09-18 14:34:51,672 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.393e+01 8.579e+01 9.212e+01 9.914e+01 2.760e+02, threshold=1.842e+02, percent-clipped=1.0 2024-09-18 14:34:58,422 INFO [train.py:1198] (1/2) Epoch 26, batch 2450, loss[loss=0.2438, ctc_loss=0.1274, cr_loss=0.3741, attn_decoder_loss=0.2484, over 29730.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1268, cr_loss=0.3693, attn_decoder_loss=0.2461, over 5784553.62 frames. ], batch size: 82, lr: 4.26e-03, grad_scale: 8.0 2024-09-18 14:35:10,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=462300.0, ans=0.0 2024-09-18 14:35:10,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=462300.0, ans=0.125 2024-09-18 14:35:13,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=462340.0, ans=0.125 2024-09-18 14:36:10,772 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.54 vs. limit=6.0 2024-09-18 14:36:13,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=462500.0, ans=0.0 2024-09-18 14:36:14,449 INFO [train.py:1198] (1/2) Epoch 26, batch 2500, loss[loss=0.243, ctc_loss=0.1254, cr_loss=0.3712, attn_decoder_loss=0.2478, over 29629.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1268, cr_loss=0.3693, attn_decoder_loss=0.2461, over 5794491.21 frames. ], batch size: 86, lr: 4.26e-03, grad_scale: 8.0 2024-09-18 14:36:20,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=462500.0, ans=0.125 2024-09-18 14:36:25,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=462500.0, ans=0.125 2024-09-18 14:36:25,994 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.18 vs. limit=22.5 2024-09-18 14:36:31,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=462540.0, ans=0.125 2024-09-18 14:37:06,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=462620.0, ans=0.125 2024-09-18 14:37:20,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=462660.0, ans=0.125 2024-09-18 14:37:25,679 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.396e+01 8.473e+01 8.987e+01 9.500e+01 1.769e+02, threshold=1.797e+02, percent-clipped=1.0 2024-09-18 14:37:29,980 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.47 vs. 
limit=22.5 2024-09-18 14:37:30,373 INFO [train.py:1198] (1/2) Epoch 26, batch 2550, loss[loss=0.2168, ctc_loss=0.114, cr_loss=0.3536, attn_decoder_loss=0.2204, over 29364.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1265, cr_loss=0.369, attn_decoder_loss=0.2458, over 5797716.86 frames. ], batch size: 67, lr: 4.26e-03, grad_scale: 8.0 2024-09-18 14:38:08,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=462780.0, ans=0.125 2024-09-18 14:38:27,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=462820.0, ans=0.125 2024-09-18 14:38:45,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=462860.0, ans=0.125 2024-09-18 14:38:48,007 INFO [train.py:1198] (1/2) Epoch 26, batch 2600, loss[loss=0.2415, ctc_loss=0.1231, cr_loss=0.3625, attn_decoder_loss=0.2466, over 29425.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1267, cr_loss=0.3694, attn_decoder_loss=0.2462, over 5795234.69 frames. ], batch size: 78, lr: 4.26e-03, grad_scale: 8.0 2024-09-18 14:38:55,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=462900.0, ans=0.125 2024-09-18 14:39:01,540 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.68 vs. limit=22.5 2024-09-18 14:39:13,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=462940.0, ans=0.09899494936611666 2024-09-18 14:39:21,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=462980.0, ans=0.125 2024-09-18 14:39:21,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=462980.0, ans=0.125 2024-09-18 14:39:32,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=462980.0, ans=0.2 2024-09-18 14:39:40,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=463020.0, ans=0.125 2024-09-18 14:39:52,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=463060.0, ans=0.0 2024-09-18 14:39:59,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=463060.0, ans=0.2 2024-09-18 14:40:01,220 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.375e+01 8.942e+01 9.564e+01 2.475e+02, threshold=1.788e+02, percent-clipped=1.0 2024-09-18 14:40:05,704 INFO [train.py:1198] (1/2) Epoch 26, batch 2650, loss[loss=0.2529, ctc_loss=0.1425, cr_loss=0.4007, attn_decoder_loss=0.2563, over 29199.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1267, cr_loss=0.3695, attn_decoder_loss=0.2464, over 5801936.54 frames. 
], batch size: 100, lr: 4.26e-03, grad_scale: 8.0 2024-09-18 14:40:34,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=463180.0, ans=0.125 2024-09-18 14:40:44,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=463180.0, ans=0.1 2024-09-18 14:41:00,891 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 14:41:15,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.47 vs. limit=6.0 2024-09-18 14:41:19,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=463260.0, ans=0.2 2024-09-18 14:41:23,922 INFO [train.py:1198] (1/2) Epoch 26, batch 2700, loss[loss=0.2458, ctc_loss=0.1243, cr_loss=0.3777, attn_decoder_loss=0.2509, over 29525.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1269, cr_loss=0.3703, attn_decoder_loss=0.2466, over 5795864.43 frames. ], batch size: 87, lr: 4.26e-03, grad_scale: 8.0 2024-09-18 14:41:27,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=463300.0, ans=0.0 2024-09-18 14:41:49,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=463340.0, ans=0.125 2024-09-18 14:42:08,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=463420.0, ans=0.0 2024-09-18 14:42:26,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=463460.0, ans=0.09899494936611666 2024-09-18 14:42:35,432 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.497e+01 8.933e+01 9.409e+01 1.999e+02, threshold=1.787e+02, percent-clipped=1.0 2024-09-18 14:42:40,227 INFO [train.py:1198] (1/2) Epoch 26, batch 2750, loss[loss=0.2447, ctc_loss=0.1305, cr_loss=0.3787, attn_decoder_loss=0.2489, over 29514.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1262, cr_loss=0.3688, attn_decoder_loss=0.2454, over 5795203.18 frames. ], batch size: 75, lr: 4.26e-03, grad_scale: 8.0 2024-09-18 14:43:32,714 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 14:43:40,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=463620.0, ans=0.0 2024-09-18 14:43:42,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=463660.0, ans=0.125 2024-09-18 14:43:43,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=463660.0, ans=15.0 2024-09-18 14:43:57,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=463700.0, ans=0.1 2024-09-18 14:43:58,322 INFO [train.py:1198] (1/2) Epoch 26, batch 2800, loss[loss=0.2731, ctc_loss=0.1689, cr_loss=0.4204, attn_decoder_loss=0.2754, over 19580.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1266, cr_loss=0.369, attn_decoder_loss=0.2456, over 5776537.86 frames. 
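The scaling.py:214 entries that dominate this log each record a ScheduledFloat: a module hyper-parameter (dropout_p, skip rates, balancer probs, whitening limits) whose current value `ans` is looked up from `batch_count`. One plausible minimal implementation is a piecewise-linear lookup over (batch_count, value) knots, sketched below; the knot positions and the linear-interpolation rule are assumptions for illustration, since the log only shows the evaluated values.

```python
from bisect import bisect_right

class PiecewiseLinearSchedule:
    """Maps a batch count to a float by interpolating between
    (batch_count, value) knots - one plausible reading of the
    ScheduledFloat values logged above (assumed, not confirmed here)."""

    def __init__(self, knots):
        # knots: sorted list of (batch_count, value) pairs
        self.xs = [x for x, _ in knots]
        self.ys = [y for _, y in knots]

    def __call__(self, batch_count: float) -> float:
        i = bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        t = (batch_count - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

# e.g. a dropout that anneals from 0.3 to 0.1 over the first 20k batches
dropout_p = PiecewiseLinearSchedule([(0, 0.3), (20000, 0.1)])
assert abs(dropout_p(10000) - 0.2) < 1e-9
```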
], batch size: 210, lr: 4.26e-03, grad_scale: 16.0 2024-09-18 14:44:16,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=463740.0, ans=0.2 2024-09-18 14:44:40,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=463780.0, ans=0.125 2024-09-18 14:44:48,570 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2024-09-18 14:44:52,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=463820.0, ans=0.125 2024-09-18 14:45:03,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=463860.0, ans=0.2 2024-09-18 14:45:10,995 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.298e+01 8.701e+01 9.139e+01 9.864e+01 2.017e+02, threshold=1.828e+02, percent-clipped=1.0 2024-09-18 14:45:12,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=463860.0, ans=0.0 2024-09-18 14:45:15,526 INFO [train.py:1198] (1/2) Epoch 26, batch 2850, loss[loss=0.2405, ctc_loss=0.1235, cr_loss=0.3734, attn_decoder_loss=0.2452, over 29510.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1271, cr_loss=0.37, attn_decoder_loss=0.2462, over 5761716.33 frames. ], batch size: 77, lr: 4.26e-03, grad_scale: 16.0 2024-09-18 14:45:40,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=463940.0, ans=0.125 2024-09-18 14:45:40,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=463940.0, ans=0.125 2024-09-18 14:45:49,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=463980.0, ans=0.125 2024-09-18 14:45:50,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=463980.0, ans=0.125 2024-09-18 14:46:11,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=464020.0, ans=0.125 2024-09-18 14:46:11,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=464020.0, ans=0.125 2024-09-18 14:46:16,754 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.14 vs. limit=15.0 2024-09-18 14:46:40,975 INFO [train.py:1198] (1/2) Epoch 26, batch 2900, loss[loss=0.2466, ctc_loss=0.1368, cr_loss=0.4046, attn_decoder_loss=0.2498, over 29428.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1279, cr_loss=0.3721, attn_decoder_loss=0.2473, over 5787684.10 frames. 
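Note how grad_scale in the tot_loss lines oscillates between 8.0 and 16.0: it doubles at batch 2800 above and is back at 8.0 by batch 2900. This is the usual mixed-precision loss-scale dynamic, growing while gradients stay finite and backing off on overflow. Stock PyTorch AMP reproduces the behaviour; whether this run uses torch.cuda.amp.GradScaler or a custom scaler is not visible from the log, so treat this sketch as the generic mechanism rather than the recipe's code.

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=8.0,       # matches the scales seen in the log
    growth_factor=2.0,    # 8.0 -> 16.0 when gradients stay finite
    backoff_factor=0.5,   # 16.0 -> 8.0 after an inf/nan gradient
    growth_interval=2000, # batches between growth attempts (assumed value)
)

def training_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped automatically if grads overflowed
    scaler.update()          # grows or backs off the scale
    return loss.detach(), scaler.get_scale()
```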
], batch size: 79, lr: 4.26e-03, grad_scale: 8.0 2024-09-18 14:46:42,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=464100.0, ans=0.125 2024-09-18 14:47:05,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=464140.0, ans=0.05 2024-09-18 14:47:11,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=464180.0, ans=0.0 2024-09-18 14:47:40,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=464260.0, ans=0.0 2024-09-18 14:47:53,451 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.170e+01 8.512e+01 9.090e+01 9.867e+01 2.207e+02, threshold=1.818e+02, percent-clipped=1.0 2024-09-18 14:47:56,513 INFO [train.py:1198] (1/2) Epoch 26, batch 2950, loss[loss=0.2421, ctc_loss=0.1291, cr_loss=0.3936, attn_decoder_loss=0.2459, over 29502.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1267, cr_loss=0.3693, attn_decoder_loss=0.2459, over 5782938.68 frames. ], batch size: 75, lr: 4.25e-03, grad_scale: 8.0 2024-09-18 14:47:58,893 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.53 vs. limit=15.0 2024-09-18 14:48:22,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=464340.0, ans=0.0 2024-09-18 14:48:58,423 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.36 vs. limit=15.0 2024-09-18 14:49:15,212 INFO [train.py:1198] (1/2) Epoch 26, batch 3000, loss[loss=0.233, ctc_loss=0.1145, cr_loss=0.3339, attn_decoder_loss=0.2387, over 29740.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1264, cr_loss=0.3691, attn_decoder_loss=0.2454, over 5782548.70 frames. ], batch size: 81, lr: 4.25e-03, grad_scale: 8.0 2024-09-18 14:49:15,213 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 14:49:33,742 INFO [train.py:1230] (1/2) Epoch 26, validation: loss=0.2113, ctc_loss=0.03775, cr_loss=5.571e-15, attn_decoder_loss=0.2305, over 944034.00 frames. 2024-09-18 14:49:33,743 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-18 14:50:04,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=464580.0, ans=0.125 2024-09-18 14:50:21,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=464620.0, ans=0.2 2024-09-18 14:50:24,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=464620.0, ans=0.035 2024-09-18 14:50:35,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=464660.0, ans=0.0 2024-09-18 14:50:40,958 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.16 vs. limit=15.0 2024-09-18 14:50:45,297 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.30 vs. 
limit=15.0 2024-09-18 14:50:49,037 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.963e+01 8.504e+01 9.014e+01 9.631e+01 1.549e+02, threshold=1.803e+02, percent-clipped=0.0 2024-09-18 14:50:52,202 INFO [train.py:1198] (1/2) Epoch 26, batch 3050, loss[loss=0.2249, ctc_loss=0.1125, cr_loss=0.3476, attn_decoder_loss=0.2297, over 29526.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1266, cr_loss=0.3701, attn_decoder_loss=0.246, over 5776357.31 frames. ], batch size: 76, lr: 4.25e-03, grad_scale: 8.0 2024-09-18 14:51:03,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=464700.0, ans=0.125 2024-09-18 14:51:35,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=464780.0, ans=0.0 2024-09-18 14:51:37,370 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.41 vs. limit=15.0 2024-09-18 14:51:41,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=464820.0, ans=0.125 2024-09-18 14:52:08,318 INFO [train.py:1198] (1/2) Epoch 26, batch 3100, loss[loss=0.2667, ctc_loss=0.1513, cr_loss=0.4183, attn_decoder_loss=0.2702, over 29281.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1265, cr_loss=0.3693, attn_decoder_loss=0.2458, over 5776528.83 frames. ], batch size: 100, lr: 4.25e-03, grad_scale: 8.0 2024-09-18 14:52:22,912 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.58 vs. limit=22.5 2024-09-18 14:52:45,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=464980.0, ans=0.125 2024-09-18 14:52:47,169 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.70 vs. limit=22.5 2024-09-18 14:52:49,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=464980.0, ans=0.125 2024-09-18 14:53:10,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=465060.0, ans=0.125 2024-09-18 14:53:17,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=465060.0, ans=0.2 2024-09-18 14:53:23,452 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 8.570e+01 9.069e+01 9.533e+01 2.948e+02, threshold=1.814e+02, percent-clipped=1.0 2024-09-18 14:53:26,521 INFO [train.py:1198] (1/2) Epoch 26, batch 3150, loss[loss=0.2645, ctc_loss=0.1485, cr_loss=0.4046, attn_decoder_loss=0.2684, over 28875.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1265, cr_loss=0.3687, attn_decoder_loss=0.2459, over 5782187.12 frames. ], batch size: 104, lr: 4.25e-03, grad_scale: 8.0 2024-09-18 14:54:38,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=465260.0, ans=0.125 2024-09-18 14:54:44,437 INFO [train.py:1198] (1/2) Epoch 26, batch 3200, loss[loss=0.2405, ctc_loss=0.1249, cr_loss=0.3684, attn_decoder_loss=0.2451, over 29420.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1263, cr_loss=0.3681, attn_decoder_loss=0.2454, over 5791930.80 frames. 
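The optim.py:487 warnings are worth decoding. The five numbers appear to be the min, 25%, median, 75% and max of recent gradient norms, and in every warning above the threshold equals Clipping_scale times the logged median (e.g. 2.0 * 9.014e+01 = 1.803e+02 in the line just before this note). Clipping here is therefore adaptive: the cutoff tracks the run's own gradient statistics rather than a fixed constant. A sketch of that scheme follows; the history length and the exact update cadence are assumptions, only the threshold = scale * median relation is read off the log.

```python
from collections import deque
import torch

class MedianGradClipper:
    """Clip gradients to clipping_scale x (median of recent grad norms).

    Reconstructed from the logged numbers (threshold = 2.0 x median);
    the real optim.py may differ in details such as history size.
    """

    def __init__(self, clipping_scale: float = 2.0, history: int = 1024):
        self.scale = clipping_scale
        self.norms = deque(maxlen=history)

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        if not params:
            return 0.0
        norm = torch.norm(
            torch.stack([p.grad.detach().norm(2) for p in params]), 2
        ).item()
        self.norms.append(norm)
        hist = torch.tensor(list(self.norms))
        threshold = self.scale * hist.median().item()
        if norm > threshold:
            for p in params:
                p.grad.detach().mul_(threshold / norm)
        return norm
```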
], batch size: 79, lr: 4.25e-03, grad_scale: 16.0 2024-09-18 14:55:06,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=465340.0, ans=0.125 2024-09-18 14:55:10,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.45 vs. limit=22.5 2024-09-18 14:55:26,332 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.35 vs. limit=15.0 2024-09-18 14:55:38,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=465420.0, ans=0.05 2024-09-18 14:55:41,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=465420.0, ans=0.125 2024-09-18 14:55:41,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=465420.0, ans=0.0 2024-09-18 14:55:41,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=465420.0, ans=0.1 2024-09-18 14:55:51,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=465460.0, ans=0.1 2024-09-18 14:55:59,029 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.879e+01 8.413e+01 8.869e+01 9.551e+01 1.271e+02, threshold=1.774e+02, percent-clipped=0.0 2024-09-18 14:56:00,550 INFO [train.py:1198] (1/2) Epoch 26, batch 3250, loss[loss=0.2562, ctc_loss=0.1412, cr_loss=0.4019, attn_decoder_loss=0.2601, over 29696.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1267, cr_loss=0.3692, attn_decoder_loss=0.2459, over 5798570.07 frames. ], batch size: 84, lr: 4.25e-03, grad_scale: 8.0 2024-09-18 14:56:04,795 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.44 vs. limit=12.0 2024-09-18 14:56:11,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=465500.0, ans=0.025 2024-09-18 14:56:13,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=465500.0, ans=0.1 2024-09-18 14:56:14,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=465540.0, ans=0.125 2024-09-18 14:56:20,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=465540.0, ans=0.125 2024-09-18 14:56:50,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=465620.0, ans=0.5 2024-09-18 14:56:53,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=465620.0, ans=0.125 2024-09-18 14:57:18,688 INFO [train.py:1198] (1/2) Epoch 26, batch 3300, loss[loss=0.2509, ctc_loss=0.1258, cr_loss=0.3711, attn_decoder_loss=0.2565, over 28308.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1262, cr_loss=0.3679, attn_decoder_loss=0.2449, over 5796095.28 frames. 
], batch size: 111, lr: 4.25e-03, grad_scale: 8.0 2024-09-18 14:57:25,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=465700.0, ans=0.0 2024-09-18 14:57:28,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=465700.0, ans=0.2 2024-09-18 14:57:46,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=465740.0, ans=0.0 2024-09-18 14:57:46,656 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.10 vs. limit=10.0 2024-09-18 14:57:49,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=465780.0, ans=0.07 2024-09-18 14:57:56,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=465780.0, ans=0.125 2024-09-18 14:58:08,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=465820.0, ans=0.2 2024-09-18 14:58:13,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=465820.0, ans=0.125 2024-09-18 14:58:13,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=465820.0, ans=0.5 2024-09-18 14:58:14,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=465820.0, ans=0.0 2024-09-18 14:58:15,687 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-09-18 14:58:34,537 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 8.655e+01 9.163e+01 9.654e+01 2.275e+02, threshold=1.833e+02, percent-clipped=2.0 2024-09-18 14:58:36,149 INFO [train.py:1198] (1/2) Epoch 26, batch 3350, loss[loss=0.2546, ctc_loss=0.1353, cr_loss=0.3886, attn_decoder_loss=0.2592, over 28830.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1268, cr_loss=0.369, attn_decoder_loss=0.2457, over 5772760.35 frames. ], batch size: 104, lr: 4.25e-03, grad_scale: 8.0 2024-09-18 14:58:58,215 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.81 vs. limit=22.5 2024-09-18 14:59:23,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=466020.0, ans=0.125 2024-09-18 14:59:33,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=466020.0, ans=0.2 2024-09-18 14:59:51,875 INFO [train.py:1198] (1/2) Epoch 26, batch 3400, loss[loss=0.2212, ctc_loss=0.1112, cr_loss=0.349, attn_decoder_loss=0.2257, over 29373.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1269, cr_loss=0.3697, attn_decoder_loss=0.2457, over 5765872.94 frames. 
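Since everything interesting in this section is machine-readable, a few lines of parsing recover the learning curve. The regex below targets the exact shape of the train.py:1198 lines quoted here (epoch, batch, running tot_loss and lr); it is a convenience sketch for plotting, not part of the recipe.

```python
import re

# Matches e.g.:
# Epoch 26, batch 3350, loss[...], tot_loss[loss=0.2412, ...], batch size: 104, lr: 4.25e-03
PATTERN = re.compile(
    r"Epoch (?P<epoch>\d+), batch (?P<batch>\d+), .*?"
    r"tot_loss\[loss=(?P<tot_loss>[\d.]+),.*?"
    r"lr: (?P<lr>[\d.e-]+)"
)

def parse_log(path):
    rows = []
    with open(path) as f:
        for line in f:
            m = PATTERN.search(line)
            if m:
                rows.append((int(m["epoch"]), int(m["batch"]),
                             float(m["tot_loss"]), float(m["lr"])))
    return rows  # ready for plotting tot_loss and lr against batch
```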
], batch size: 67, lr: 4.25e-03, grad_scale: 8.0 2024-09-18 14:59:52,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=466100.0, ans=0.0 2024-09-18 15:00:08,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=466140.0, ans=0.125 2024-09-18 15:00:20,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=466180.0, ans=0.05 2024-09-18 15:00:28,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=466180.0, ans=0.0 2024-09-18 15:00:47,858 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.25 vs. limit=15.0 2024-09-18 15:01:08,279 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.379e+01 8.854e+01 9.422e+01 2.123e+02, threshold=1.771e+02, percent-clipped=1.0 2024-09-18 15:01:08,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=466300.0, ans=0.125 2024-09-18 15:01:08,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=466300.0, ans=0.2 2024-09-18 15:01:09,878 INFO [train.py:1198] (1/2) Epoch 26, batch 3450, loss[loss=0.2485, ctc_loss=0.1282, cr_loss=0.3732, attn_decoder_loss=0.2535, over 28194.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1272, cr_loss=0.3708, attn_decoder_loss=0.2461, over 5773841.97 frames. ], batch size: 111, lr: 4.25e-03, grad_scale: 8.0 2024-09-18 15:01:37,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=466340.0, ans=0.125 2024-09-18 15:02:24,629 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.36 vs. limit=15.0 2024-09-18 15:02:28,038 INFO [train.py:1198] (1/2) Epoch 26, batch 3500, loss[loss=0.2146, ctc_loss=0.102, cr_loss=0.3253, attn_decoder_loss=0.2199, over 29352.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1268, cr_loss=0.37, attn_decoder_loss=0.2455, over 5775682.97 frames. ], batch size: 71, lr: 4.24e-03, grad_scale: 8.0 2024-09-18 15:02:32,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=466500.0, ans=0.1 2024-09-18 15:03:35,770 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.80 vs. limit=15.0 2024-09-18 15:03:36,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=466660.0, ans=0.0 2024-09-18 15:03:36,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=466660.0, ans=0.125 2024-09-18 15:03:40,927 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.579e+01 9.256e+01 9.884e+01 2.781e+02, threshold=1.851e+02, percent-clipped=2.0 2024-09-18 15:03:42,426 INFO [train.py:1198] (1/2) Epoch 26, batch 3550, loss[loss=0.2423, ctc_loss=0.1206, cr_loss=0.3583, attn_decoder_loss=0.2479, over 29714.00 frames. 
], tot_loss[loss=0.2409, ctc_loss=0.1262, cr_loss=0.3689, attn_decoder_loss=0.2454, over 5782854.34 frames. ], batch size: 89, lr: 4.24e-03, grad_scale: 8.0 2024-09-18 15:03:58,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=466740.0, ans=0.125 2024-09-18 15:04:10,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=466780.0, ans=0.0 2024-09-18 15:04:19,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=466780.0, ans=0.1 2024-09-18 15:04:27,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=466820.0, ans=0.125 2024-09-18 15:04:29,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=466820.0, ans=0.0 2024-09-18 15:04:36,751 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.98 vs. limit=22.5 2024-09-18 15:04:37,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=466820.0, ans=0.0 2024-09-18 15:04:49,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=466860.0, ans=0.125 2024-09-18 15:04:52,456 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 15:04:55,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=466900.0, ans=0.125 2024-09-18 15:04:56,570 INFO [train.py:1198] (1/2) Epoch 26, batch 3600, loss[loss=0.23, ctc_loss=0.1167, cr_loss=0.3606, attn_decoder_loss=0.2346, over 29460.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.126, cr_loss=0.3687, attn_decoder_loss=0.2455, over 5791798.57 frames. ], batch size: 77, lr: 4.24e-03, grad_scale: 16.0 2024-09-18 15:04:58,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=466900.0, ans=0.1 2024-09-18 15:05:11,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.63 vs. limit=12.0 2024-09-18 15:05:19,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=466940.0, ans=0.125 2024-09-18 15:05:29,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=466980.0, ans=0.125 2024-09-18 15:05:51,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=467020.0, ans=0.125 2024-09-18 15:05:58,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=467060.0, ans=0.125 2024-09-18 15:06:07,761 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.81 vs. 
limit=15.0 2024-09-18 15:06:12,879 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.659e+01 8.525e+01 9.113e+01 9.643e+01 7.477e+02, threshold=1.823e+02, percent-clipped=1.0 2024-09-18 15:06:12,900 INFO [train.py:1198] (1/2) Epoch 26, batch 3650, loss[loss=0.2566, ctc_loss=0.1414, cr_loss=0.3943, attn_decoder_loss=0.2607, over 29510.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.126, cr_loss=0.3685, attn_decoder_loss=0.2451, over 5794734.22 frames. ], batch size: 90, lr: 4.24e-03, grad_scale: 8.0 2024-09-18 15:06:21,025 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.86 vs. limit=22.5 2024-09-18 15:06:35,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=467140.0, ans=0.0 2024-09-18 15:06:45,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=467180.0, ans=0.0 2024-09-18 15:06:58,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=467220.0, ans=0.125 2024-09-18 15:06:59,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=467220.0, ans=0.0 2024-09-18 15:07:05,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=467220.0, ans=0.5 2024-09-18 15:07:07,953 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.45 vs. limit=15.0 2024-09-18 15:07:27,485 INFO [train.py:1198] (1/2) Epoch 26, batch 3700, loss[loss=0.2447, ctc_loss=0.1299, cr_loss=0.3591, attn_decoder_loss=0.2495, over 29712.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1259, cr_loss=0.3683, attn_decoder_loss=0.2452, over 5805408.99 frames. ], batch size: 84, lr: 4.24e-03, grad_scale: 8.0 2024-09-18 15:07:30,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=467300.0, ans=0.025 2024-09-18 15:08:02,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=467380.0, ans=0.125 2024-09-18 15:08:03,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=467380.0, ans=0.0 2024-09-18 15:08:30,860 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.56 vs. limit=22.5 2024-09-18 15:08:38,743 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.24 vs. limit=15.0 2024-09-18 15:08:43,721 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.301e+01 8.766e+01 9.365e+01 1.565e+02, threshold=1.753e+02, percent-clipped=0.0 2024-09-18 15:08:43,743 INFO [train.py:1198] (1/2) Epoch 26, batch 3750, loss[loss=0.2182, ctc_loss=0.116, cr_loss=0.3582, attn_decoder_loss=0.2216, over 29364.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1259, cr_loss=0.3686, attn_decoder_loss=0.2448, over 5808658.04 frames. 
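The scaling.py:1024 lines report a per-module whitening diagnostic, metric vs. limit, alongside the scheduled whitening_limit values logged at scaling.py:214. A whitening metric of this kind is 1.0 when the feature covariance has equal eigenvalues and grows as variance concentrates in a few directions; when it exceeds the limit, the module nudges activations back toward whiteness. The sketch below computes one standard such metric, E[lambda^2] / E[lambda]^2 of the channel covariance, which matches how Zipformer's whitening is usually described; the exact normalization and the grouped variant (num_groups > 1) in this code base should be treated as assumptions.

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels). Returns E[lam^2] / E[lam]^2 over the
    eigenvalues lam of the channel covariance: 1.0 if x is perfectly white,
    larger as variance concentrates in few directions. Assumed to mirror
    the logged 'metric'; grouping (num_groups > 1) is omitted for brevity."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]              # (C, C) covariance
    c = cov.shape[0]
    mean_eig = torch.diagonal(cov).sum() / c   # E[lam]   = trace(cov) / C
    mean_eig_sq = (cov * cov).sum() / c        # E[lam^2] = trace(cov^2) / C
    return mean_eig_sq / (mean_eig * mean_eig + 1e-20)

# White noise scores near 1.0, comfortably under limits like 15.0 or 22.5
print(whitening_metric(torch.randn(10000, 512)))
```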
], batch size: 67, lr: 4.24e-03, grad_scale: 8.0 2024-09-18 15:08:52,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=467500.0, ans=0.07 2024-09-18 15:08:54,911 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.17 vs. limit=15.0 2024-09-18 15:08:58,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=467540.0, ans=0.125 2024-09-18 15:09:00,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=467540.0, ans=0.2 2024-09-18 15:09:00,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=467540.0, ans=0.1 2024-09-18 15:09:03,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=467540.0, ans=0.5 2024-09-18 15:09:10,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=467540.0, ans=0.125 2024-09-18 15:09:29,027 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.61 vs. limit=15.0 2024-09-18 15:09:58,345 INFO [train.py:1198] (1/2) Epoch 26, batch 3800, loss[loss=0.2562, ctc_loss=0.1351, cr_loss=0.3971, attn_decoder_loss=0.2609, over 29650.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1259, cr_loss=0.3683, attn_decoder_loss=0.2447, over 5799422.46 frames. ], batch size: 86, lr: 4.24e-03, grad_scale: 8.0 2024-09-18 15:10:22,471 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 15:10:26,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=467780.0, ans=0.125 2024-09-18 15:10:37,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=467780.0, ans=0.125 2024-09-18 15:10:47,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=467820.0, ans=0.0 2024-09-18 15:10:56,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=467860.0, ans=0.0 2024-09-18 15:11:01,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=467860.0, ans=0.0 2024-09-18 15:11:10,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=467860.0, ans=12.0 2024-09-18 15:11:12,681 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.723e+01 9.262e+01 9.820e+01 3.411e+02, threshold=1.852e+02, percent-clipped=3.0 2024-09-18 15:11:12,704 INFO [train.py:1198] (1/2) Epoch 26, batch 3850, loss[loss=0.26, ctc_loss=0.1372, cr_loss=0.3884, attn_decoder_loss=0.265, over 29237.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1256, cr_loss=0.3676, attn_decoder_loss=0.2446, over 5813243.99 frames. ], batch size: 100, lr: 4.24e-03, grad_scale: 8.0 2024-09-18 15:11:25,673 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.63 vs. 
limit=15.0 2024-09-18 15:11:44,742 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.31 vs. limit=12.0 2024-09-18 15:11:53,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=467980.0, ans=0.0 2024-09-18 15:11:58,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=468020.0, ans=0.125 2024-09-18 15:12:29,120 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.53 vs. limit=22.5 2024-09-18 15:12:29,506 INFO [train.py:1198] (1/2) Epoch 26, batch 3900, loss[loss=0.2536, ctc_loss=0.1317, cr_loss=0.385, attn_decoder_loss=0.2585, over 29611.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1265, cr_loss=0.3697, attn_decoder_loss=0.2455, over 5818334.34 frames. ], batch size: 86, lr: 4.24e-03, grad_scale: 8.0 2024-09-18 15:12:37,779 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.44 vs. limit=15.0 2024-09-18 15:12:40,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=468100.0, ans=0.125 2024-09-18 15:12:46,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=468140.0, ans=0.0 2024-09-18 15:13:06,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=468180.0, ans=0.09899494936611666 2024-09-18 15:13:08,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=468180.0, ans=0.0 2024-09-18 15:13:36,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=468260.0, ans=0.1 2024-09-18 15:13:37,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=468260.0, ans=0.2 2024-09-18 15:13:43,697 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.582e+01 9.076e+01 9.520e+01 1.404e+02, threshold=1.815e+02, percent-clipped=0.0 2024-09-18 15:13:43,724 INFO [train.py:1198] (1/2) Epoch 26, batch 3950, loss[loss=0.2691, ctc_loss=0.1544, cr_loss=0.4464, attn_decoder_loss=0.2719, over 29478.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1263, cr_loss=0.3691, attn_decoder_loss=0.2455, over 5837710.77 frames. ], batch size: 97, lr: 4.24e-03, grad_scale: 8.0 2024-09-18 15:13:52,533 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0 2024-09-18 15:13:53,668 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.39 vs. 
limit=22.5 2024-09-18 15:14:10,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=468340.0, ans=0.5 2024-09-18 15:14:29,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=468420.0, ans=0.125 2024-09-18 15:14:47,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=468460.0, ans=0.0 2024-09-18 15:14:57,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=468500.0, ans=0.125 2024-09-18 15:14:58,756 INFO [train.py:1198] (1/2) Epoch 26, batch 4000, loss[loss=0.2248, ctc_loss=0.1082, cr_loss=0.3386, attn_decoder_loss=0.2302, over 29505.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1268, cr_loss=0.3693, attn_decoder_loss=0.2457, over 5813419.49 frames. ], batch size: 74, lr: 4.24e-03, grad_scale: 16.0 2024-09-18 15:15:38,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=468580.0, ans=0.07 2024-09-18 15:15:40,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=468580.0, ans=0.125 2024-09-18 15:15:44,067 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.49 vs. limit=15.0 2024-09-18 15:15:58,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=468660.0, ans=0.1 2024-09-18 15:15:59,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.max_abs, batch_count=468660.0, ans=10.0 2024-09-18 15:16:07,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=468660.0, ans=0.1 2024-09-18 15:16:10,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=468660.0, ans=0.0 2024-09-18 15:16:14,196 INFO [train.py:1198] (1/2) Epoch 26, batch 4050, loss[loss=0.2716, ctc_loss=0.1593, cr_loss=0.397, attn_decoder_loss=0.2752, over 20247.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.127, cr_loss=0.3698, attn_decoder_loss=0.2457, over 5796588.85 frames. ], batch size: 209, lr: 4.23e-03, grad_scale: 8.0 2024-09-18 15:16:15,586 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.012e+01 8.606e+01 9.122e+01 9.849e+01 6.037e+02, threshold=1.824e+02, percent-clipped=3.0 2024-09-18 15:16:23,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=468700.0, ans=0.125 2024-09-18 15:16:29,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=468740.0, ans=0.1 2024-09-18 15:16:46,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=468780.0, ans=0.2 2024-09-18 15:16:53,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=468780.0, ans=0.125 2024-09-18 15:17:19,045 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.82 vs. 
limit=5.0 2024-09-18 15:17:24,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=468860.0, ans=0.125 2024-09-18 15:17:28,157 INFO [train.py:1198] (1/2) Epoch 26, batch 4100, loss[loss=0.2517, ctc_loss=0.1401, cr_loss=0.3925, attn_decoder_loss=0.2554, over 29491.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1271, cr_loss=0.3699, attn_decoder_loss=0.2458, over 5791149.29 frames. ], batch size: 90, lr: 4.23e-03, grad_scale: 8.0 2024-09-18 15:17:51,998 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 15:17:59,952 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.28 vs. limit=15.0 2024-09-18 15:18:18,311 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 15:18:41,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=469100.0, ans=0.0 2024-09-18 15:18:42,763 INFO [train.py:1198] (1/2) Epoch 26, batch 4150, loss[loss=0.2444, ctc_loss=0.1329, cr_loss=0.399, attn_decoder_loss=0.2479, over 29478.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1267, cr_loss=0.3698, attn_decoder_loss=0.2455, over 5797504.78 frames. ], batch size: 77, lr: 4.23e-03, grad_scale: 8.0 2024-09-18 15:18:44,181 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 8.459e+01 8.973e+01 9.469e+01 6.878e+02, threshold=1.795e+02, percent-clipped=1.0 2024-09-18 15:18:47,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=469100.0, ans=0.125 2024-09-18 15:19:10,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=469180.0, ans=0.0 2024-09-18 15:19:13,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=469180.0, ans=0.125 2024-09-18 15:19:21,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=469180.0, ans=0.035 2024-09-18 15:19:21,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=469180.0, ans=10.0 2024-09-18 15:19:30,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=469220.0, ans=0.125 2024-09-18 15:19:44,169 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.24 vs. limit=22.5 2024-09-18 15:19:44,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=469260.0, ans=0.125 2024-09-18 15:19:52,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=469260.0, ans=0.5 2024-09-18 15:19:56,319 INFO [train.py:1198] (1/2) Epoch 26, batch 4200, loss[loss=0.2527, ctc_loss=0.1395, cr_loss=0.3832, attn_decoder_loss=0.2568, over 29529.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1267, cr_loss=0.3695, attn_decoder_loss=0.2455, over 5800276.75 frames. 
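The scaling.py:1120 lines (WithLoss: name=...self_attn_weights, loss-sum=0.000e+00) look like an auxiliary-loss diagnostic attached to the attention-weight modules; a sum of exactly zero suggests the penalty is currently inactive. The wrapper below is a speculative reconstruction, included only to show how such a per-module loss-sum can be tracked; the class name, penalty form, and scale are all assumptions.

```python
from typing import Optional
import torch
import torch.nn as nn

class WithAuxLoss(nn.Module):
    """Wraps a submodule and records an auxiliary penalty on its output.

    A guess at what the 'WithLoss ... loss-sum=...' lines track; with
    penalty_scale == 0.0 the sum stays at 0, matching loss-sum=0.000e+00.
    """

    def __init__(self, inner: nn.Module, penalty_scale: float = 0.0):
        super().__init__()
        self.inner = inner
        self.penalty_scale = penalty_scale
        self.aux_loss: Optional[torch.Tensor] = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.inner(x)
        self.aux_loss = self.penalty_scale * y.pow(2).mean()
        return y  # the caller can add self.aux_loss to the training loss
```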
], batch size: 90, lr: 4.23e-03, grad_scale: 8.0 2024-09-18 15:20:02,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=469300.0, ans=0.0 2024-09-18 15:20:07,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=469300.0, ans=0.125 2024-09-18 15:20:31,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=469380.0, ans=0.0 2024-09-18 15:21:08,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=469460.0, ans=0.2 2024-09-18 15:21:10,804 INFO [train.py:1198] (1/2) Epoch 26, batch 4250, loss[loss=0.2264, ctc_loss=0.102, cr_loss=0.3162, attn_decoder_loss=0.2332, over 29531.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.126, cr_loss=0.3682, attn_decoder_loss=0.2454, over 5806122.32 frames. ], batch size: 74, lr: 4.23e-03, grad_scale: 8.0 2024-09-18 15:21:12,219 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.377e+01 8.717e+01 9.053e+01 9.730e+01 2.394e+02, threshold=1.811e+02, percent-clipped=1.0 2024-09-18 15:21:13,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=469500.0, ans=0.125 2024-09-18 15:21:16,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=469500.0, ans=0.0 2024-09-18 15:21:32,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=469540.0, ans=0.125 2024-09-18 15:21:33,412 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.77 vs. limit=22.5 2024-09-18 15:21:43,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=469580.0, ans=0.0 2024-09-18 15:21:55,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=469620.0, ans=0.0 2024-09-18 15:22:11,158 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.61 vs. limit=10.0 2024-09-18 15:22:20,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=469660.0, ans=0.1 2024-09-18 15:22:22,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=469660.0, ans=0.0 2024-09-18 15:22:25,794 INFO [train.py:1198] (1/2) Epoch 26, batch 4300, loss[loss=0.266, ctc_loss=0.1431, cr_loss=0.3987, attn_decoder_loss=0.2708, over 29515.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1257, cr_loss=0.3677, attn_decoder_loss=0.2454, over 5796184.25 frames. 
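The lr column decays smoothly within the epoch (4.26e-03 at batch 2450 down to 4.23e-03 here), consistent with a scheduler that discounts by both batch count and epoch. The Zipformer recipes are published with an Eden-style rule of the form sketched below; whether this run uses exactly these exponents, or an additional warmup term, is not recoverable from the log, so read it as the shape of the schedule rather than its literal constants.

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float, lr_epochs: float) -> float:
    """Eden-style decay: smooth in batch count and in epoch number.

    Form taken from the published Zipformer recipes; the constants and
    any warmup multiplier in this particular run are assumptions.
    """
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor
```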
], batch size: 87, lr: 4.23e-03, grad_scale: 8.0 2024-09-18 15:22:52,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=469740.0, ans=0.0 2024-09-18 15:23:07,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=469780.0, ans=0.2 2024-09-18 15:23:16,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=469820.0, ans=0.2 2024-09-18 15:23:19,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=469820.0, ans=0.125 2024-09-18 15:23:20,885 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 15:23:27,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=469860.0, ans=0.1 2024-09-18 15:23:39,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=469900.0, ans=0.1 2024-09-18 15:23:40,784 INFO [train.py:1198] (1/2) Epoch 26, batch 4350, loss[loss=0.252, ctc_loss=0.1309, cr_loss=0.3882, attn_decoder_loss=0.2568, over 29499.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1285, cr_loss=0.373, attn_decoder_loss=0.2487, over 5798187.90 frames. ], batch size: 97, lr: 4.23e-03, grad_scale: 8.0 2024-09-18 15:23:42,287 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.616e+01 8.612e+01 9.127e+01 9.671e+01 1.308e+02, threshold=1.825e+02, percent-clipped=0.0 2024-09-18 15:23:42,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=469900.0, ans=0.125 2024-09-18 15:24:25,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=470020.0, ans=0.0 2024-09-18 15:24:26,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=470020.0, ans=0.2 2024-09-18 15:24:26,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=470020.0, ans=0.0 2024-09-18 15:24:38,734 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.62 vs. limit=15.0 2024-09-18 15:24:42,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=470060.0, ans=0.125 2024-09-18 15:24:47,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=470060.0, ans=0.125 2024-09-18 15:24:54,002 INFO [train.py:1198] (1/2) Epoch 26, batch 4400, loss[loss=0.2556, ctc_loss=0.1429, cr_loss=0.3992, attn_decoder_loss=0.2593, over 27257.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.1302, cr_loss=0.3762, attn_decoder_loss=0.251, over 5765612.81 frames. ], batch size: 124, lr: 4.23e-03, grad_scale: 16.0 2024-09-18 15:24:59,634 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.72 vs. 
limit=15.0 2024-09-18 15:25:17,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=470140.0, ans=0.0 2024-09-18 15:25:43,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=470220.0, ans=0.125 2024-09-18 15:26:09,414 INFO [train.py:1198] (1/2) Epoch 26, batch 4450, loss[loss=0.2687, ctc_loss=0.1717, cr_loss=0.4097, attn_decoder_loss=0.2704, over 20329.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1344, cr_loss=0.3817, attn_decoder_loss=0.2532, over 5572524.93 frames. ], batch size: 209, lr: 4.23e-03, grad_scale: 8.0 2024-09-18 15:26:12,368 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.301e+01 9.167e+01 9.608e+01 1.048e+02 2.652e+02, threshold=1.922e+02, percent-clipped=1.0 2024-09-18 15:26:23,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=470340.0, ans=0.125 2024-09-18 15:26:30,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=470340.0, ans=0.025 2024-09-18 15:26:30,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=470340.0, ans=0.125 2024-09-18 15:26:31,349 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0 2024-09-18 15:26:33,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=470340.0, ans=0.0 2024-09-18 15:26:41,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=470380.0, ans=0.2 2024-09-18 15:26:52,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0 2024-09-18 15:26:57,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=470420.0, ans=0.1 2024-09-18 15:27:25,442 INFO [train.py:1198] (1/2) Epoch 26, batch 4500, loss[loss=0.2689, ctc_loss=0.1668, cr_loss=0.4233, attn_decoder_loss=0.2708, over 20306.00 frames. ], tot_loss[loss=0.2514, ctc_loss=0.1384, cr_loss=0.3842, attn_decoder_loss=0.2554, over 5233377.17 frames. ], batch size: 210, lr: 4.23e-03, grad_scale: 8.0 2024-09-18 15:27:35,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=470500.0, ans=0.125 2024-09-18 15:27:49,879 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.82 vs. limit=10.0 2024-09-18 15:28:54,200 INFO [train.py:1198] (1/2) Epoch 27, batch 0, loss[loss=0.2161, ctc_loss=0.1028, cr_loss=0.3285, attn_decoder_loss=0.2214, over 29598.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1028, cr_loss=0.3285, attn_decoder_loss=0.2214, over 29598.00 frames. 
], batch size: 73, lr: 4.15e-03, grad_scale: 16.0 2024-09-18 15:28:54,201 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 15:29:04,698 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6813, 4.6660, 4.0043, 4.4543], device='cuda:1') 2024-09-18 15:29:12,733 INFO [train.py:1230] (1/2) Epoch 27, validation: loss=0.2127, ctc_loss=0.03797, cr_loss=5.907e-15, attn_decoder_loss=0.2322, over 944034.00 frames. 2024-09-18 15:29:12,733 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-18 15:29:14,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=470600.0, ans=0.07 2024-09-18 15:29:22,486 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.81 vs. limit=22.5 2024-09-18 15:29:35,865 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.92 vs. limit=15.0 2024-09-18 15:29:36,398 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.14 vs. limit=15.0 2024-09-18 15:29:37,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=470640.0, ans=0.0 2024-09-18 15:29:41,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=470680.0, ans=0.1 2024-09-18 15:29:44,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=470680.0, ans=0.125 2024-09-18 15:29:47,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=470680.0, ans=0.125 2024-09-18 15:29:50,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=470680.0, ans=0.125 2024-09-18 15:29:53,215 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 1.034e+02 1.128e+02 1.240e+02 3.218e+02, threshold=2.256e+02, percent-clipped=3.0 2024-09-18 15:30:01,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=470720.0, ans=0.1 2024-09-18 15:30:03,222 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.95 vs. limit=15.0 2024-09-18 15:30:05,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=470720.0, ans=0.2 2024-09-18 15:30:14,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=470760.0, ans=0.125 2024-09-18 15:30:14,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=470760.0, ans=0.125 2024-09-18 15:30:28,340 INFO [train.py:1198] (1/2) Epoch 27, batch 50, loss[loss=0.2181, ctc_loss=0.1155, cr_loss=0.3472, attn_decoder_loss=0.2217, over 29436.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1281, cr_loss=0.3714, attn_decoder_loss=0.2466, over 1267229.45 frames. 
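Two details of the validation block above deserve a note. First, cr_loss comes out at ~6e-15, i.e. numerically zero, which is plausible if the consistency-regularization term compares two differently augmented forward passes and validation runs unaugmented. Second, the zipformer.py:1858 line prints an attn_weights_entropy tensor, apparently one entropy value per attention head, a diagnostic of how peaked the attention distributions are. A frame-weighted eval loop in the spirit of the logged "over 944034.00 frames" bookkeeping follows; loss_fn and its return convention are hypothetical.

```python
import torch

@torch.no_grad()
def validate(model, dataloader, loss_fn):
    """Frame-weighted validation loss. loss_fn is assumed to return
    (loss, num_frames) per batch; the names here are illustrative."""
    model.eval()
    total, frames = 0.0, 0.0
    for batch in dataloader:
        loss, n = loss_fn(model, batch)
        total += loss.item() * n
        frames += n
    model.train()
    # Peak usage, as in "Maximum memory allocated so far is 52672MB":
    mem_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    return total / max(frames, 1.0), mem_mb
```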
], batch size: 70, lr: 4.14e-03, grad_scale: 16.0 2024-09-18 15:30:52,983 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.23 vs. limit=15.0 2024-09-18 15:30:55,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=470840.0, ans=0.125 2024-09-18 15:30:56,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=470840.0, ans=0.0 2024-09-18 15:31:37,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=470960.0, ans=0.2 2024-09-18 15:31:42,565 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.48 vs. limit=15.0 2024-09-18 15:31:47,304 INFO [train.py:1198] (1/2) Epoch 27, batch 100, loss[loss=0.2316, ctc_loss=0.1205, cr_loss=0.3611, attn_decoder_loss=0.2359, over 29529.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1288, cr_loss=0.3737, attn_decoder_loss=0.2483, over 2251682.52 frames. ], batch size: 76, lr: 4.14e-03, grad_scale: 16.0 2024-09-18 15:31:58,318 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 15:31:59,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=471000.0, ans=0.125 2024-09-18 15:32:03,104 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.61 vs. limit=15.0 2024-09-18 15:32:04,579 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.69 vs. limit=15.0 2024-09-18 15:32:08,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=471040.0, ans=0.0 2024-09-18 15:32:14,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=471040.0, ans=0.125 2024-09-18 15:32:28,928 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.541e+01 8.524e+01 9.170e+01 9.614e+01 1.417e+02, threshold=1.834e+02, percent-clipped=0.0 2024-09-18 15:32:45,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=471160.0, ans=0.0 2024-09-18 15:32:53,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=471160.0, ans=0.04949747468305833 2024-09-18 15:33:02,259 INFO [train.py:1198] (1/2) Epoch 27, batch 150, loss[loss=0.2073, ctc_loss=0.1031, cr_loss=0.3248, attn_decoder_loss=0.2117, over 29426.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1266, cr_loss=0.3706, attn_decoder_loss=0.2459, over 3046329.12 frames. 
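Notice the tot_loss frame counts at the start of epoch 27: 29598 at batch 0, ~1.27M at batch 50, ~2.25M at batch 100, ~3.05M at batch 150. The increments shrink even though batch sizes are steady, so tot_loss is evidently a frame-weighted average with exponential forgetting rather than a plain epoch average; a forgetting factor around 0.995 per batch reproduces the logged totals well. A sketch of such an accumulator (the factor is fitted by eye to the numbers above, not read from the code):

```python
class RunningLoss:
    """Frame-weighted running average with gradual forgetting.

    The forgetting factor is an assumption chosen so that the accumulated
    frame count grows by shrinking increments, as in the log above.
    """

    def __init__(self, forget: float = 0.995):
        self.forget = forget
        self.weighted_loss = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.weighted_loss = self.forget * self.weighted_loss \
            + batch_loss * batch_frames
        self.frames = self.forget * self.frames + batch_frames
        return self.weighted_loss / self.frames  # the tot_loss column
```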
], batch size: 70, lr: 4.14e-03, grad_scale: 8.0 2024-09-18 15:33:29,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=471240.0, ans=0.2 2024-09-18 15:33:47,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=471320.0, ans=0.0 2024-09-18 15:33:50,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=471320.0, ans=0.125 2024-09-18 15:33:58,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=471320.0, ans=0.0 2024-09-18 15:34:01,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=471360.0, ans=0.2 2024-09-18 15:34:08,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=471360.0, ans=0.1 2024-09-18 15:34:17,454 INFO [train.py:1198] (1/2) Epoch 27, batch 200, loss[loss=0.2523, ctc_loss=0.1383, cr_loss=0.391, attn_decoder_loss=0.2563, over 27527.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1266, cr_loss=0.3709, attn_decoder_loss=0.2453, over 3658330.40 frames. ], batch size: 124, lr: 4.14e-03, grad_scale: 8.0 2024-09-18 15:34:34,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=471440.0, ans=0.1 2024-09-18 15:34:52,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=471480.0, ans=0.0 2024-09-18 15:34:54,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=471480.0, ans=0.07 2024-09-18 15:35:01,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=471480.0, ans=0.125 2024-09-18 15:35:03,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=471480.0, ans=0.1 2024-09-18 15:35:04,163 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.694e+01 8.473e+01 8.928e+01 9.557e+01 1.148e+02, threshold=1.786e+02, percent-clipped=0.0 2024-09-18 15:35:05,053 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.64 vs. 
limit=15.0 2024-09-18 15:35:15,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=471520.0, ans=0.125 2024-09-18 15:35:19,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=471520.0, ans=0.0 2024-09-18 15:35:22,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=471560.0, ans=0.125 2024-09-18 15:35:24,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=471560.0, ans=0.0 2024-09-18 15:35:28,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=471560.0, ans=0.2 2024-09-18 15:35:28,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=471560.0, ans=0.125 2024-09-18 15:35:33,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=471560.0, ans=0.2 2024-09-18 15:35:37,445 INFO [train.py:1198] (1/2) Epoch 27, batch 250, loss[loss=0.257, ctc_loss=0.1308, cr_loss=0.3916, attn_decoder_loss=0.2623, over 29224.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1267, cr_loss=0.3706, attn_decoder_loss=0.2452, over 4141619.37 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 8.0 2024-09-18 15:35:39,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=471600.0, ans=0.125 2024-09-18 15:35:45,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=471600.0, ans=0.1 2024-09-18 15:35:54,887 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.95 vs. limit=6.0 2024-09-18 15:36:00,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=471640.0, ans=0.125 2024-09-18 15:36:07,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=471680.0, ans=0.025 2024-09-18 15:36:11,797 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.92 vs. limit=10.0 2024-09-18 15:36:18,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=471680.0, ans=0.0 2024-09-18 15:36:18,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=471680.0, ans=0.125 2024-09-18 15:36:53,037 INFO [train.py:1198] (1/2) Epoch 27, batch 300, loss[loss=0.2571, ctc_loss=0.1432, cr_loss=0.4369, attn_decoder_loss=0.26, over 29518.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.126, cr_loss=0.3693, attn_decoder_loss=0.2451, over 4509049.93 frames. 
], batch size: 92, lr: 4.14e-03, grad_scale: 8.0 2024-09-18 15:36:56,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=471800.0, ans=0.125 2024-09-18 15:37:01,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=471800.0, ans=0.125 2024-09-18 15:37:22,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=471880.0, ans=0.0 2024-09-18 15:37:34,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=471880.0, ans=0.125 2024-09-18 15:37:35,277 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.584e+01 8.118e+01 8.847e+01 9.359e+01 3.678e+02, threshold=1.769e+02, percent-clipped=1.0 2024-09-18 15:37:53,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=471960.0, ans=0.125 2024-09-18 15:37:53,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=471960.0, ans=0.2 2024-09-18 15:38:03,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.69 vs. limit=15.0 2024-09-18 15:38:09,245 INFO [train.py:1198] (1/2) Epoch 27, batch 350, loss[loss=0.2182, ctc_loss=0.1071, cr_loss=0.3236, attn_decoder_loss=0.2233, over 29333.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1264, cr_loss=0.3705, attn_decoder_loss=0.2458, over 4795061.13 frames. ], batch size: 71, lr: 4.14e-03, grad_scale: 8.0 2024-09-18 15:38:12,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=472000.0, ans=0.0 2024-09-18 15:38:32,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=472040.0, ans=0.125 2024-09-18 15:38:34,263 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.88 vs. limit=15.0 2024-09-18 15:39:10,781 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.06 vs. limit=15.0 2024-09-18 15:39:29,365 INFO [train.py:1198] (1/2) Epoch 27, batch 400, loss[loss=0.2398, ctc_loss=0.1228, cr_loss=0.3575, attn_decoder_loss=0.2449, over 29720.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1258, cr_loss=0.3693, attn_decoder_loss=0.2453, over 5024362.16 frames. 
], batch size: 82, lr: 4.14e-03, grad_scale: 16.0 2024-09-18 15:39:37,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=472200.0, ans=0.125 2024-09-18 15:39:38,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=472200.0, ans=0.0 2024-09-18 15:40:04,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=472280.0, ans=0.125 2024-09-18 15:40:11,991 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.649e+01 9.100e+01 9.719e+01 1.502e+02, threshold=1.820e+02, percent-clipped=0.0 2024-09-18 15:40:33,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=472360.0, ans=0.125 2024-09-18 15:40:33,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=472360.0, ans=0.0 2024-09-18 15:40:45,444 INFO [train.py:1198] (1/2) Epoch 27, batch 450, loss[loss=0.2484, ctc_loss=0.1256, cr_loss=0.372, attn_decoder_loss=0.2538, over 29713.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1257, cr_loss=0.369, attn_decoder_loss=0.2453, over 5186280.42 frames. ], batch size: 83, lr: 4.14e-03, grad_scale: 8.0 2024-09-18 15:40:58,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=472400.0, ans=0.0 2024-09-18 15:42:02,026 INFO [train.py:1198] (1/2) Epoch 27, batch 500, loss[loss=0.2622, ctc_loss=0.1437, cr_loss=0.4049, attn_decoder_loss=0.2664, over 29451.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1254, cr_loss=0.3687, attn_decoder_loss=0.2446, over 5329294.61 frames. ], batch size: 94, lr: 4.14e-03, grad_scale: 8.0 2024-09-18 15:42:14,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=472600.0, ans=0.05 2024-09-18 15:42:25,777 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.45 vs. 
limit=15.0 2024-09-18 15:42:41,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=472680.0, ans=0.2 2024-09-18 15:42:49,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=472680.0, ans=0.125 2024-09-18 15:42:50,904 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.521e+01 8.545e+01 8.912e+01 9.466e+01 2.661e+02, threshold=1.782e+02, percent-clipped=0.0 2024-09-18 15:42:55,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=472720.0, ans=0.025 2024-09-18 15:43:06,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=472760.0, ans=0.0 2024-09-18 15:43:07,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=472760.0, ans=0.0 2024-09-18 15:43:12,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=472760.0, ans=0.125 2024-09-18 15:43:23,226 INFO [train.py:1198] (1/2) Epoch 27, batch 550, loss[loss=0.2489, ctc_loss=0.1273, cr_loss=0.3724, attn_decoder_loss=0.2541, over 28932.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1259, cr_loss=0.3692, attn_decoder_loss=0.245, over 5422818.18 frames. ], batch size: 104, lr: 4.14e-03, grad_scale: 8.0 2024-09-18 15:43:25,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=472800.0, ans=0.0 2024-09-18 15:43:32,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=472800.0, ans=0.025 2024-09-18 15:43:34,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=472800.0, ans=0.125 2024-09-18 15:43:38,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=472840.0, ans=0.0 2024-09-18 15:43:52,910 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.55 vs. limit=15.0 2024-09-18 15:44:09,630 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0 2024-09-18 15:44:15,857 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.06 vs. limit=15.0 2024-09-18 15:44:18,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=472920.0, ans=0.2 2024-09-18 15:44:24,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=472960.0, ans=0.1 2024-09-18 15:44:39,413 INFO [train.py:1198] (1/2) Epoch 27, batch 600, loss[loss=0.2597, ctc_loss=0.1334, cr_loss=0.3748, attn_decoder_loss=0.2654, over 29217.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1262, cr_loss=0.3703, attn_decoder_loss=0.2455, over 5508837.98 frames. 
], batch size: 100, lr: 4.14e-03, grad_scale: 8.0 2024-09-18 15:44:39,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=473000.0, ans=0.125 2024-09-18 15:44:46,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=473000.0, ans=15.0 2024-09-18 15:44:48,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=473000.0, ans=0.125 2024-09-18 15:44:50,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=473000.0, ans=0.1 2024-09-18 15:44:51,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=473000.0, ans=0.125 2024-09-18 15:45:11,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=473080.0, ans=0.125 2024-09-18 15:45:22,886 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 8.416e+01 8.737e+01 9.314e+01 1.829e+02, threshold=1.747e+02, percent-clipped=2.0 2024-09-18 15:45:35,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=473120.0, ans=0.125 2024-09-18 15:45:43,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=473160.0, ans=0.125 2024-09-18 15:45:54,997 INFO [train.py:1198] (1/2) Epoch 27, batch 650, loss[loss=0.2371, ctc_loss=0.1218, cr_loss=0.3659, attn_decoder_loss=0.2418, over 29782.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1253, cr_loss=0.3682, attn_decoder_loss=0.2447, over 5585992.97 frames. ], batch size: 81, lr: 4.13e-03, grad_scale: 8.0 2024-09-18 15:46:05,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=473200.0, ans=0.1 2024-09-18 15:46:18,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=473240.0, ans=0.025 2024-09-18 15:46:22,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=473240.0, ans=0.125 2024-09-18 15:46:28,089 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.85 vs. limit=15.0 2024-09-18 15:46:30,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=473280.0, ans=0.025 2024-09-18 15:46:41,093 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.52 vs. limit=15.0 2024-09-18 15:47:15,995 INFO [train.py:1198] (1/2) Epoch 27, batch 700, loss[loss=0.2288, ctc_loss=0.1158, cr_loss=0.344, attn_decoder_loss=0.2337, over 29556.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1252, cr_loss=0.368, attn_decoder_loss=0.245, over 5636312.40 frames. 
], batch size: 76, lr: 4.13e-03, grad_scale: 8.0 2024-09-18 15:47:23,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=473400.0, ans=0.1 2024-09-18 15:47:34,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=473440.0, ans=0.05 2024-09-18 15:47:48,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=473480.0, ans=0.1 2024-09-18 15:47:51,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=473480.0, ans=10.0 2024-09-18 15:47:55,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=473480.0, ans=0.125 2024-09-18 15:48:00,171 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.368e+01 8.450e+01 9.020e+01 9.619e+01 3.078e+02, threshold=1.804e+02, percent-clipped=1.0 2024-09-18 15:48:06,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=473520.0, ans=0.1 2024-09-18 15:48:32,689 INFO [train.py:1198] (1/2) Epoch 27, batch 750, loss[loss=0.2435, ctc_loss=0.1278, cr_loss=0.3685, attn_decoder_loss=0.2482, over 29714.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1251, cr_loss=0.3675, attn_decoder_loss=0.2446, over 5675850.71 frames. ], batch size: 82, lr: 4.13e-03, grad_scale: 8.0 2024-09-18 15:48:53,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.33 vs. limit=15.0 2024-09-18 15:49:02,316 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.50 vs. limit=15.0 2024-09-18 15:49:23,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=473720.0, ans=0.1 2024-09-18 15:49:34,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=473760.0, ans=0.125 2024-09-18 15:49:48,907 INFO [train.py:1198] (1/2) Epoch 27, batch 800, loss[loss=0.2254, ctc_loss=0.1141, cr_loss=0.3441, attn_decoder_loss=0.2302, over 29594.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.125, cr_loss=0.3676, attn_decoder_loss=0.2444, over 5705364.10 frames. 
], batch size: 73, lr: 4.13e-03, grad_scale: 16.0 2024-09-18 15:49:53,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=473800.0, ans=0.0 2024-09-18 15:50:04,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=473840.0, ans=0.2 2024-09-18 15:50:19,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=473880.0, ans=0.0 2024-09-18 15:50:25,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=473880.0, ans=0.125 2024-09-18 15:50:35,327 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.485e+01 9.104e+01 9.795e+01 7.519e+02, threshold=1.821e+02, percent-clipped=1.0 2024-09-18 15:50:40,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=473920.0, ans=0.0 2024-09-18 15:50:41,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=473920.0, ans=0.09899494936611666 2024-09-18 15:50:43,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=473920.0, ans=0.0 2024-09-18 15:50:56,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=473960.0, ans=0.1 2024-09-18 15:51:09,199 INFO [train.py:1198] (1/2) Epoch 27, batch 850, loss[loss=0.2542, ctc_loss=0.1326, cr_loss=0.3663, attn_decoder_loss=0.2596, over 29716.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1248, cr_loss=0.3673, attn_decoder_loss=0.244, over 5735441.65 frames. ], batch size: 89, lr: 4.13e-03, grad_scale: 16.0 2024-09-18 15:51:11,398 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2024-09-18 15:51:15,877 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.97 vs. limit=22.5 2024-09-18 15:51:19,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=474000.0, ans=0.125 2024-09-18 15:51:31,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=474040.0, ans=0.1 2024-09-18 15:51:46,070 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.06 vs. limit=22.5 2024-09-18 15:51:47,466 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.40 vs. limit=15.0 2024-09-18 15:52:00,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=474120.0, ans=0.125 2024-09-18 15:52:03,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=474120.0, ans=0.125 2024-09-18 15:52:24,936 INFO [train.py:1198] (1/2) Epoch 27, batch 900, loss[loss=0.2208, ctc_loss=0.1158, cr_loss=0.3524, attn_decoder_loss=0.2246, over 29607.00 frames. 
], tot_loss[loss=0.24, ctc_loss=0.1254, cr_loss=0.368, attn_decoder_loss=0.2446, over 5740667.02 frames. ], batch size: 73, lr: 4.13e-03, grad_scale: 8.0 2024-09-18 15:53:06,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=474280.0, ans=0.125 2024-09-18 15:53:10,287 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.261e+01 8.501e+01 8.938e+01 9.467e+01 2.355e+02, threshold=1.788e+02, percent-clipped=2.0 2024-09-18 15:53:17,274 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.48 vs. limit=10.0 2024-09-18 15:53:31,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=474360.0, ans=0.0 2024-09-18 15:53:41,178 INFO [train.py:1198] (1/2) Epoch 27, batch 950, loss[loss=0.2317, ctc_loss=0.1167, cr_loss=0.3572, attn_decoder_loss=0.2365, over 29500.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1255, cr_loss=0.3684, attn_decoder_loss=0.2449, over 5743591.80 frames. ], batch size: 74, lr: 4.13e-03, grad_scale: 8.0 2024-09-18 15:54:38,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=474520.0, ans=0.2 2024-09-18 15:55:01,615 INFO [train.py:1198] (1/2) Epoch 27, batch 1000, loss[loss=0.2436, ctc_loss=0.1322, cr_loss=0.3834, attn_decoder_loss=0.2475, over 29505.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1264, cr_loss=0.3705, attn_decoder_loss=0.2458, over 5736826.30 frames. ], batch size: 77, lr: 4.13e-03, grad_scale: 8.0 2024-09-18 15:55:09,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=474600.0, ans=0.125 2024-09-18 15:55:41,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=474680.0, ans=0.025 2024-09-18 15:55:44,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=474680.0, ans=0.0 2024-09-18 15:55:47,614 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.698e+01 8.547e+01 9.112e+01 9.993e+01 2.254e+02, threshold=1.822e+02, percent-clipped=1.0 2024-09-18 15:56:17,918 INFO [train.py:1198] (1/2) Epoch 27, batch 1050, loss[loss=0.2518, ctc_loss=0.129, cr_loss=0.3895, attn_decoder_loss=0.2568, over 29682.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.126, cr_loss=0.369, attn_decoder_loss=0.2453, over 5745954.48 frames. 
], batch size: 85, lr: 4.13e-03, grad_scale: 8.0 2024-09-18 15:56:21,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=474800.0, ans=0.1 2024-09-18 15:56:24,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=474800.0, ans=0.0 2024-09-18 15:56:31,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=474840.0, ans=0.125 2024-09-18 15:56:56,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=474880.0, ans=0.025 2024-09-18 15:57:07,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=474920.0, ans=0.1 2024-09-18 15:57:16,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=474920.0, ans=0.125 2024-09-18 15:57:27,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=474960.0, ans=0.125 2024-09-18 15:57:34,517 INFO [train.py:1198] (1/2) Epoch 27, batch 1100, loss[loss=0.2305, ctc_loss=0.1261, cr_loss=0.3723, attn_decoder_loss=0.2338, over 29441.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1261, cr_loss=0.3688, attn_decoder_loss=0.245, over 5757380.28 frames. ], batch size: 78, lr: 4.13e-03, grad_scale: 8.0 2024-09-18 15:57:41,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.03 vs. limit=22.5 2024-09-18 15:57:47,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=475000.0, ans=0.09899494936611666 2024-09-18 15:57:59,754 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.17 vs. limit=15.0 2024-09-18 15:58:02,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=475040.0, ans=0.0 2024-09-18 15:58:18,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=475080.0, ans=0.125 2024-09-18 15:58:18,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=475080.0, ans=0.125 2024-09-18 15:58:22,631 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.448e+01 9.006e+01 9.632e+01 1.338e+02, threshold=1.801e+02, percent-clipped=0.0 2024-09-18 15:58:25,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=475120.0, ans=0.025 2024-09-18 15:58:36,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=475160.0, ans=0.125 2024-09-18 15:58:39,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=475160.0, ans=0.0 2024-09-18 15:58:54,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=475200.0, ans=0.0 2024-09-18 15:58:55,772 INFO [train.py:1198] (1/2) Epoch 27, batch 1150, loss[loss=0.2327, ctc_loss=0.1177, cr_loss=0.3564, attn_decoder_loss=0.2376, over 29457.00 frames. 
], tot_loss[loss=0.2403, ctc_loss=0.1258, cr_loss=0.3683, attn_decoder_loss=0.2449, over 5755672.12 frames. ], batch size: 78, lr: 4.13e-03, grad_scale: 8.0 2024-09-18 15:59:20,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=475240.0, ans=0.05 2024-09-18 15:59:28,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=475280.0, ans=0.04949747468305833 2024-09-18 15:59:29,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=475280.0, ans=0.125 2024-09-18 15:59:56,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=475360.0, ans=0.125 2024-09-18 16:00:11,932 INFO [train.py:1198] (1/2) Epoch 27, batch 1200, loss[loss=0.2474, ctc_loss=0.1304, cr_loss=0.3789, attn_decoder_loss=0.2519, over 29659.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1262, cr_loss=0.3689, attn_decoder_loss=0.2454, over 5747835.19 frames. ], batch size: 85, lr: 4.12e-03, grad_scale: 16.0 2024-09-18 16:00:18,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=475400.0, ans=0.025 2024-09-18 16:00:32,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=475440.0, ans=0.2 2024-09-18 16:00:39,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=475440.0, ans=0.0 2024-09-18 16:00:48,102 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.01 vs. limit=15.0 2024-09-18 16:00:55,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.96 vs. limit=15.0 2024-09-18 16:00:59,251 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.298e+01 8.652e+01 9.107e+01 9.727e+01 1.637e+02, threshold=1.821e+02, percent-clipped=0.0 2024-09-18 16:01:09,275 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.60 vs. limit=15.0 2024-09-18 16:01:19,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=475560.0, ans=0.2 2024-09-18 16:01:27,920 INFO [train.py:1198] (1/2) Epoch 27, batch 1250, loss[loss=0.2568, ctc_loss=0.1412, cr_loss=0.3903, attn_decoder_loss=0.261, over 29527.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1262, cr_loss=0.3694, attn_decoder_loss=0.2458, over 5774800.39 frames. ], batch size: 92, lr: 4.12e-03, grad_scale: 8.0 2024-09-18 16:01:32,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=475600.0, ans=0.125 2024-09-18 16:01:35,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=475600.0, ans=0.0 2024-09-18 16:01:56,517 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.85 vs. 
limit=15.0 2024-09-18 16:02:06,571 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.37 vs. limit=22.5 2024-09-18 16:02:44,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=475760.0, ans=0.125 2024-09-18 16:02:48,706 INFO [train.py:1198] (1/2) Epoch 27, batch 1300, loss[loss=0.2472, ctc_loss=0.1275, cr_loss=0.3627, attn_decoder_loss=0.2524, over 28244.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1258, cr_loss=0.3683, attn_decoder_loss=0.2451, over 5778878.71 frames. ], batch size: 111, lr: 4.12e-03, grad_scale: 8.0 2024-09-18 16:03:12,424 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.92 vs. limit=22.5 2024-09-18 16:03:22,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=475880.0, ans=0.125 2024-09-18 16:03:35,952 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.290e+01 8.380e+01 8.992e+01 9.418e+01 1.555e+02, threshold=1.798e+02, percent-clipped=0.0 2024-09-18 16:03:37,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=475920.0, ans=0.09899494936611666 2024-09-18 16:03:42,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=475920.0, ans=0.125 2024-09-18 16:03:54,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=475960.0, ans=0.0 2024-09-18 16:04:05,245 INFO [train.py:1198] (1/2) Epoch 27, batch 1350, loss[loss=0.2371, ctc_loss=0.1236, cr_loss=0.3735, attn_decoder_loss=0.2414, over 29765.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.125, cr_loss=0.3673, attn_decoder_loss=0.2447, over 5796055.22 frames. ], batch size: 81, lr: 4.12e-03, grad_scale: 8.0 2024-09-18 16:04:44,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=476080.0, ans=0.125 2024-09-18 16:04:47,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=476080.0, ans=0.125 2024-09-18 16:04:53,598 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:04:59,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=476120.0, ans=0.2 2024-09-18 16:05:02,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=476120.0, ans=0.125 2024-09-18 16:05:19,586 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:05:20,670 INFO [train.py:1198] (1/2) Epoch 27, batch 1400, loss[loss=0.2016, ctc_loss=0.09796, cr_loss=0.2896, attn_decoder_loss=0.2067, over 29593.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1246, cr_loss=0.3665, attn_decoder_loss=0.2443, over 5807347.44 frames. 
], batch size: 69, lr: 4.12e-03, grad_scale: 8.0 2024-09-18 16:05:31,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=476200.0, ans=0.2 2024-09-18 16:05:54,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=476280.0, ans=0.0 2024-09-18 16:06:09,969 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.342e+01 8.774e+01 9.500e+01 1.505e+02, threshold=1.755e+02, percent-clipped=0.0 2024-09-18 16:06:38,574 INFO [train.py:1198] (1/2) Epoch 27, batch 1450, loss[loss=0.2582, ctc_loss=0.1351, cr_loss=0.3779, attn_decoder_loss=0.2635, over 29443.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1249, cr_loss=0.3673, attn_decoder_loss=0.2447, over 5804388.85 frames. ], batch size: 94, lr: 4.12e-03, grad_scale: 8.0 2024-09-18 16:06:43,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=15.0 2024-09-18 16:07:19,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=476480.0, ans=0.125 2024-09-18 16:07:20,840 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:07:25,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=476520.0, ans=0.125 2024-09-18 16:07:41,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=476560.0, ans=0.125 2024-09-18 16:07:43,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=476560.0, ans=0.125 2024-09-18 16:07:56,828 INFO [train.py:1198] (1/2) Epoch 27, batch 1500, loss[loss=0.2515, ctc_loss=0.1348, cr_loss=0.3851, attn_decoder_loss=0.2559, over 29621.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1251, cr_loss=0.3676, attn_decoder_loss=0.2451, over 5804455.49 frames. ], batch size: 86, lr: 4.12e-03, grad_scale: 8.0 2024-09-18 16:08:00,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=476600.0, ans=0.125 2024-09-18 16:08:31,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=476680.0, ans=0.0 2024-09-18 16:08:43,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=476720.0, ans=0.125 2024-09-18 16:08:44,409 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 8.578e+01 9.265e+01 1.012e+02 4.469e+02, threshold=1.853e+02, percent-clipped=2.0 2024-09-18 16:08:53,105 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.04 vs. limit=15.0 2024-09-18 16:08:58,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=476760.0, ans=0.02 2024-09-18 16:09:13,770 INFO [train.py:1198] (1/2) Epoch 27, batch 1550, loss[loss=0.2594, ctc_loss=0.1415, cr_loss=0.4161, attn_decoder_loss=0.2633, over 29525.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1256, cr_loss=0.3685, attn_decoder_loss=0.2453, over 5780870.76 frames. 
], batch size: 90, lr: 4.12e-03, grad_scale: 8.0 2024-09-18 16:09:18,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=476800.0, ans=0.1 2024-09-18 16:09:20,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=476800.0, ans=0.125 2024-09-18 16:09:20,988 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.87 vs. limit=15.0 2024-09-18 16:09:34,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=476840.0, ans=10.0 2024-09-18 16:09:38,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=476840.0, ans=0.1 2024-09-18 16:10:01,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=476920.0, ans=0.07 2024-09-18 16:10:04,039 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.82 vs. limit=6.0 2024-09-18 16:10:31,969 INFO [train.py:1198] (1/2) Epoch 27, batch 1600, loss[loss=0.2466, ctc_loss=0.1224, cr_loss=0.3724, attn_decoder_loss=0.2521, over 29665.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1257, cr_loss=0.3681, attn_decoder_loss=0.2453, over 5762906.09 frames. ], batch size: 85, lr: 4.12e-03, grad_scale: 16.0 2024-09-18 16:10:32,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=477000.0, ans=0.025 2024-09-18 16:10:44,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=477000.0, ans=0.0 2024-09-18 16:10:48,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=477040.0, ans=0.025 2024-09-18 16:11:04,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=477080.0, ans=0.125 2024-09-18 16:11:11,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=477080.0, ans=0.125 2024-09-18 16:11:11,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=477080.0, ans=0.0 2024-09-18 16:11:11,563 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.92 vs. 
limit=10.0 2024-09-18 16:11:23,076 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.546e+01 9.000e+01 9.569e+01 2.285e+02, threshold=1.800e+02, percent-clipped=1.0 2024-09-18 16:11:35,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=477160.0, ans=0.125 2024-09-18 16:11:40,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=477160.0, ans=0.5 2024-09-18 16:11:41,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=477160.0, ans=0.0 2024-09-18 16:11:50,241 INFO [train.py:1198] (1/2) Epoch 27, batch 1650, loss[loss=0.2477, ctc_loss=0.1207, cr_loss=0.3631, attn_decoder_loss=0.2538, over 29698.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1254, cr_loss=0.3679, attn_decoder_loss=0.2452, over 5758617.80 frames. ], batch size: 89, lr: 4.12e-03, grad_scale: 8.0 2024-09-18 16:11:50,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=477200.0, ans=0.0 2024-09-18 16:12:22,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=477280.0, ans=0.1 2024-09-18 16:12:37,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=477320.0, ans=0.125 2024-09-18 16:12:42,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=477320.0, ans=0.125 2024-09-18 16:12:51,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=477360.0, ans=0.125 2024-09-18 16:13:05,982 INFO [train.py:1198] (1/2) Epoch 27, batch 1700, loss[loss=0.2155, ctc_loss=0.1111, cr_loss=0.3541, attn_decoder_loss=0.2192, over 29573.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1253, cr_loss=0.3678, attn_decoder_loss=0.245, over 5781040.25 frames. 
], batch size: 69, lr: 4.12e-03, grad_scale: 8.0 2024-09-18 16:13:17,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=477400.0, ans=0.0 2024-09-18 16:13:54,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=477520.0, ans=0.125 2024-09-18 16:13:54,260 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:13:55,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=477520.0, ans=6.0 2024-09-18 16:13:56,895 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 8.516e+01 9.095e+01 9.729e+01 1.325e+02, threshold=1.819e+02, percent-clipped=0.0 2024-09-18 16:14:04,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=477520.0, ans=0.125 2024-09-18 16:14:18,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=477560.0, ans=0.125 2024-09-18 16:14:19,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=477560.0, ans=0.025 2024-09-18 16:14:24,683 INFO [train.py:1198] (1/2) Epoch 27, batch 1750, loss[loss=0.2096, ctc_loss=0.1047, cr_loss=0.3449, attn_decoder_loss=0.2136, over 29355.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1251, cr_loss=0.3674, attn_decoder_loss=0.2446, over 5788557.74 frames. ], batch size: 67, lr: 4.12e-03, grad_scale: 8.0 2024-09-18 16:14:36,298 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.69 vs. limit=15.0 2024-09-18 16:14:38,198 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.58 vs. limit=22.5 2024-09-18 16:14:50,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=477640.0, ans=0.0 2024-09-18 16:15:00,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=477680.0, ans=0.0 2024-09-18 16:15:20,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=477720.0, ans=0.05 2024-09-18 16:15:22,367 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.68 vs. limit=15.0 2024-09-18 16:15:23,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=477720.0, ans=0.0 2024-09-18 16:15:29,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=477760.0, ans=0.2 2024-09-18 16:15:30,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=477760.0, ans=0.025 2024-09-18 16:15:42,639 INFO [train.py:1198] (1/2) Epoch 27, batch 1800, loss[loss=0.2515, ctc_loss=0.1292, cr_loss=0.3581, attn_decoder_loss=0.2571, over 29705.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1256, cr_loss=0.3682, attn_decoder_loss=0.2449, over 5792434.05 frames. 
], batch size: 83, lr: 4.11e-03, grad_scale: 8.0 2024-09-18 16:16:02,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=477840.0, ans=0.125 2024-09-18 16:16:08,223 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.27 vs. limit=22.5 2024-09-18 16:16:30,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=477920.0, ans=0.125 2024-09-18 16:16:31,638 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.306e+01 8.965e+01 9.478e+01 1.194e+02, threshold=1.793e+02, percent-clipped=0.0 2024-09-18 16:16:59,480 INFO [train.py:1198] (1/2) Epoch 27, batch 1850, loss[loss=0.2472, ctc_loss=0.1387, cr_loss=0.3929, attn_decoder_loss=0.2505, over 29638.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1258, cr_loss=0.3689, attn_decoder_loss=0.245, over 5797158.60 frames. ], batch size: 86, lr: 4.11e-03, grad_scale: 8.0 2024-09-18 16:17:22,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=478040.0, ans=0.125 2024-09-18 16:17:30,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=478080.0, ans=0.1 2024-09-18 16:17:42,689 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.96 vs. limit=10.0 2024-09-18 16:17:43,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=478080.0, ans=0.1 2024-09-18 16:18:03,474 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.01 vs. limit=10.0 2024-09-18 16:18:17,768 INFO [train.py:1198] (1/2) Epoch 27, batch 1900, loss[loss=0.2509, ctc_loss=0.123, cr_loss=0.3819, attn_decoder_loss=0.2567, over 29717.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1257, cr_loss=0.3688, attn_decoder_loss=0.2452, over 5805758.40 frames. ], batch size: 89, lr: 4.11e-03, grad_scale: 8.0 2024-09-18 16:18:32,309 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.61 vs. limit=15.0 2024-09-18 16:19:08,922 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.433e+01 8.594e+01 9.103e+01 9.777e+01 2.715e+02, threshold=1.821e+02, percent-clipped=1.0 2024-09-18 16:19:10,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=478320.0, ans=0.125 2024-09-18 16:19:15,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=478320.0, ans=0.1 2024-09-18 16:19:21,922 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.06 vs. 
limit=12.0 2024-09-18 16:19:22,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=478360.0, ans=0.125 2024-09-18 16:19:25,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=478360.0, ans=0.2 2024-09-18 16:19:36,745 INFO [train.py:1198] (1/2) Epoch 27, batch 1950, loss[loss=0.2367, ctc_loss=0.1281, cr_loss=0.3866, attn_decoder_loss=0.2402, over 29457.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1261, cr_loss=0.3701, attn_decoder_loss=0.2462, over 5820875.41 frames. ], batch size: 78, lr: 4.11e-03, grad_scale: 8.0 2024-09-18 16:19:49,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=478400.0, ans=0.07 2024-09-18 16:20:10,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=478480.0, ans=0.0 2024-09-18 16:20:27,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=478520.0, ans=0.1 2024-09-18 16:20:37,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=478560.0, ans=0.1 2024-09-18 16:20:38,198 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.00 vs. limit=12.0 2024-09-18 16:20:40,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=478560.0, ans=0.0 2024-09-18 16:20:40,754 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:20:43,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=478560.0, ans=0.0 2024-09-18 16:20:46,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=478560.0, ans=0.125 2024-09-18 16:20:52,626 INFO [train.py:1198] (1/2) Epoch 27, batch 2000, loss[loss=0.2212, ctc_loss=0.116, cr_loss=0.3631, attn_decoder_loss=0.2248, over 29327.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1263, cr_loss=0.3704, attn_decoder_loss=0.2465, over 5797698.36 frames. ], batch size: 67, lr: 4.11e-03, grad_scale: 16.0 2024-09-18 16:21:05,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=478600.0, ans=0.125 2024-09-18 16:21:15,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=478640.0, ans=0.0 2024-09-18 16:21:18,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=478640.0, ans=0.0 2024-09-18 16:21:45,120 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.111e+01 8.586e+01 9.013e+01 9.702e+01 5.300e+02, threshold=1.803e+02, percent-clipped=1.0 2024-09-18 16:21:50,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=478720.0, ans=0.0 2024-09-18 16:22:01,421 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.44 vs. 
limit=15.0 2024-09-18 16:22:10,911 INFO [train.py:1198] (1/2) Epoch 27, batch 2050, loss[loss=0.2152, ctc_loss=0.1063, cr_loss=0.3415, attn_decoder_loss=0.2198, over 29424.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1257, cr_loss=0.369, attn_decoder_loss=0.2455, over 5789286.56 frames. ], batch size: 70, lr: 4.11e-03, grad_scale: 8.0 2024-09-18 16:22:24,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=478840.0, ans=0.0 2024-09-18 16:22:29,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=478840.0, ans=0.0 2024-09-18 16:23:07,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=478920.0, ans=0.0 2024-09-18 16:23:28,901 INFO [train.py:1198] (1/2) Epoch 27, batch 2100, loss[loss=0.2325, ctc_loss=0.1125, cr_loss=0.3312, attn_decoder_loss=0.2385, over 29749.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1251, cr_loss=0.3677, attn_decoder_loss=0.245, over 5801425.48 frames. ], batch size: 81, lr: 4.11e-03, grad_scale: 8.0 2024-09-18 16:23:33,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=479000.0, ans=0.0 2024-09-18 16:23:41,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=479000.0, ans=0.025 2024-09-18 16:23:49,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=479040.0, ans=0.1 2024-09-18 16:24:04,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=479080.0, ans=0.025 2024-09-18 16:24:09,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=479080.0, ans=0.0 2024-09-18 16:24:11,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=479080.0, ans=0.125 2024-09-18 16:24:11,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=479080.0, ans=0.125 2024-09-18 16:24:17,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=479120.0, ans=0.125 2024-09-18 16:24:18,656 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.428e+01 8.253e+01 8.787e+01 9.429e+01 1.232e+02, threshold=1.757e+02, percent-clipped=0.0 2024-09-18 16:24:24,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=479120.0, ans=0.1 2024-09-18 16:24:45,029 INFO [train.py:1198] (1/2) Epoch 27, batch 2150, loss[loss=0.2396, ctc_loss=0.129, cr_loss=0.3592, attn_decoder_loss=0.2439, over 29458.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1244, cr_loss=0.3667, attn_decoder_loss=0.2443, over 5816658.47 frames. ], batch size: 78, lr: 4.11e-03, grad_scale: 8.0 2024-09-18 16:24:52,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=479200.0, ans=0.1 2024-09-18 16:24:53,669 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.14 vs. 
limit=15.0 2024-09-18 16:25:02,941 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-09-18 16:25:14,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=479280.0, ans=0.0 2024-09-18 16:25:15,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=479280.0, ans=0.125 2024-09-18 16:25:20,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=479280.0, ans=0.0 2024-09-18 16:25:28,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=479280.0, ans=0.0 2024-09-18 16:25:33,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=479320.0, ans=0.125 2024-09-18 16:25:38,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=479320.0, ans=0.125 2024-09-18 16:26:03,691 INFO [train.py:1198] (1/2) Epoch 27, batch 2200, loss[loss=0.2449, ctc_loss=0.1268, cr_loss=0.3869, attn_decoder_loss=0.2494, over 29642.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1247, cr_loss=0.3675, attn_decoder_loss=0.2445, over 5814103.22 frames. ], batch size: 86, lr: 4.11e-03, grad_scale: 8.0 2024-09-18 16:26:05,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=479400.0, ans=0.0 2024-09-18 16:26:08,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=479400.0, ans=0.125 2024-09-18 16:26:12,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.18 vs. limit=15.0 2024-09-18 16:26:17,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=479440.0, ans=0.0 2024-09-18 16:26:20,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=479440.0, ans=0.0 2024-09-18 16:26:47,792 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.23 vs. limit=15.0 2024-09-18 16:26:55,790 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.471e+01 9.024e+01 9.757e+01 3.508e+02, threshold=1.805e+02, percent-clipped=1.0 2024-09-18 16:27:07,766 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.69 vs. limit=10.0 2024-09-18 16:27:12,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=479560.0, ans=0.0 2024-09-18 16:27:21,639 INFO [train.py:1198] (1/2) Epoch 27, batch 2250, loss[loss=0.2383, ctc_loss=0.1199, cr_loss=0.3535, attn_decoder_loss=0.2436, over 29716.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1243, cr_loss=0.3664, attn_decoder_loss=0.2442, over 5811035.29 frames. 
], batch size: 82, lr: 4.11e-03, grad_scale: 8.0 2024-09-18 16:27:32,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=479600.0, ans=0.0 2024-09-18 16:27:52,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.75 vs. limit=15.0 2024-09-18 16:27:52,890 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.90 vs. limit=22.5 2024-09-18 16:27:58,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=479680.0, ans=0.125 2024-09-18 16:28:07,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=479720.0, ans=0.09899494936611666 2024-09-18 16:28:36,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=479800.0, ans=0.125 2024-09-18 16:28:37,668 INFO [train.py:1198] (1/2) Epoch 27, batch 2300, loss[loss=0.2064, ctc_loss=0.09569, cr_loss=0.2951, attn_decoder_loss=0.2122, over 29313.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1238, cr_loss=0.365, attn_decoder_loss=0.2434, over 5798199.23 frames. ], batch size: 71, lr: 4.11e-03, grad_scale: 8.0 2024-09-18 16:28:40,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=479800.0, ans=0.125 2024-09-18 16:28:52,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=479840.0, ans=0.0 2024-09-18 16:28:54,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=479840.0, ans=0.1 2024-09-18 16:29:09,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=479880.0, ans=0.1 2024-09-18 16:29:29,679 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.039e+01 8.211e+01 8.889e+01 9.358e+01 1.563e+02, threshold=1.778e+02, percent-clipped=0.0 2024-09-18 16:30:02,926 INFO [train.py:1198] (1/2) Epoch 27, batch 2350, loss[loss=0.2496, ctc_loss=0.1287, cr_loss=0.3779, attn_decoder_loss=0.2547, over 29695.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1238, cr_loss=0.3652, attn_decoder_loss=0.2436, over 5805045.60 frames. 
], batch size: 83, lr: 4.11e-03, grad_scale: 8.0 2024-09-18 16:30:39,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=480080.0, ans=0.025 2024-09-18 16:30:59,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=480120.0, ans=0.0 2024-09-18 16:31:04,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=480160.0, ans=0.2 2024-09-18 16:31:07,322 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:31:10,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=480160.0, ans=0.125 2024-09-18 16:31:18,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=480160.0, ans=0.125 2024-09-18 16:31:20,883 INFO [train.py:1198] (1/2) Epoch 27, batch 2400, loss[loss=0.2256, ctc_loss=0.1051, cr_loss=0.3161, attn_decoder_loss=0.2319, over 29522.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1242, cr_loss=0.3657, attn_decoder_loss=0.2442, over 5808337.44 frames. ], batch size: 76, lr: 4.10e-03, grad_scale: 16.0 2024-09-18 16:31:27,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=480200.0, ans=0.1 2024-09-18 16:31:30,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=480200.0, ans=0.0 2024-09-18 16:31:33,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=480200.0, ans=10.0 2024-09-18 16:31:37,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=480240.0, ans=0.1 2024-09-18 16:31:48,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=480240.0, ans=0.1 2024-09-18 16:31:51,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=480280.0, ans=0.125 2024-09-18 16:32:08,314 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:32:09,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=480320.0, ans=0.125 2024-09-18 16:32:12,453 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.717e+01 9.101e+01 9.636e+01 2.464e+02, threshold=1.820e+02, percent-clipped=1.0 2024-09-18 16:32:17,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=480320.0, ans=0.125 2024-09-18 16:32:18,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=480320.0, ans=0.1 2024-09-18 16:32:36,817 INFO [train.py:1198] (1/2) Epoch 27, batch 2450, loss[loss=0.2415, ctc_loss=0.1216, cr_loss=0.3558, attn_decoder_loss=0.2469, over 29707.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.125, cr_loss=0.3676, attn_decoder_loss=0.2451, over 5784891.60 frames. 
], batch size: 82, lr: 4.10e-03, grad_scale: 8.0 2024-09-18 16:32:38,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=480400.0, ans=0.0 2024-09-18 16:32:40,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=480400.0, ans=0.0 2024-09-18 16:33:07,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=480480.0, ans=0.125 2024-09-18 16:33:15,823 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:33:17,417 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.64 vs. limit=15.0 2024-09-18 16:33:18,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=480480.0, ans=0.0 2024-09-18 16:33:20,742 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.66 vs. limit=15.0 2024-09-18 16:33:24,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=480520.0, ans=0.1 2024-09-18 16:33:54,968 INFO [train.py:1198] (1/2) Epoch 27, batch 2500, loss[loss=0.2486, ctc_loss=0.1224, cr_loss=0.3585, attn_decoder_loss=0.2547, over 29612.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1253, cr_loss=0.368, attn_decoder_loss=0.2453, over 5795223.34 frames. ], batch size: 86, lr: 4.10e-03, grad_scale: 8.0 2024-09-18 16:33:59,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=480600.0, ans=0.125 2024-09-18 16:34:10,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=480640.0, ans=0.125 2024-09-18 16:34:12,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=480640.0, ans=0.125 2024-09-18 16:34:16,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=480640.0, ans=0.0 2024-09-18 16:34:20,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=480640.0, ans=10.0 2024-09-18 16:34:27,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=480680.0, ans=0.025 2024-09-18 16:34:36,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=480680.0, ans=0.125 2024-09-18 16:34:39,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=480720.0, ans=0.1 2024-09-18 16:34:49,374 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.100e+01 8.457e+01 8.825e+01 9.370e+01 1.600e+02, threshold=1.765e+02, percent-clipped=0.0 2024-09-18 16:34:49,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=480720.0, ans=0.0 2024-09-18 16:35:14,344 INFO [train.py:1198] (1/2) Epoch 27, batch 2550, loss[loss=0.2113, ctc_loss=0.1051, cr_loss=0.338, 
attn_decoder_loss=0.2156, over 29328.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1249, cr_loss=0.3673, attn_decoder_loss=0.2452, over 5796272.37 frames. ], batch size: 67, lr: 4.10e-03, grad_scale: 8.0 2024-09-18 16:35:16,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=480800.0, ans=0.0 2024-09-18 16:35:38,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=480840.0, ans=0.09899494936611666 2024-09-18 16:35:52,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=480880.0, ans=0.125 2024-09-18 16:35:58,858 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.29 vs. limit=15.0 2024-09-18 16:36:04,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=480920.0, ans=0.0 2024-09-18 16:36:15,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=480960.0, ans=0.1 2024-09-18 16:36:16,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=480960.0, ans=0.125 2024-09-18 16:36:30,364 INFO [train.py:1198] (1/2) Epoch 27, batch 2600, loss[loss=0.2282, ctc_loss=0.1086, cr_loss=0.3415, attn_decoder_loss=0.2339, over 29455.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1256, cr_loss=0.3688, attn_decoder_loss=0.2457, over 5792770.20 frames. ], batch size: 78, lr: 4.10e-03, grad_scale: 8.0 2024-09-18 16:36:45,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=481040.0, ans=0.125 2024-09-18 16:37:13,712 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=15.0 2024-09-18 16:37:16,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=481080.0, ans=0.0 2024-09-18 16:37:16,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=481080.0, ans=0.125 2024-09-18 16:37:19,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=481120.0, ans=0.125 2024-09-18 16:37:24,591 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.268e+01 8.374e+01 8.989e+01 9.651e+01 1.905e+02, threshold=1.798e+02, percent-clipped=2.0 2024-09-18 16:37:48,735 INFO [train.py:1198] (1/2) Epoch 27, batch 2650, loss[loss=0.2491, ctc_loss=0.1344, cr_loss=0.3934, attn_decoder_loss=0.2531, over 29232.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1253, cr_loss=0.3686, attn_decoder_loss=0.2457, over 5799802.02 frames. ], batch size: 100, lr: 4.10e-03, grad_scale: 8.0 2024-09-18 16:38:01,698 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.65 vs. 
limit=15.0 2024-09-18 16:38:14,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=481240.0, ans=0.0 2024-09-18 16:38:19,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=481280.0, ans=0.0 2024-09-18 16:38:19,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=481280.0, ans=0.125 2024-09-18 16:38:28,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=481280.0, ans=0.0 2024-09-18 16:38:37,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=481320.0, ans=0.125 2024-09-18 16:38:59,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=481360.0, ans=0.0 2024-09-18 16:39:06,863 INFO [train.py:1198] (1/2) Epoch 27, batch 2700, loss[loss=0.2528, ctc_loss=0.1338, cr_loss=0.3861, attn_decoder_loss=0.2575, over 29557.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1257, cr_loss=0.3696, attn_decoder_loss=0.2461, over 5796489.76 frames. ], batch size: 87, lr: 4.10e-03, grad_scale: 8.0 2024-09-18 16:39:25,414 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:39:26,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=481440.0, ans=0.125 2024-09-18 16:39:43,988 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.95 vs. limit=15.0 2024-09-18 16:39:51,297 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.64 vs. limit=12.0 2024-09-18 16:39:54,613 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2024-09-18 16:39:55,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=481520.0, ans=0.125 2024-09-18 16:39:58,202 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.481e+01 8.531e+01 8.958e+01 9.495e+01 1.703e+02, threshold=1.792e+02, percent-clipped=0.0 2024-09-18 16:40:09,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=481560.0, ans=0.0 2024-09-18 16:40:23,155 INFO [train.py:1198] (1/2) Epoch 27, batch 2750, loss[loss=0.2317, ctc_loss=0.1197, cr_loss=0.364, attn_decoder_loss=0.2361, over 29525.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.125, cr_loss=0.368, attn_decoder_loss=0.2451, over 5795040.48 frames. ], batch size: 75, lr: 4.10e-03, grad_scale: 8.0 2024-09-18 16:41:15,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=481720.0, ans=0.125 2024-09-18 16:41:17,940 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.74 vs. 
limit=6.0 2024-09-18 16:41:26,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=481760.0, ans=0.125 2024-09-18 16:41:41,656 INFO [train.py:1198] (1/2) Epoch 27, batch 2800, loss[loss=0.2671, ctc_loss=0.1607, cr_loss=0.3726, attn_decoder_loss=0.2707, over 20255.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1257, cr_loss=0.3694, attn_decoder_loss=0.2453, over 5776227.21 frames. ], batch size: 210, lr: 4.10e-03, grad_scale: 16.0 2024-09-18 16:41:44,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=481800.0, ans=0.125 2024-09-18 16:42:09,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=481840.0, ans=0.1 2024-09-18 16:42:09,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=481840.0, ans=0.0 2024-09-18 16:42:15,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=481880.0, ans=0.0 2024-09-18 16:42:21,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=481880.0, ans=0.2 2024-09-18 16:42:22,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=481880.0, ans=0.0 2024-09-18 16:42:35,141 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.963e+01 8.581e+01 9.268e+01 9.879e+01 2.017e+02, threshold=1.854e+02, percent-clipped=1.0 2024-09-18 16:42:43,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=481960.0, ans=0.125 2024-09-18 16:42:47,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=481960.0, ans=0.125 2024-09-18 16:42:56,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=481960.0, ans=0.0 2024-09-18 16:42:59,623 INFO [train.py:1198] (1/2) Epoch 27, batch 2850, loss[loss=0.2263, ctc_loss=0.1146, cr_loss=0.3374, attn_decoder_loss=0.2312, over 29502.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1262, cr_loss=0.3704, attn_decoder_loss=0.2459, over 5760515.06 frames. ], batch size: 77, lr: 4.10e-03, grad_scale: 16.0 2024-09-18 16:43:10,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=482000.0, ans=0.125 2024-09-18 16:43:14,234 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.91 vs. limit=10.0 2024-09-18 16:43:53,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=482120.0, ans=0.125 2024-09-18 16:43:55,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=482120.0, ans=0.125 2024-09-18 16:44:15,379 INFO [train.py:1198] (1/2) Epoch 27, batch 2900, loss[loss=0.2341, ctc_loss=0.1175, cr_loss=0.3614, attn_decoder_loss=0.2391, over 29396.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1269, cr_loss=0.3721, attn_decoder_loss=0.2471, over 5786207.06 frames. 
], batch size: 79, lr: 4.10e-03, grad_scale: 8.0 2024-09-18 16:44:21,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=482200.0, ans=0.125 2024-09-18 16:44:32,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=482240.0, ans=0.0 2024-09-18 16:44:35,964 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.70 vs. limit=15.0 2024-09-18 16:45:12,315 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 8.546e+01 8.987e+01 9.686e+01 7.083e+02, threshold=1.797e+02, percent-clipped=1.0 2024-09-18 16:45:30,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=482360.0, ans=0.0 2024-09-18 16:45:33,848 INFO [train.py:1198] (1/2) Epoch 27, batch 2950, loss[loss=0.2451, ctc_loss=0.136, cr_loss=0.3817, attn_decoder_loss=0.2488, over 29526.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1259, cr_loss=0.37, attn_decoder_loss=0.2457, over 5779987.02 frames. ], batch size: 75, lr: 4.09e-03, grad_scale: 4.0 2024-09-18 16:45:35,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=482400.0, ans=0.125 2024-09-18 16:45:48,338 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.57 vs. limit=22.5 2024-09-18 16:45:49,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=482440.0, ans=0.125 2024-09-18 16:46:00,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=482440.0, ans=0.0 2024-09-18 16:46:07,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=482480.0, ans=0.0 2024-09-18 16:46:52,466 INFO [train.py:1198] (1/2) Epoch 27, batch 3000, loss[loss=0.2464, ctc_loss=0.1314, cr_loss=0.3779, attn_decoder_loss=0.2508, over 29753.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.126, cr_loss=0.3701, attn_decoder_loss=0.2457, over 5781573.22 frames. ], batch size: 81, lr: 4.09e-03, grad_scale: 8.0 2024-09-18 16:46:52,466 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 16:47:10,903 INFO [train.py:1230] (1/2) Epoch 27, validation: loss=0.212, ctc_loss=0.03868, cr_loss=6.15e-15, attn_decoder_loss=0.2313, over 944034.00 frames. 
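Note on reading the loss records above: the headline loss is, to within rounding, a weighted sum of the logged components. Weights of roughly 0.1 (ctc_loss), 0.9 (attn_decoder_loss) and 0.02 (cr_loss) reproduce both the validation record just logged and the running tot_loss records; these weights are back-fitted from the numbers themselves, not read from train.py. The essentially zero validation cr_loss (6.15e-15) is consistent with the consistency-regularization term only being computed between augmented training views. A minimal sketch under that inferred weighting:

def combined_loss(ctc, attn_decoder, cr, w_ctc=0.1, w_attn=0.9, w_cr=0.02):
    # Weights inferred by fitting the logged records; assumed, not authoritative.
    return w_ctc * ctc + w_attn * attn_decoder + w_cr * cr

# Validation record above (Epoch 27, batch 3000): loss=0.212
print(round(combined_loss(0.03868, 0.2313, 6.15e-15), 4))  # -> 0.212
# Training record (Epoch 27, batch 2000): tot_loss loss=0.2419
print(round(combined_loss(0.1263, 0.2465, 0.3704), 4))     # -> 0.2419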
2024-09-18 16:47:10,903 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-18 16:47:12,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=482600.0, ans=0.0 2024-09-18 16:47:20,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=482600.0, ans=0.1 2024-09-18 16:47:23,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=482600.0, ans=0.2 2024-09-18 16:47:26,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=482640.0, ans=0.125 2024-09-18 16:47:59,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=482720.0, ans=0.125 2024-09-18 16:48:05,508 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 8.646e+01 9.161e+01 1.019e+02 2.247e+02, threshold=1.832e+02, percent-clipped=1.0 2024-09-18 16:48:16,984 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.04 vs. limit=22.5 2024-09-18 16:48:26,847 INFO [train.py:1198] (1/2) Epoch 27, batch 3050, loss[loss=0.2287, ctc_loss=0.1164, cr_loss=0.3488, attn_decoder_loss=0.2334, over 29563.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1262, cr_loss=0.3705, attn_decoder_loss=0.246, over 5775753.16 frames. ], batch size: 76, lr: 4.09e-03, grad_scale: 8.0 2024-09-18 16:48:40,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=482800.0, ans=0.05 2024-09-18 16:48:40,689 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.38 vs. limit=6.0 2024-09-18 16:48:45,549 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=15.0 2024-09-18 16:49:14,169 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=22.5 2024-09-18 16:49:44,764 INFO [train.py:1198] (1/2) Epoch 27, batch 3100, loss[loss=0.2613, ctc_loss=0.1365, cr_loss=0.3948, attn_decoder_loss=0.2664, over 29188.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1259, cr_loss=0.3697, attn_decoder_loss=0.2457, over 5775941.98 frames. 
], batch size: 100, lr: 4.09e-03, grad_scale: 8.0 2024-09-18 16:49:47,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=483000.0, ans=0.0 2024-09-18 16:49:51,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=483000.0, ans=0.1 2024-09-18 16:49:52,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=483000.0, ans=0.125 2024-09-18 16:50:03,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=483040.0, ans=0.1 2024-09-18 16:50:09,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=483040.0, ans=0.07 2024-09-18 16:50:31,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=483120.0, ans=0.125 2024-09-18 16:50:41,515 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.612e+01 9.047e+01 9.758e+01 3.006e+02, threshold=1.809e+02, percent-clipped=2.0 2024-09-18 16:51:03,296 INFO [train.py:1198] (1/2) Epoch 27, batch 3150, loss[loss=0.2528, ctc_loss=0.1315, cr_loss=0.3715, attn_decoder_loss=0.258, over 28933.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1258, cr_loss=0.3691, attn_decoder_loss=0.2457, over 5783033.53 frames. ], batch size: 104, lr: 4.09e-03, grad_scale: 8.0 2024-09-18 16:51:06,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=483200.0, ans=0.0 2024-09-18 16:51:19,047 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:51:33,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=483280.0, ans=0.0 2024-09-18 16:51:35,948 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.09 vs. limit=22.5 2024-09-18 16:52:13,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=483360.0, ans=0.125 2024-09-18 16:52:18,912 INFO [train.py:1198] (1/2) Epoch 27, batch 3200, loss[loss=0.2394, ctc_loss=0.129, cr_loss=0.3739, attn_decoder_loss=0.2434, over 29406.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1249, cr_loss=0.3676, attn_decoder_loss=0.2448, over 5792672.68 frames. 
], batch size: 79, lr: 4.09e-03, grad_scale: 16.0 2024-09-18 16:52:20,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=483400.0, ans=0.2 2024-09-18 16:52:36,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=483440.0, ans=0.125 2024-09-18 16:52:56,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=483480.0, ans=0.1 2024-09-18 16:53:16,079 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 8.478e+01 8.969e+01 9.595e+01 1.807e+02, threshold=1.794e+02, percent-clipped=0.0 2024-09-18 16:53:17,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=483520.0, ans=0.125 2024-09-18 16:53:37,328 INFO [train.py:1198] (1/2) Epoch 27, batch 3250, loss[loss=0.2521, ctc_loss=0.1407, cr_loss=0.3994, attn_decoder_loss=0.2556, over 29717.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1253, cr_loss=0.3687, attn_decoder_loss=0.2454, over 5800067.13 frames. ], batch size: 84, lr: 4.09e-03, grad_scale: 16.0 2024-09-18 16:53:37,706 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:53:45,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=483600.0, ans=0.035 2024-09-18 16:54:03,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=483640.0, ans=0.125 2024-09-18 16:54:05,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=483680.0, ans=0.125 2024-09-18 16:54:06,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=483680.0, ans=0.125 2024-09-18 16:54:16,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=483680.0, ans=0.1 2024-09-18 16:54:27,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=483720.0, ans=0.125 2024-09-18 16:54:54,878 INFO [train.py:1198] (1/2) Epoch 27, batch 3300, loss[loss=0.2528, ctc_loss=0.133, cr_loss=0.3975, attn_decoder_loss=0.2572, over 28320.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1246, cr_loss=0.3675, attn_decoder_loss=0.2444, over 5797612.27 frames. ], batch size: 111, lr: 4.09e-03, grad_scale: 8.0 2024-09-18 16:54:58,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=483800.0, ans=0.1 2024-09-18 16:55:29,036 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.47 vs. limit=15.0 2024-09-18 16:55:41,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=483920.0, ans=0.125 2024-09-18 16:55:46,791 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. 
limit=6.0 2024-09-18 16:55:48,403 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.05 vs. limit=15.0 2024-09-18 16:55:50,530 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.603e+01 8.485e+01 9.035e+01 9.621e+01 1.592e+02, threshold=1.807e+02, percent-clipped=0.0 2024-09-18 16:55:51,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=483920.0, ans=0.2 2024-09-18 16:56:00,307 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.99 vs. limit=10.0 2024-09-18 16:56:05,162 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=15.0 2024-09-18 16:56:10,661 INFO [train.py:1198] (1/2) Epoch 27, batch 3350, loss[loss=0.2531, ctc_loss=0.1339, cr_loss=0.3954, attn_decoder_loss=0.2575, over 28849.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1253, cr_loss=0.3687, attn_decoder_loss=0.2451, over 5774269.46 frames. ], batch size: 104, lr: 4.09e-03, grad_scale: 8.0 2024-09-18 16:56:28,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=484040.0, ans=0.0 2024-09-18 16:56:28,864 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.48 vs. limit=15.0 2024-09-18 16:56:32,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=484040.0, ans=0.1 2024-09-18 16:56:34,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=484040.0, ans=0.1 2024-09-18 16:56:36,272 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=15.0 2024-09-18 16:56:41,265 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.37 vs. limit=15.0 2024-09-18 16:56:48,834 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.63 vs. limit=15.0 2024-09-18 16:56:52,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=484080.0, ans=12.0 2024-09-18 16:56:57,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=484120.0, ans=0.125 2024-09-18 16:57:09,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=484120.0, ans=0.125 2024-09-18 16:57:18,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=484160.0, ans=0.025 2024-09-18 16:57:22,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=484160.0, ans=0.1 2024-09-18 16:57:24,864 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.37 vs. 
limit=15.0 2024-09-18 16:57:28,520 INFO [train.py:1198] (1/2) Epoch 27, batch 3400, loss[loss=0.2157, ctc_loss=0.1073, cr_loss=0.3257, attn_decoder_loss=0.2205, over 29346.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1257, cr_loss=0.3691, attn_decoder_loss=0.2451, over 5767530.63 frames. ], batch size: 67, lr: 4.09e-03, grad_scale: 8.0 2024-09-18 16:57:28,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=484200.0, ans=0.125 2024-09-18 16:57:32,399 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.44 vs. limit=22.5 2024-09-18 16:57:42,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=484240.0, ans=0.0 2024-09-18 16:58:10,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=484280.0, ans=0.1 2024-09-18 16:58:12,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=484280.0, ans=0.0 2024-09-18 16:58:19,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=484320.0, ans=0.125 2024-09-18 16:58:24,446 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0 2024-09-18 16:58:26,754 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.585e+01 9.028e+01 9.662e+01 1.590e+02, threshold=1.806e+02, percent-clipped=0.0 2024-09-18 16:58:44,074 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.92 vs. limit=15.0 2024-09-18 16:58:46,440 INFO [train.py:1198] (1/2) Epoch 27, batch 3450, loss[loss=0.2446, ctc_loss=0.1253, cr_loss=0.3676, attn_decoder_loss=0.2497, over 28381.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1259, cr_loss=0.369, attn_decoder_loss=0.2455, over 5775558.26 frames. ], batch size: 111, lr: 4.09e-03, grad_scale: 8.0 2024-09-18 16:59:27,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=484480.0, ans=0.2 2024-09-18 16:59:38,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=484520.0, ans=0.1 2024-09-18 16:59:56,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=484560.0, ans=0.1 2024-09-18 17:00:04,105 INFO [train.py:1198] (1/2) Epoch 27, batch 3500, loss[loss=0.2178, ctc_loss=0.1106, cr_loss=0.3461, attn_decoder_loss=0.222, over 29329.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1254, cr_loss=0.3679, attn_decoder_loss=0.2449, over 5776342.14 frames. 
], batch size: 71, lr: 4.09e-03, grad_scale: 8.0 2024-09-18 17:00:59,675 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 8.562e+01 8.977e+01 9.669e+01 2.220e+02, threshold=1.795e+02, percent-clipped=2.0 2024-09-18 17:01:03,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=484760.0, ans=0.125 2024-09-18 17:01:19,573 INFO [train.py:1198] (1/2) Epoch 27, batch 3550, loss[loss=0.2363, ctc_loss=0.115, cr_loss=0.3448, attn_decoder_loss=0.2421, over 29721.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1253, cr_loss=0.3675, attn_decoder_loss=0.2448, over 5782457.47 frames. ], batch size: 89, lr: 4.08e-03, grad_scale: 8.0 2024-09-18 17:01:19,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=484800.0, ans=0.125 2024-09-18 17:01:54,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=484880.0, ans=0.125 2024-09-18 17:02:01,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=484880.0, ans=0.1 2024-09-18 17:02:33,745 INFO [train.py:1198] (1/2) Epoch 27, batch 3600, loss[loss=0.2266, ctc_loss=0.1145, cr_loss=0.3368, attn_decoder_loss=0.2316, over 29525.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1252, cr_loss=0.3671, attn_decoder_loss=0.2448, over 5791483.40 frames. ], batch size: 77, lr: 4.08e-03, grad_scale: 16.0 2024-09-18 17:02:41,770 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2024-09-18 17:02:44,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=485000.0, ans=0.125 2024-09-18 17:02:58,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=485040.0, ans=0.2 2024-09-18 17:03:26,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=485120.0, ans=0.0 2024-09-18 17:03:28,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=485120.0, ans=0.125 2024-09-18 17:03:30,862 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.400e+01 9.013e+01 9.523e+01 1.334e+02, threshold=1.803e+02, percent-clipped=0.0 2024-09-18 17:03:35,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=485160.0, ans=0.0 2024-09-18 17:03:48,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=485200.0, ans=0.125 2024-09-18 17:03:50,134 INFO [train.py:1198] (1/2) Epoch 27, batch 3650, loss[loss=0.2567, ctc_loss=0.1367, cr_loss=0.3907, attn_decoder_loss=0.2614, over 29491.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1244, cr_loss=0.3658, attn_decoder_loss=0.2441, over 5793614.67 frames. 
], batch size: 90, lr: 4.08e-03, grad_scale: 16.0 2024-09-18 17:03:51,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=485200.0, ans=0.025 2024-09-18 17:03:53,619 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.49 vs. limit=15.0 2024-09-18 17:03:56,455 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:03:57,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=485200.0, ans=0.0 2024-09-18 17:03:58,419 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.52 vs. limit=15.0 2024-09-18 17:04:08,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=485240.0, ans=0.0 2024-09-18 17:04:08,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=485240.0, ans=0.2 2024-09-18 17:04:45,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=485320.0, ans=0.05 2024-09-18 17:04:47,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=485320.0, ans=0.035 2024-09-18 17:04:49,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.82 vs. limit=10.0 2024-09-18 17:04:54,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=485360.0, ans=0.125 2024-09-18 17:05:04,962 INFO [train.py:1198] (1/2) Epoch 27, batch 3700, loss[loss=0.2454, ctc_loss=0.1173, cr_loss=0.3548, attn_decoder_loss=0.2518, over 29726.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1244, cr_loss=0.3659, attn_decoder_loss=0.2442, over 5804283.70 frames. ], batch size: 84, lr: 4.08e-03, grad_scale: 8.0 2024-09-18 17:05:09,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=485400.0, ans=0.125 2024-09-18 17:05:12,055 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.63 vs. 
limit=10.0 2024-09-18 17:05:30,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=485440.0, ans=0.1 2024-09-18 17:05:57,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=485520.0, ans=0.125 2024-09-18 17:06:01,399 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.031e+01 8.546e+01 8.927e+01 9.450e+01 1.781e+02, threshold=1.785e+02, percent-clipped=0.0 2024-09-18 17:06:04,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=485560.0, ans=0.0 2024-09-18 17:06:04,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=485560.0, ans=0.1 2024-09-18 17:06:12,866 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.99 vs. limit=22.5 2024-09-18 17:06:21,379 INFO [train.py:1198] (1/2) Epoch 27, batch 3750, loss[loss=0.2114, ctc_loss=0.1035, cr_loss=0.3175, attn_decoder_loss=0.2163, over 29401.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1243, cr_loss=0.3658, attn_decoder_loss=0.2442, over 5807937.38 frames. ], batch size: 67, lr: 4.08e-03, grad_scale: 8.0 2024-09-18 17:06:21,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=485600.0, ans=0.125 2024-09-18 17:06:35,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=485640.0, ans=0.0 2024-09-18 17:06:46,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=485640.0, ans=0.125 2024-09-18 17:06:48,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=485640.0, ans=0.0 2024-09-18 17:06:58,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=485680.0, ans=0.0 2024-09-18 17:07:24,679 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.58 vs. limit=15.0 2024-09-18 17:07:30,818 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.77 vs. limit=10.0 2024-09-18 17:07:33,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=485760.0, ans=0.0 2024-09-18 17:07:35,761 INFO [train.py:1198] (1/2) Epoch 27, batch 3800, loss[loss=0.2508, ctc_loss=0.1326, cr_loss=0.3712, attn_decoder_loss=0.2557, over 29626.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1243, cr_loss=0.3659, attn_decoder_loss=0.244, over 5798653.12 frames. ], batch size: 86, lr: 4.08e-03, grad_scale: 8.0 2024-09-18 17:07:41,132 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.60 vs. 
limit=15.0 2024-09-18 17:07:41,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=485800.0, ans=0.125 2024-09-18 17:07:51,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=485840.0, ans=0.125 2024-09-18 17:07:52,809 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=15.0 2024-09-18 17:07:54,531 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.74 vs. limit=22.5 2024-09-18 17:08:07,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=485880.0, ans=0.125 2024-09-18 17:08:08,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=485880.0, ans=0.0 2024-09-18 17:08:09,613 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.62 vs. limit=22.5 2024-09-18 17:08:15,663 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.27 vs. limit=15.0 2024-09-18 17:08:25,874 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.39 vs. limit=15.0 2024-09-18 17:08:32,466 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.128e+01 8.550e+01 9.227e+01 9.705e+01 1.468e+02, threshold=1.845e+02, percent-clipped=0.0 2024-09-18 17:08:50,202 INFO [train.py:1198] (1/2) Epoch 27, batch 3850, loss[loss=0.2601, ctc_loss=0.1474, cr_loss=0.4142, attn_decoder_loss=0.2634, over 29234.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1246, cr_loss=0.3669, attn_decoder_loss=0.2442, over 5813055.44 frames. ], batch size: 100, lr: 4.08e-03, grad_scale: 8.0 2024-09-18 17:08:50,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=486000.0, ans=0.125 2024-09-18 17:08:51,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=486000.0, ans=0.125 2024-09-18 17:09:03,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=486040.0, ans=0.0 2024-09-18 17:09:12,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=486040.0, ans=0.125 2024-09-18 17:09:13,138 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.39 vs. limit=12.0 2024-09-18 17:09:26,584 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.42 vs. 
limit=22.5 2024-09-18 17:09:42,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=486120.0, ans=0.0 2024-09-18 17:09:45,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=486120.0, ans=0.125 2024-09-18 17:10:06,039 INFO [train.py:1198] (1/2) Epoch 27, batch 3900, loss[loss=0.254, ctc_loss=0.149, cr_loss=0.3904, attn_decoder_loss=0.257, over 29639.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1251, cr_loss=0.3676, attn_decoder_loss=0.2447, over 5817375.15 frames. ], batch size: 86, lr: 4.08e-03, grad_scale: 8.0 2024-09-18 17:10:06,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=486200.0, ans=0.125 2024-09-18 17:10:09,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=486200.0, ans=0.125 2024-09-18 17:10:24,718 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.61 vs. limit=15.0 2024-09-18 17:10:29,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.77 vs. limit=22.5 2024-09-18 17:10:43,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=486280.0, ans=0.125 2024-09-18 17:10:50,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=486320.0, ans=0.125 2024-09-18 17:10:52,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=486320.0, ans=0.125 2024-09-18 17:10:59,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=486320.0, ans=0.125 2024-09-18 17:11:02,351 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.307e+01 8.580e+01 9.073e+01 9.587e+01 1.534e+02, threshold=1.815e+02, percent-clipped=0.0 2024-09-18 17:11:07,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=486360.0, ans=0.1 2024-09-18 17:11:20,657 INFO [train.py:1198] (1/2) Epoch 27, batch 3950, loss[loss=0.2493, ctc_loss=0.1283, cr_loss=0.3799, attn_decoder_loss=0.2543, over 29470.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1245, cr_loss=0.367, attn_decoder_loss=0.2445, over 5836493.97 frames. ], batch size: 97, lr: 4.08e-03, grad_scale: 8.0 2024-09-18 17:11:22,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=486400.0, ans=0.1 2024-09-18 17:12:35,436 INFO [train.py:1198] (1/2) Epoch 27, batch 4000, loss[loss=0.2216, ctc_loss=0.1076, cr_loss=0.3257, attn_decoder_loss=0.2271, over 29498.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1243, cr_loss=0.3661, attn_decoder_loss=0.2443, over 5814672.95 frames. 
], batch size: 74, lr: 4.08e-03, grad_scale: 16.0 2024-09-18 17:13:08,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=486680.0, ans=0.025 2024-09-18 17:13:20,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=486720.0, ans=0.125 2024-09-18 17:13:20,873 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.85 vs. limit=15.0 2024-09-18 17:13:29,857 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2024-09-18 17:13:33,393 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.629e+01 8.740e+01 9.217e+01 9.696e+01 1.612e+02, threshold=1.843e+02, percent-clipped=0.0 2024-09-18 17:13:49,523 INFO [train.py:1198] (1/2) Epoch 27, batch 4050, loss[loss=0.2639, ctc_loss=0.1583, cr_loss=0.3983, attn_decoder_loss=0.2668, over 20601.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1247, cr_loss=0.3672, attn_decoder_loss=0.2443, over 5798167.30 frames. ], batch size: 209, lr: 4.08e-03, grad_scale: 8.0 2024-09-18 17:14:08,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=486840.0, ans=0.125 2024-09-18 17:14:21,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=486880.0, ans=0.125 2024-09-18 17:14:32,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=486880.0, ans=0.07 2024-09-18 17:14:39,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=486920.0, ans=0.125 2024-09-18 17:15:02,352 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.14 vs. limit=15.0 2024-09-18 17:15:04,284 INFO [train.py:1198] (1/2) Epoch 27, batch 4100, loss[loss=0.2579, ctc_loss=0.146, cr_loss=0.4174, attn_decoder_loss=0.2611, over 29519.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1248, cr_loss=0.3671, attn_decoder_loss=0.2446, over 5793858.33 frames. ], batch size: 90, lr: 4.08e-03, grad_scale: 8.0 2024-09-18 17:15:31,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=487040.0, ans=0.1 2024-09-18 17:15:43,966 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.04 vs. limit=15.0 2024-09-18 17:15:54,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=487120.0, ans=0.0 2024-09-18 17:15:56,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=487120.0, ans=0.1 2024-09-18 17:16:03,236 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.595e+01 8.410e+01 8.915e+01 9.592e+01 1.452e+02, threshold=1.783e+02, percent-clipped=0.0 2024-09-18 17:16:19,986 INFO [train.py:1198] (1/2) Epoch 27, batch 4150, loss[loss=0.2342, ctc_loss=0.1148, cr_loss=0.336, attn_decoder_loss=0.24, over 29489.00 frames. 
], tot_loss[loss=0.2397, ctc_loss=0.1245, cr_loss=0.3666, attn_decoder_loss=0.2444, over 5799344.74 frames. ], batch size: 77, lr: 4.07e-03, grad_scale: 8.0 2024-09-18 17:16:27,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=487200.0, ans=0.2 2024-09-18 17:16:32,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=487200.0, ans=0.125 2024-09-18 17:16:39,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=487240.0, ans=0.0 2024-09-18 17:16:49,711 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:17:13,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=487320.0, ans=0.125 2024-09-18 17:17:20,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=487360.0, ans=10.0 2024-09-18 17:17:21,417 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.14 vs. limit=15.0 2024-09-18 17:17:25,615 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.28 vs. limit=15.0 2024-09-18 17:17:33,693 INFO [train.py:1198] (1/2) Epoch 27, batch 4200, loss[loss=0.2598, ctc_loss=0.1402, cr_loss=0.3986, attn_decoder_loss=0.2643, over 29492.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1246, cr_loss=0.3669, attn_decoder_loss=0.2447, over 5800722.82 frames. ], batch size: 90, lr: 4.07e-03, grad_scale: 8.0 2024-09-18 17:17:39,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=487400.0, ans=0.025 2024-09-18 17:17:47,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=487440.0, ans=0.5 2024-09-18 17:18:01,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=487480.0, ans=0.025 2024-09-18 17:18:22,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=487520.0, ans=0.2 2024-09-18 17:18:32,352 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.317e+01 8.514e+01 8.963e+01 9.288e+01 3.975e+02, threshold=1.793e+02, percent-clipped=1.0 2024-09-18 17:18:39,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=487560.0, ans=0.0 2024-09-18 17:18:48,509 INFO [train.py:1198] (1/2) Epoch 27, batch 4250, loss[loss=0.2259, ctc_loss=0.1127, cr_loss=0.3523, attn_decoder_loss=0.2306, over 29526.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1248, cr_loss=0.3675, attn_decoder_loss=0.245, over 5806534.73 frames. 
], batch size: 74, lr: 4.07e-03, grad_scale: 8.0 2024-09-18 17:18:53,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=487600.0, ans=0.2 2024-09-18 17:19:07,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=487640.0, ans=10.0 2024-09-18 17:19:22,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=487680.0, ans=0.125 2024-09-18 17:19:54,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.56 vs. limit=15.0 2024-09-18 17:20:02,919 INFO [train.py:1198] (1/2) Epoch 27, batch 4300, loss[loss=0.249, ctc_loss=0.1312, cr_loss=0.3815, attn_decoder_loss=0.2536, over 29558.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1246, cr_loss=0.3671, attn_decoder_loss=0.2453, over 5795568.59 frames. ], batch size: 87, lr: 4.07e-03, grad_scale: 8.0 2024-09-18 17:20:05,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.13 vs. limit=15.0 2024-09-18 17:20:25,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=487840.0, ans=0.0 2024-09-18 17:20:38,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=487880.0, ans=0.1 2024-09-18 17:20:40,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=487880.0, ans=0.125 2024-09-18 17:21:00,646 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.078e+01 8.751e+01 9.154e+01 9.778e+01 2.419e+02, threshold=1.831e+02, percent-clipped=1.0 2024-09-18 17:21:17,484 INFO [train.py:1198] (1/2) Epoch 27, batch 4350, loss[loss=0.2564, ctc_loss=0.1407, cr_loss=0.411, attn_decoder_loss=0.2601, over 29532.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1276, cr_loss=0.3733, attn_decoder_loss=0.2484, over 5798007.27 frames. ], batch size: 97, lr: 4.07e-03, grad_scale: 8.0 2024-09-18 17:21:19,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=488000.0, ans=0.1 2024-09-18 17:21:31,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=488040.0, ans=0.0 2024-09-18 17:22:03,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=488120.0, ans=0.125 2024-09-18 17:22:06,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=488120.0, ans=0.09899494936611666 2024-09-18 17:22:32,283 INFO [train.py:1198] (1/2) Epoch 27, batch 4400, loss[loss=0.2586, ctc_loss=0.1447, cr_loss=0.4201, attn_decoder_loss=0.2619, over 27306.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1288, cr_loss=0.3756, attn_decoder_loss=0.2505, over 5769497.88 frames. 
], batch size: 124, lr: 4.07e-03, grad_scale: 16.0
2024-09-18 17:22:32,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=488200.0, ans=0.0
2024-09-18 17:22:37,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=488200.0, ans=0.07
2024-09-18 17:23:00,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=488280.0, ans=0.0
2024-09-18 17:23:06,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=488280.0, ans=0.0
2024-09-18 17:23:19,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=488320.0, ans=0.125
2024-09-18 17:23:24,430 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.17 vs. limit=15.0
2024-09-18 17:23:29,593 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.043e+01 8.897e+01 9.375e+01 9.833e+01 4.108e+02, threshold=1.875e+02, percent-clipped=1.0
2024-09-18 17:23:46,266 INFO [train.py:1198] (1/2) Epoch 27, batch 4450, loss[loss=0.2534, ctc_loss=0.152, cr_loss=0.3836, attn_decoder_loss=0.2562, over 19340.00 frames. ], tot_loss[loss=0.2484, ctc_loss=0.1329, cr_loss=0.3813, attn_decoder_loss=0.2527, over 5574693.70 frames. ], batch size: 209, lr: 4.07e-03, grad_scale: 8.0
2024-09-18 17:23:59,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=488400.0, ans=0.1
2024-09-18 17:24:10,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=488440.0, ans=0.125
2024-09-18 17:24:13,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=488440.0, ans=0.0
2024-09-18 17:24:13,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=488440.0, ans=0.0
2024-09-18 17:24:24,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=488480.0, ans=0.2
2024-09-18 17:24:46,497 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0
2024-09-18 17:25:02,147 INFO [train.py:1198] (1/2) Epoch 27, batch 4500, loss[loss=0.2536, ctc_loss=0.1458, cr_loss=0.3727, attn_decoder_loss=0.2573, over 20008.00 frames. ], tot_loss[loss=0.2507, ctc_loss=0.1367, cr_loss=0.3836, attn_decoder_loss=0.2548, over 5238559.51 frames. ], batch size: 209, lr: 4.07e-03, grad_scale: 8.0
2024-09-18 17:25:10,840 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.09 vs. limit=15.0
2024-09-18 17:26:25,150 INFO [train.py:1198] (1/2) Epoch 28, batch 0, loss[loss=0.2079, ctc_loss=0.0912, cr_loss=0.2905, attn_decoder_loss=0.2144, over 29629.00 frames. ], tot_loss[loss=0.2079, ctc_loss=0.0912, cr_loss=0.2905, attn_decoder_loss=0.2144, over 29629.00 frames. ], batch size: 73, lr: 3.99e-03, grad_scale: 16.0
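Note: the tot_loss entries above decompose as a fixed weighted sum of the three criteria. The weights are not printed in this part of the log, but every summary here is consistent with tot_loss = 0.1*ctc_loss + 0.9*attn_decoder_loss + 0.02*cr_loss; the sketch below replays that arithmetic on the epoch 27, batch 4500 summary. The weight values are an inference from the logged numbers, not a quote of the training code.

    # Hedged check of the apparent loss combination, using the logged
    # epoch 27 / batch 4500 values; the 0.1 / 0.9 / 0.02 weights are
    # inferred from the numbers in this log, not read from train.py.
    ctc_loss, cr_loss, attn_decoder_loss = 0.1367, 0.3836, 0.2548
    tot = 0.1 * ctc_loss + 0.9 * attn_decoder_loss + 0.02 * cr_loss
    print(round(tot, 4))  # -> 0.2507, matching the logged tot_loss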
2024-09-18 17:26:25,151 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-18 17:26:45,477 INFO [train.py:1230] (1/2) Epoch 28, validation: loss=0.2131, ctc_loss=0.0377, cr_loss=5.605e-15, attn_decoder_loss=0.2326, over 944034.00 frames.
2024-09-18 17:26:45,478 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB
2024-09-18 17:27:09,718 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.684e+01 1.052e+02 1.136e+02 1.230e+02 3.342e+02, threshold=2.271e+02, percent-clipped=3.0
2024-09-18 17:27:14,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=488780.0, ans=0.0
2024-09-18 17:27:25,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=488780.0, ans=0.0
2024-09-18 17:27:36,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=488820.0, ans=0.125
2024-09-18 17:27:50,046 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.29 vs. limit=12.0
2024-09-18 17:27:55,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=488860.0, ans=0.0
2024-09-18 17:27:59,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.02 vs. limit=15.0
2024-09-18 17:28:01,694 INFO [train.py:1198] (1/2) Epoch 28, batch 50, loss[loss=0.2212, ctc_loss=0.1178, cr_loss=0.34, attn_decoder_loss=0.2251, over 29423.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1264, cr_loss=0.3705, attn_decoder_loss=0.2462, over 1269186.12 frames. ], batch size: 70, lr: 3.99e-03, grad_scale: 8.0
2024-09-18 17:28:16,215 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.18 vs. limit=22.5
2024-09-18 17:28:23,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=488940.0, ans=0.2
2024-09-18 17:28:24,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=488940.0, ans=0.0
2024-09-18 17:28:25,548 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.63 vs. limit=22.5
2024-09-18 17:28:32,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=488980.0, ans=0.125
2024-09-18 17:28:53,905 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.37 vs. limit=15.0
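Note: two details in the validation block above are worth reading closely. First, cr_loss=5.605e-15 is effectively zero, which is what one would expect if the consistency-regularization term compares two differently-masked forward passes and masking is disabled during validation, so the two branches coincide up to float16 rounding noise; the weighted combination still fits (0.1*0.0377 + 0.9*0.2326 ≈ 0.2131). Second, the "Maximum memory allocated" line matches what PyTorch reports for the peak of the CUDA caching allocator; a minimal sketch of such a report follows (the helper name is hypothetical).

    import torch

    def log_peak_memory(device: int = 0) -> None:
        # torch.cuda.max_memory_allocated returns the peak number of bytes
        # ever held by tensors on this device since the last reset.
        peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"Maximum memory allocated so far is {peak_mb}MB")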
2024-09-18 17:28:55,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=489020.0, ans=0.125
2024-09-18 17:28:56,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=489020.0, ans=0.2
2024-09-18 17:29:01,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=489060.0, ans=0.0
2024-09-18 17:29:02,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=489060.0, ans=0.125
2024-09-18 17:29:16,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=489100.0, ans=0.0
2024-09-18 17:29:17,677 INFO [train.py:1198] (1/2) Epoch 28, batch 100, loss[loss=0.229, ctc_loss=0.1195, cr_loss=0.3537, attn_decoder_loss=0.2334, over 29546.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1276, cr_loss=0.3724, attn_decoder_loss=0.2478, over 2253009.81 frames. ], batch size: 76, lr: 3.99e-03, grad_scale: 8.0
2024-09-18 17:29:41,538 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.772e+01 8.514e+01 8.987e+01 9.639e+01 1.687e+02, threshold=1.797e+02, percent-clipped=0.0
2024-09-18 17:29:45,526 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.63 vs. limit=15.0
2024-09-18 17:29:59,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=489180.0, ans=0.0
2024-09-18 17:30:17,671 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 17:30:22,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=489260.0, ans=0.2
2024-09-18 17:30:36,905 INFO [train.py:1198] (1/2) Epoch 28, batch 150, loss[loss=0.214, ctc_loss=0.1047, cr_loss=0.3399, attn_decoder_loss=0.2186, over 29429.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1259, cr_loss=0.3698, attn_decoder_loss=0.246, over 3047772.47 frames. ], batch size: 70, lr: 3.99e-03, grad_scale: 8.0
2024-09-18 17:31:08,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=489380.0, ans=0.07
2024-09-18 17:31:17,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=489380.0, ans=0.125
2024-09-18 17:31:20,034 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.17 vs. limit=15.0
2024-09-18 17:31:38,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=489460.0, ans=0.125
2024-09-18 17:31:47,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=489460.0, ans=0.07
2024-09-18 17:31:52,057 INFO [train.py:1198] (1/2) Epoch 28, batch 200, loss[loss=0.2589, ctc_loss=0.1458, cr_loss=0.3986, attn_decoder_loss=0.2626, over 27298.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1245, cr_loss=0.3669, attn_decoder_loss=0.2446, over 3660224.47 frames. ], batch size: 124, lr: 3.99e-03, grad_scale: 8.0
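Note: in the Clipping_scale warnings, the five numbers after "grad-norm quartiles" read as min/25%/median/75%/max of recently observed gradient norms, and the reported threshold tracks Clipping_scale times the median (2.0 * 8.987e+01 ≈ 1.797e+02 in the 17:29:41 warning above); percent-clipped is then the share of recent steps whose norm exceeded the threshold. A sliding-window sketch of that bookkeeping follows (class and method names are hypothetical, not the actual optim.py code).

    from collections import deque
    import statistics

    class GradNormClipper:
        """Clip to clipping_scale * median of recently seen grad norms."""

        def __init__(self, clipping_scale: float = 2.0, window: int = 128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)
            self.exceeded = deque(maxlen=window)

        def observe(self, grad_norm: float) -> float:
            """Record a norm; return the factor (<= 1.0) to scale grads by."""
            self.norms.append(grad_norm)
            threshold = self.clipping_scale * statistics.median(self.norms)
            self.exceeded.append(grad_norm > threshold)
            return min(1.0, threshold / max(grad_norm, 1e-20))

        def percent_clipped(self) -> float:
            return 100.0 * sum(self.exceeded) / len(self.exceeded)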
2024-09-18 17:31:56,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=489500.0, ans=0.0
2024-09-18 17:32:01,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=489500.0, ans=0.05
2024-09-18 17:32:03,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=489500.0, ans=0.125
2024-09-18 17:32:09,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=489540.0, ans=0.125
2024-09-18 17:32:16,582 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.284e+01 8.292e+01 9.011e+01 9.460e+01 1.346e+02, threshold=1.802e+02, percent-clipped=0.0
2024-09-18 17:32:35,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=489580.0, ans=0.2
2024-09-18 17:33:01,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=489660.0, ans=0.025
2024-09-18 17:33:01,741 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.66 vs. limit=15.0
2024-09-18 17:33:08,527 INFO [train.py:1198] (1/2) Epoch 28, batch 250, loss[loss=0.2545, ctc_loss=0.1319, cr_loss=0.3922, attn_decoder_loss=0.2595, over 29229.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.125, cr_loss=0.3684, attn_decoder_loss=0.2448, over 4141771.30 frames. ], batch size: 100, lr: 3.99e-03, grad_scale: 8.0
2024-09-18 17:33:09,271 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.98 vs. limit=12.0
2024-09-18 17:33:24,769 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.38 vs. limit=22.5
2024-09-18 17:33:27,740 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0
2024-09-18 17:33:33,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=489740.0, ans=0.04949747468305833
2024-09-18 17:33:40,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=489780.0, ans=0.09899494936611666
2024-09-18 17:33:54,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=489820.0, ans=0.125
2024-09-18 17:33:55,179 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.22 vs. limit=15.0
2024-09-18 17:34:24,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=489860.0, ans=0.2
2024-09-18 17:34:26,611 INFO [train.py:1198] (1/2) Epoch 28, batch 300, loss[loss=0.2547, ctc_loss=0.1357, cr_loss=0.3971, attn_decoder_loss=0.2591, over 29531.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1245, cr_loss=0.3677, attn_decoder_loss=0.2447, over 4510720.49 frames. ], batch size: 92, lr: 3.99e-03, grad_scale: 8.0
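Note: each ScheduledFloat line above prints the current value (ans) of a hyperparameter that is scheduled on batch_count; skip rates and dropout probabilities such as conv_skip_rate or out_proj.dropout_p typically start high for regularization and decay toward a floor, which is why most of them read 0.0 or a small constant this late in training (batch_count ≈ 490k). A minimal piecewise-linear schedule in that spirit is sketched below; it is an illustration of the idea, not the actual scaling.py implementation.

    def scheduled_float(batch_count: float,
                        points: list[tuple[float, float]]) -> float:
        """Piecewise-linear interpolation between (batch_count, value)
        breakpoints, clamped to the first/last value outside their range."""
        x0, y0 = points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in points[1:]:
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
            x0, y0 = x1, y1
        return y0

    # A skip rate annealed from 0.5 to 0.0 over the first 20k batches has
    # long since reached its floor at batch_count=489500.0, as logged above.
    print(scheduled_float(489500.0, [(0.0, 0.5), (20000.0, 0.0)]))  # -> 0.0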
2024-09-18 17:34:28,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=489900.0, ans=0.125
2024-09-18 17:34:29,179 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.66 vs. limit=10.0
2024-09-18 17:34:44,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=489940.0, ans=0.125
2024-09-18 17:34:45,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=489940.0, ans=0.0
2024-09-18 17:34:53,037 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.323e+01 8.453e+01 8.832e+01 9.524e+01 1.905e+02, threshold=1.766e+02, percent-clipped=1.0
2024-09-18 17:34:55,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=489940.0, ans=0.1
2024-09-18 17:34:58,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=489980.0, ans=0.1
2024-09-18 17:35:14,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=490020.0, ans=0.0
2024-09-18 17:35:25,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=490020.0, ans=0.125
2024-09-18 17:35:37,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=490060.0, ans=0.0
2024-09-18 17:35:44,880 INFO [train.py:1198] (1/2) Epoch 28, batch 350, loss[loss=0.2181, ctc_loss=0.108, cr_loss=0.325, attn_decoder_loss=0.2231, over 29312.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1249, cr_loss=0.3682, attn_decoder_loss=0.2451, over 4795275.60 frames. ], batch size: 71, lr: 3.99e-03, grad_scale: 8.0
2024-09-18 17:36:12,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=490140.0, ans=0.1
2024-09-18 17:36:18,659 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.86 vs. limit=22.5
2024-09-18 17:36:45,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=490260.0, ans=0.05
2024-09-18 17:36:59,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=490300.0, ans=0.125
2024-09-18 17:37:00,257 INFO [train.py:1198] (1/2) Epoch 28, batch 400, loss[loss=0.2494, ctc_loss=0.1284, cr_loss=0.3819, attn_decoder_loss=0.2544, over 29686.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1244, cr_loss=0.3676, attn_decoder_loss=0.2446, over 5023762.82 frames. ], batch size: 82, lr: 3.99e-03, grad_scale: 16.0
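Note: the Whitening lines fire when a module's output covariance looks too anisotropic: metric summarizes how unevenly variance is spread across channels (per group), and the event is logged when it exceeds the module's limit. One plausible spread statistic with that behavior, normalized so that 1.0 means perfectly white features and num_channels means all variance in a single direction, is sketched below; this is an illustration of the idea, not the actual scaling.py code.

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        """x: (frames, channels). Returns 1.0 for white features and
        approaches the channel count as variance concentrates in one
        direction."""
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]        # channel covariance matrix
        eigs = torch.linalg.eigvalsh(cov)   # its eigenvalue spectrum
        c = x.shape[1]
        return float(c * (eigs ** 2).sum() / eigs.sum() ** 2)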
2024-09-18 17:37:08,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=490300.0, ans=0.1
2024-09-18 17:37:26,415 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.544e+01 8.632e+01 9.035e+01 9.717e+01 2.941e+02, threshold=1.807e+02, percent-clipped=3.0
2024-09-18 17:37:35,214 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.28 vs. limit=15.0
2024-09-18 17:37:39,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=490380.0, ans=0.125
2024-09-18 17:37:47,313 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=15.0
2024-09-18 17:37:49,125 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.16 vs. limit=15.0
2024-09-18 17:37:57,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=490420.0, ans=0.1
2024-09-18 17:38:16,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=490460.0, ans=0.0
2024-09-18 17:38:19,618 INFO [train.py:1198] (1/2) Epoch 28, batch 450, loss[loss=0.2457, ctc_loss=0.1162, cr_loss=0.3501, attn_decoder_loss=0.2523, over 29696.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1245, cr_loss=0.368, attn_decoder_loss=0.2448, over 5186573.05 frames. ], batch size: 83, lr: 3.99e-03, grad_scale: 8.0
2024-09-18 17:38:27,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=490500.0, ans=0.125
2024-09-18 17:38:29,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=490500.0, ans=0.125
2024-09-18 17:38:35,149 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=6.66 vs. limit=15.0
2024-09-18 17:38:45,428 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 17:38:46,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=490540.0, ans=0.0
2024-09-18 17:39:02,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=490580.0, ans=0.0
2024-09-18 17:39:09,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.01 vs. limit=12.0
2024-09-18 17:39:11,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=490620.0, ans=0.125
2024-09-18 17:39:32,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=490660.0, ans=0.125
2024-09-18 17:39:38,407 INFO [train.py:1198] (1/2) Epoch 28, batch 500, loss[loss=0.2533, ctc_loss=0.1368, cr_loss=0.4122, attn_decoder_loss=0.2571, over 29455.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1243, cr_loss=0.3679, attn_decoder_loss=0.2443, over 5328520.11 frames. ], batch size: 94, lr: 3.99e-03, grad_scale: 8.0
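Note: grad_scale in the batch summaries is the dynamic loss scale used for float16 training; its toggling between 8.0 and 16.0 here (16.0 at batch 400, back to 8.0 by batch 450) is the signature of dynamic loss scaling, which doubles the scale after a run of overflow-free steps and halves it when an inf/nan gradient appears. This is consistent with PyTorch's GradScaler policy; a minimal training-step sketch under that assumption (model, optimizer and criterion are placeholders):

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=8.0,       # in the range of the grad_scale values logged here
        growth_factor=2.0,    # double after `growth_interval` clean steps
        backoff_factor=0.5,   # halve when gradients overflow
        growth_interval=2000,
    )

    def train_step(model, optimizer, criterion, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = criterion(model(batch))
        scaler.scale(loss).backward()  # backward through the scaled loss
        scaler.step(optimizer)         # unscales grads; skips step on overflow
        scaler.update()                # grow or back off the scale
        return scaler.get_scale()      # the value reported as grad_scale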
2024-09-18 17:39:44,123 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.56 vs. limit=22.5
2024-09-18 17:39:54,502 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.63 vs. limit=15.0
2024-09-18 17:39:55,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=490740.0, ans=0.0
2024-09-18 17:40:04,212 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.478e+01 8.864e+01 9.440e+01 1.535e+02, threshold=1.773e+02, percent-clipped=0.0
2024-09-18 17:40:08,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=490780.0, ans=0.2
2024-09-18 17:40:23,546 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.17 vs. limit=12.0
2024-09-18 17:40:47,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=490860.0, ans=0.125
2024-09-18 17:40:48,141 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.22 vs. limit=15.0
2024-09-18 17:40:53,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=490900.0, ans=0.125
2024-09-18 17:40:54,416 INFO [train.py:1198] (1/2) Epoch 28, batch 550, loss[loss=0.249, ctc_loss=0.1337, cr_loss=0.3969, attn_decoder_loss=0.253, over 28820.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1244, cr_loss=0.3678, attn_decoder_loss=0.2441, over 5421725.20 frames. ], batch size: 104, lr: 3.98e-03, grad_scale: 8.0
2024-09-18 17:41:20,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=490940.0, ans=0.07
2024-09-18 17:41:37,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=490980.0, ans=0.125
2024-09-18 17:41:46,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.72 vs. limit=15.0
2024-09-18 17:42:12,518 INFO [train.py:1198] (1/2) Epoch 28, batch 600, loss[loss=0.2519, ctc_loss=0.1323, cr_loss=0.3673, attn_decoder_loss=0.257, over 29150.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1246, cr_loss=0.3684, attn_decoder_loss=0.2443, over 5508713.94 frames.
], batch size: 100, lr: 3.98e-03, grad_scale: 8.0 2024-09-18 17:42:24,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=491100.0, ans=0.125 2024-09-18 17:42:40,162 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.281e+01 8.877e+01 9.486e+01 1.809e+02, threshold=1.775e+02, percent-clipped=1.0 2024-09-18 17:42:40,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=491140.0, ans=0.125 2024-09-18 17:43:00,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=491220.0, ans=0.2 2024-09-18 17:43:07,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=491220.0, ans=0.0 2024-09-18 17:43:22,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=491260.0, ans=0.0 2024-09-18 17:43:29,932 INFO [train.py:1198] (1/2) Epoch 28, batch 650, loss[loss=0.2403, ctc_loss=0.1224, cr_loss=0.3806, attn_decoder_loss=0.245, over 29760.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1235, cr_loss=0.3665, attn_decoder_loss=0.2435, over 5585993.74 frames. ], batch size: 81, lr: 3.98e-03, grad_scale: 8.0 2024-09-18 17:43:40,057 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.15 vs. limit=15.0 2024-09-18 17:43:48,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=491340.0, ans=0.125 2024-09-18 17:43:48,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=491340.0, ans=0.125 2024-09-18 17:44:22,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=491420.0, ans=0.125 2024-09-18 17:44:24,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=491420.0, ans=0.0 2024-09-18 17:44:37,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=491460.0, ans=0.125 2024-09-18 17:44:43,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.max_abs, batch_count=491460.0, ans=10.0 2024-09-18 17:44:46,046 INFO [train.py:1198] (1/2) Epoch 28, batch 700, loss[loss=0.2304, ctc_loss=0.125, cr_loss=0.384, attn_decoder_loss=0.2336, over 29550.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1239, cr_loss=0.3673, attn_decoder_loss=0.244, over 5637340.24 frames. 
], batch size: 76, lr: 3.98e-03, grad_scale: 8.0 2024-09-18 17:45:01,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=491540.0, ans=0.125 2024-09-18 17:45:01,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=491540.0, ans=0.0 2024-09-18 17:45:11,730 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.419e+01 8.262e+01 8.777e+01 9.267e+01 2.724e+02, threshold=1.755e+02, percent-clipped=1.0 2024-09-18 17:45:12,938 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.98 vs. limit=15.0 2024-09-18 17:45:27,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=491580.0, ans=0.125 2024-09-18 17:45:27,923 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0 2024-09-18 17:45:39,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=491620.0, ans=0.125 2024-09-18 17:46:01,803 INFO [train.py:1198] (1/2) Epoch 28, batch 750, loss[loss=0.2395, ctc_loss=0.1159, cr_loss=0.3564, attn_decoder_loss=0.2454, over 29713.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1233, cr_loss=0.3658, attn_decoder_loss=0.2436, over 5676222.58 frames. ], batch size: 82, lr: 3.98e-03, grad_scale: 8.0 2024-09-18 17:46:16,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=491700.0, ans=0.1 2024-09-18 17:46:23,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=491740.0, ans=0.125 2024-09-18 17:47:20,365 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:47:21,496 INFO [train.py:1198] (1/2) Epoch 28, batch 800, loss[loss=0.229, ctc_loss=0.1184, cr_loss=0.3488, attn_decoder_loss=0.2335, over 29585.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1236, cr_loss=0.3661, attn_decoder_loss=0.2436, over 5705857.23 frames. 
], batch size: 73, lr: 3.98e-03, grad_scale: 16.0 2024-09-18 17:47:21,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=491900.0, ans=0.0 2024-09-18 17:47:36,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=491940.0, ans=0.0 2024-09-18 17:47:47,342 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.515e+01 8.491e+01 9.037e+01 9.523e+01 1.873e+02, threshold=1.807e+02, percent-clipped=1.0 2024-09-18 17:47:53,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=491980.0, ans=0.025 2024-09-18 17:48:00,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=491980.0, ans=0.1 2024-09-18 17:48:01,751 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:48:32,930 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:48:35,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=492100.0, ans=0.125 2024-09-18 17:48:37,061 INFO [train.py:1198] (1/2) Epoch 28, batch 850, loss[loss=0.2446, ctc_loss=0.1267, cr_loss=0.3665, attn_decoder_loss=0.2495, over 29694.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1232, cr_loss=0.3654, attn_decoder_loss=0.2432, over 5734472.33 frames. ], batch size: 89, lr: 3.98e-03, grad_scale: 8.0 2024-09-18 17:48:41,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=492100.0, ans=0.2 2024-09-18 17:48:44,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=492100.0, ans=0.125 2024-09-18 17:49:01,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=492140.0, ans=0.125 2024-09-18 17:49:10,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=492180.0, ans=0.07 2024-09-18 17:49:11,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.51 vs. limit=15.0 2024-09-18 17:49:21,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=492220.0, ans=0.05 2024-09-18 17:49:40,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=492260.0, ans=0.125 2024-09-18 17:49:52,768 INFO [train.py:1198] (1/2) Epoch 28, batch 900, loss[loss=0.2253, ctc_loss=0.1086, cr_loss=0.3485, attn_decoder_loss=0.2305, over 29635.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1243, cr_loss=0.367, attn_decoder_loss=0.2439, over 5739622.64 frames. 
], batch size: 73, lr: 3.98e-03, grad_scale: 8.0 2024-09-18 17:49:54,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=492300.0, ans=0.1 2024-09-18 17:50:04,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=492300.0, ans=0.0 2024-09-18 17:50:11,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=492340.0, ans=0.0 2024-09-18 17:50:21,990 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.372e+01 8.505e+01 9.006e+01 9.829e+01 2.830e+02, threshold=1.801e+02, percent-clipped=3.0 2024-09-18 17:50:22,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=492340.0, ans=0.125 2024-09-18 17:50:26,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=492380.0, ans=0.0 2024-09-18 17:50:31,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=492380.0, ans=0.1 2024-09-18 17:50:49,062 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:50:53,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=492420.0, ans=0.0 2024-09-18 17:51:12,925 INFO [train.py:1198] (1/2) Epoch 28, batch 950, loss[loss=0.2211, ctc_loss=0.1092, cr_loss=0.3477, attn_decoder_loss=0.2258, over 29491.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1241, cr_loss=0.3665, attn_decoder_loss=0.244, over 5742979.59 frames. ], batch size: 74, lr: 3.98e-03, grad_scale: 8.0 2024-09-18 17:51:16,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=492500.0, ans=0.1 2024-09-18 17:51:32,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=492540.0, ans=0.125 2024-09-18 17:51:48,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=492580.0, ans=0.125 2024-09-18 17:52:00,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=492620.0, ans=0.04949747468305833 2024-09-18 17:52:08,419 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.42 vs. limit=15.0 2024-09-18 17:52:13,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=492660.0, ans=0.1 2024-09-18 17:52:28,280 INFO [train.py:1198] (1/2) Epoch 28, batch 1000, loss[loss=0.2345, ctc_loss=0.1224, cr_loss=0.364, attn_decoder_loss=0.2388, over 29508.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1249, cr_loss=0.3678, attn_decoder_loss=0.2446, over 5736631.94 frames. 
], batch size: 77, lr: 3.98e-03, grad_scale: 8.0 2024-09-18 17:52:34,568 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:52:55,758 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 8.563e+01 9.173e+01 1.012e+02 1.591e+02, threshold=1.835e+02, percent-clipped=0.0 2024-09-18 17:52:59,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=492780.0, ans=0.0 2024-09-18 17:53:00,021 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.34 vs. limit=22.5 2024-09-18 17:53:13,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=492820.0, ans=0.125 2024-09-18 17:53:46,490 INFO [train.py:1198] (1/2) Epoch 28, batch 1050, loss[loss=0.2415, ctc_loss=0.1213, cr_loss=0.3617, attn_decoder_loss=0.2468, over 29673.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1247, cr_loss=0.3674, attn_decoder_loss=0.2443, over 5742484.27 frames. ], batch size: 85, lr: 3.98e-03, grad_scale: 8.0 2024-09-18 17:54:09,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=492940.0, ans=0.125 2024-09-18 17:54:11,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=492940.0, ans=0.125 2024-09-18 17:54:18,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=492980.0, ans=0.125 2024-09-18 17:54:34,719 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:54:35,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.27 vs. limit=10.0 2024-09-18 17:54:43,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=493020.0, ans=0.1 2024-09-18 17:54:53,155 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.20 vs. limit=22.5 2024-09-18 17:54:57,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=493060.0, ans=0.125 2024-09-18 17:55:01,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=493060.0, ans=0.0 2024-09-18 17:55:01,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=493060.0, ans=0.0 2024-09-18 17:55:04,302 INFO [train.py:1198] (1/2) Epoch 28, batch 1100, loss[loss=0.2447, ctc_loss=0.1299, cr_loss=0.3748, attn_decoder_loss=0.2492, over 29467.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1244, cr_loss=0.3668, attn_decoder_loss=0.244, over 5755702.29 frames. ], batch size: 78, lr: 3.98e-03, grad_scale: 8.0 2024-09-18 17:55:12,930 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.07 vs. 
limit=15.0 2024-09-18 17:55:31,740 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.956e+01 8.310e+01 8.930e+01 9.558e+01 2.939e+02, threshold=1.786e+02, percent-clipped=1.0 2024-09-18 17:55:35,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=493180.0, ans=0.0 2024-09-18 17:55:51,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=493220.0, ans=0.125 2024-09-18 17:55:56,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=493220.0, ans=0.2 2024-09-18 17:56:04,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=493260.0, ans=0.0 2024-09-18 17:56:10,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=493260.0, ans=0.2 2024-09-18 17:56:20,581 INFO [train.py:1198] (1/2) Epoch 28, batch 1150, loss[loss=0.2304, ctc_loss=0.1152, cr_loss=0.3329, attn_decoder_loss=0.2358, over 29466.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1243, cr_loss=0.3667, attn_decoder_loss=0.2441, over 5754142.05 frames. ], batch size: 78, lr: 3.98e-03, grad_scale: 8.0 2024-09-18 17:56:22,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=493300.0, ans=0.0 2024-09-18 17:56:48,966 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.93 vs. limit=22.5 2024-09-18 17:56:59,784 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.07 vs. limit=22.5 2024-09-18 17:57:13,100 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.51 vs. limit=15.0 2024-09-18 17:57:23,656 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.99 vs. limit=15.0 2024-09-18 17:57:36,493 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.18 vs. limit=12.0 2024-09-18 17:57:38,535 INFO [train.py:1198] (1/2) Epoch 28, batch 1200, loss[loss=0.2468, ctc_loss=0.1262, cr_loss=0.377, attn_decoder_loss=0.2518, over 29699.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1248, cr_loss=0.3676, attn_decoder_loss=0.2449, over 5747663.92 frames. ], batch size: 85, lr: 3.97e-03, grad_scale: 16.0 2024-09-18 17:57:39,667 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.90 vs. 
limit=12.0 2024-09-18 17:57:41,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=493500.0, ans=10.0 2024-09-18 17:57:46,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=493500.0, ans=0.0 2024-09-18 17:58:07,203 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.372e+01 8.554e+01 9.030e+01 9.625e+01 2.213e+02, threshold=1.806e+02, percent-clipped=2.0 2024-09-18 17:58:13,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=493580.0, ans=0.125 2024-09-18 17:58:15,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=493580.0, ans=0.0 2024-09-18 17:58:28,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=493620.0, ans=0.0 2024-09-18 17:58:28,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=493620.0, ans=0.125 2024-09-18 17:58:31,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=493620.0, ans=0.0 2024-09-18 17:58:46,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=493660.0, ans=0.0 2024-09-18 17:58:52,166 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.76 vs. limit=22.5 2024-09-18 17:58:56,907 INFO [train.py:1198] (1/2) Epoch 28, batch 1250, loss[loss=0.2634, ctc_loss=0.157, cr_loss=0.4291, attn_decoder_loss=0.2657, over 29522.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1251, cr_loss=0.3682, attn_decoder_loss=0.2453, over 5775671.88 frames. ], batch size: 92, lr: 3.97e-03, grad_scale: 8.0 2024-09-18 17:58:57,227 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:59:00,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=493700.0, ans=0.0 2024-09-18 17:59:36,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=493780.0, ans=0.125 2024-09-18 17:59:45,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=493820.0, ans=0.1 2024-09-18 17:59:49,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=493820.0, ans=0.2 2024-09-18 17:59:50,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.19 vs. limit=22.5 2024-09-18 17:59:52,497 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.79 vs. 
limit=15.0 2024-09-18 17:59:55,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=493820.0, ans=0.0 2024-09-18 18:00:01,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=493860.0, ans=0.07 2024-09-18 18:00:13,086 INFO [train.py:1198] (1/2) Epoch 28, batch 1300, loss[loss=0.2464, ctc_loss=0.1224, cr_loss=0.3698, attn_decoder_loss=0.252, over 28233.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1246, cr_loss=0.3668, attn_decoder_loss=0.2446, over 5779068.15 frames. ], batch size: 111, lr: 3.97e-03, grad_scale: 8.0 2024-09-18 18:00:18,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=493900.0, ans=0.1 2024-09-18 18:00:41,219 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. limit=15.0 2024-09-18 18:00:41,987 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.448e+01 8.590e+01 9.154e+01 9.575e+01 1.829e+02, threshold=1.831e+02, percent-clipped=1.0 2024-09-18 18:00:43,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=493980.0, ans=0.1 2024-09-18 18:00:51,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=493980.0, ans=0.05 2024-09-18 18:01:03,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=494020.0, ans=0.07 2024-09-18 18:01:04,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=494020.0, ans=0.0 2024-09-18 18:01:20,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=494060.0, ans=0.5 2024-09-18 18:01:23,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=494060.0, ans=0.1 2024-09-18 18:01:26,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=494060.0, ans=0.125 2024-09-18 18:01:29,055 INFO [train.py:1198] (1/2) Epoch 28, batch 1350, loss[loss=0.2369, ctc_loss=0.1155, cr_loss=0.3503, attn_decoder_loss=0.2426, over 29764.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1238, cr_loss=0.3652, attn_decoder_loss=0.244, over 5797705.98 frames. 
], batch size: 81, lr: 3.97e-03, grad_scale: 8.0 2024-09-18 18:01:49,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=494140.0, ans=0.125 2024-09-18 18:01:51,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=494140.0, ans=0.1 2024-09-18 18:01:54,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=494140.0, ans=0.025 2024-09-18 18:02:08,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=494180.0, ans=0.125 2024-09-18 18:02:10,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=494180.0, ans=0.125 2024-09-18 18:02:16,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=494220.0, ans=0.025 2024-09-18 18:02:22,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=494220.0, ans=0.125 2024-09-18 18:02:22,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=494220.0, ans=0.125 2024-09-18 18:02:24,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=494220.0, ans=0.0 2024-09-18 18:02:25,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=494220.0, ans=0.1 2024-09-18 18:02:29,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=494260.0, ans=0.125 2024-09-18 18:02:39,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=494260.0, ans=0.125 2024-09-18 18:02:48,564 INFO [train.py:1198] (1/2) Epoch 28, batch 1400, loss[loss=0.2165, ctc_loss=0.1074, cr_loss=0.3405, attn_decoder_loss=0.2211, over 29575.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1233, cr_loss=0.3644, attn_decoder_loss=0.2438, over 5808779.94 frames. ], batch size: 69, lr: 3.97e-03, grad_scale: 8.0 2024-09-18 18:03:02,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=494340.0, ans=0.125 2024-09-18 18:03:17,503 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.696e+01 8.548e+01 9.065e+01 9.786e+01 1.272e+02, threshold=1.813e+02, percent-clipped=0.0 2024-09-18 18:03:39,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=494420.0, ans=0.1 2024-09-18 18:04:04,980 INFO [train.py:1198] (1/2) Epoch 28, batch 1450, loss[loss=0.2458, ctc_loss=0.1251, cr_loss=0.3866, attn_decoder_loss=0.2506, over 29451.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1236, cr_loss=0.365, attn_decoder_loss=0.2445, over 5805400.88 frames. 
], batch size: 94, lr: 3.97e-03, grad_scale: 8.0 2024-09-18 18:04:22,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.min_positive, batch_count=494540.0, ans=0.025 2024-09-18 18:04:31,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=494540.0, ans=0.125 2024-09-18 18:04:35,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=494580.0, ans=0.0 2024-09-18 18:05:20,934 INFO [train.py:1198] (1/2) Epoch 28, batch 1500, loss[loss=0.2466, ctc_loss=0.1211, cr_loss=0.3791, attn_decoder_loss=0.2521, over 29635.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1239, cr_loss=0.3661, attn_decoder_loss=0.2449, over 5806076.65 frames. ], batch size: 86, lr: 3.97e-03, grad_scale: 8.0 2024-09-18 18:05:28,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=494700.0, ans=0.0 2024-09-18 18:05:31,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=494700.0, ans=0.2 2024-09-18 18:05:35,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=494700.0, ans=0.125 2024-09-18 18:05:45,646 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=15.34 vs. limit=15.0 2024-09-18 18:05:50,232 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=6.08 vs. limit=15.0 2024-09-18 18:05:52,388 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.464e+01 8.636e+01 9.142e+01 9.701e+01 7.436e+02, threshold=1.828e+02, percent-clipped=2.0 2024-09-18 18:06:12,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=494820.0, ans=0.2 2024-09-18 18:06:13,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=494820.0, ans=15.0 2024-09-18 18:06:26,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=494860.0, ans=0.125 2024-09-18 18:06:31,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=494860.0, ans=0.0 2024-09-18 18:06:41,531 INFO [train.py:1198] (1/2) Epoch 28, batch 1550, loss[loss=0.2544, ctc_loss=0.1359, cr_loss=0.3965, attn_decoder_loss=0.2588, over 29498.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1241, cr_loss=0.3664, attn_decoder_loss=0.245, over 5783306.24 frames. ], batch size: 90, lr: 3.97e-03, grad_scale: 8.0 2024-09-18 18:06:45,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=494900.0, ans=0.125 2024-09-18 18:06:47,035 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=9.09 vs. 
limit=15.0 2024-09-18 18:07:04,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=494940.0, ans=0.0 2024-09-18 18:07:45,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=495060.0, ans=0.125 2024-09-18 18:07:57,364 INFO [train.py:1198] (1/2) Epoch 28, batch 1600, loss[loss=0.2451, ctc_loss=0.1248, cr_loss=0.3584, attn_decoder_loss=0.2505, over 29696.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1245, cr_loss=0.3672, attn_decoder_loss=0.2452, over 5766661.01 frames. ], batch size: 85, lr: 3.97e-03, grad_scale: 16.0 2024-09-18 18:08:18,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=495140.0, ans=0.1 2024-09-18 18:08:27,526 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.529e+01 9.034e+01 9.836e+01 1.943e+02, threshold=1.807e+02, percent-clipped=1.0 2024-09-18 18:08:43,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=495220.0, ans=0.1 2024-09-18 18:09:05,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=495260.0, ans=0.0 2024-09-18 18:09:09,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=495260.0, ans=0.05 2024-09-18 18:09:12,765 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 18:09:14,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=495300.0, ans=0.125 2024-09-18 18:09:14,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=495300.0, ans=0.07 2024-09-18 18:09:15,393 INFO [train.py:1198] (1/2) Epoch 28, batch 1650, loss[loss=0.2527, ctc_loss=0.1338, cr_loss=0.3903, attn_decoder_loss=0.2573, over 29720.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1244, cr_loss=0.3668, attn_decoder_loss=0.2449, over 5759866.33 frames. ], batch size: 89, lr: 3.97e-03, grad_scale: 8.0 2024-09-18 18:09:18,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=495300.0, ans=0.025 2024-09-18 18:09:29,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=495340.0, ans=0.035 2024-09-18 18:09:32,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=495340.0, ans=0.125 2024-09-18 18:09:32,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=495340.0, ans=0.125 2024-09-18 18:09:41,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=495340.0, ans=0.0 2024-09-18 18:10:12,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=495420.0, ans=0.2 2024-09-18 18:10:33,326 INFO [train.py:1198] (1/2) Epoch 28, batch 1700, loss[loss=0.2078, ctc_loss=0.09849, cr_loss=0.3024, attn_decoder_loss=0.2132, over 29580.00 frames. 
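The `scaling.py:214` lines above each print a ScheduledFloat: a scalar hyperparameter (a dropout p, balancer prob, skip rate, and so on) whose current value `ans` is a function of `batch_count`. A minimal sketch of such a schedule follows, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the real ScheduledFloat class in icefall's scaling.py carries more machinery than this:

```python
# Minimal sketch of a batch-count-keyed schedule like the ScheduledFloat
# values printed above (name=..., batch_count=..., ans=...). Assumes the
# schedule is piecewise-linear between (batch_count, value) breakpoints.
from bisect import bisect_right

class PiecewiseLinearSchedule:
    def __init__(self, *points: tuple[float, float]):
        self.xs = [x for x, _ in points]  # breakpoint batch counts, ascending
        self.ys = [y for _, y in points]  # values at those breakpoints

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Hypothetical dropout schedule: 0.3 at the start, decayed to 0.1 by 20k batches.
dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(495140.0))  # -> 0.1; this deep into training the schedule is flat,
                            # matching the constant ans=0.1 dropout_p lines above
```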
], tot_loss[loss=0.2397, ctc_loss=0.1239, cr_loss=0.3661, attn_decoder_loss=0.2444, over 5781027.76 frames. ], batch size: 69, lr: 3.97e-03, grad_scale: 8.0 2024-09-18 18:10:47,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=495540.0, ans=0.125 2024-09-18 18:10:52,345 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.56 vs. limit=15.0 2024-09-18 18:11:03,300 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.621e+01 8.597e+01 9.283e+01 9.916e+01 1.626e+02, threshold=1.857e+02, percent-clipped=0.0 2024-09-18 18:11:11,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=495580.0, ans=0.015 2024-09-18 18:11:14,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=495580.0, ans=0.0 2024-09-18 18:11:19,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=495620.0, ans=15.0 2024-09-18 18:11:31,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=495620.0, ans=0.5 2024-09-18 18:11:40,837 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.15 vs. limit=15.0 2024-09-18 18:11:45,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=495660.0, ans=0.1 2024-09-18 18:11:49,159 INFO [train.py:1198] (1/2) Epoch 28, batch 1750, loss[loss=0.2178, ctc_loss=0.1093, cr_loss=0.3272, attn_decoder_loss=0.2226, over 29324.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1236, cr_loss=0.3658, attn_decoder_loss=0.244, over 5788304.45 frames. ], batch size: 67, lr: 3.97e-03, grad_scale: 8.0 2024-09-18 18:11:57,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=495700.0, ans=0.0 2024-09-18 18:12:09,475 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 18:12:21,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=495780.0, ans=0.025 2024-09-18 18:12:43,624 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.50 vs. limit=15.0 2024-09-18 18:12:45,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=495820.0, ans=0.025 2024-09-18 18:12:50,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=495860.0, ans=0.125 2024-09-18 18:12:53,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=495860.0, ans=0.125 2024-09-18 18:13:07,141 INFO [train.py:1198] (1/2) Epoch 28, batch 1800, loss[loss=0.2519, ctc_loss=0.137, cr_loss=0.389, attn_decoder_loss=0.256, over 29696.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1237, cr_loss=0.366, attn_decoder_loss=0.2441, over 5792081.88 frames. 
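The `optim.py:487` WARNING lines report the quartiles of recently observed gradient norms (min / 25% / median / 75% / max) plus a clipping threshold. In every record in this section the threshold equals Clipping_scale times the median, e.g. 2.0 x 9.283e+01 = 1.857e+02 in the record just above, so a sketch under that assumption:

```python
# Sketch of median-based gradient clipping consistent with the WARNING lines
# above: threshold = clipping_scale * median of recently observed grad norms.
# The arithmetic matches the logged records; the actual implementation lives
# in icefall's optim.py and is not reproduced here.
import torch

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 1024):
        self.clipping_scale = clipping_scale
        self.history = history
        self.norms: list[float] = []  # recent global grad norms

    def clip_(self, params) -> float:
        grads = [p.grad for p in params if p.grad is not None]
        # global L2 norm over all parameter gradients
        norm = torch.linalg.vector_norm(
            torch.stack([torch.linalg.vector_norm(g) for g in grads])
        ).item()
        self.norms = (self.norms + [norm])[-self.history:]
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if norm > threshold:  # such steps are what "percent-clipped" counts
            for g in grads:
                g.mul_(threshold / norm)
        return norm
```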
], batch size: 83, lr: 3.96e-03, grad_scale: 8.0 2024-09-18 18:13:10,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=495900.0, ans=0.1 2024-09-18 18:13:37,655 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.001e+01 8.359e+01 8.858e+01 9.396e+01 1.273e+02, threshold=1.772e+02, percent-clipped=0.0 2024-09-18 18:13:39,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=495980.0, ans=0.1 2024-09-18 18:14:11,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=496020.0, ans=0.0 2024-09-18 18:14:15,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=496060.0, ans=0.0 2024-09-18 18:14:23,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=496060.0, ans=0.0 2024-09-18 18:14:30,923 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2024-09-18 18:14:32,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=496100.0, ans=15.0 2024-09-18 18:14:32,844 INFO [train.py:1198] (1/2) Epoch 28, batch 1850, loss[loss=0.2448, ctc_loss=0.1247, cr_loss=0.364, attn_decoder_loss=0.2501, over 29616.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1237, cr_loss=0.366, attn_decoder_loss=0.244, over 5796005.54 frames. ], batch size: 86, lr: 3.96e-03, grad_scale: 8.0 2024-09-18 18:14:52,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=496140.0, ans=0.125 2024-09-18 18:14:56,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=496140.0, ans=0.1 2024-09-18 18:14:57,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=496140.0, ans=0.1 2024-09-18 18:15:09,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=496180.0, ans=0.125 2024-09-18 18:15:47,678 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.16 vs. limit=22.5 2024-09-18 18:15:48,300 INFO [train.py:1198] (1/2) Epoch 28, batch 1900, loss[loss=0.2454, ctc_loss=0.1199, cr_loss=0.362, attn_decoder_loss=0.2512, over 29714.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1238, cr_loss=0.3667, attn_decoder_loss=0.2444, over 5803831.89 frames. ], batch size: 89, lr: 3.96e-03, grad_scale: 8.0 2024-09-18 18:15:57,677 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 18:16:08,744 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=15.0 2024-09-18 18:16:10,467 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.24 vs. 
limit=15.0 2024-09-18 18:16:18,826 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.546e+01 8.544e+01 9.072e+01 9.391e+01 1.587e+02, threshold=1.814e+02, percent-clipped=0.0 2024-09-18 18:16:23,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=496380.0, ans=0.125 2024-09-18 18:16:36,505 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.77 vs. limit=6.0 2024-09-18 18:16:49,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=496460.0, ans=0.125 2024-09-18 18:17:06,240 INFO [train.py:1198] (1/2) Epoch 28, batch 1950, loss[loss=0.239, ctc_loss=0.1264, cr_loss=0.3774, attn_decoder_loss=0.2431, over 29428.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1243, cr_loss=0.3674, attn_decoder_loss=0.2454, over 5818419.89 frames. ], batch size: 78, lr: 3.96e-03, grad_scale: 8.0 2024-09-18 18:17:06,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=496500.0, ans=0.125 2024-09-18 18:17:34,140 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.46 vs. limit=15.0 2024-09-18 18:17:47,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=496580.0, ans=0.1 2024-09-18 18:17:48,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=496580.0, ans=0.125 2024-09-18 18:18:03,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=496620.0, ans=0.125 2024-09-18 18:18:06,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=496660.0, ans=0.0 2024-09-18 18:18:24,149 INFO [train.py:1198] (1/2) Epoch 28, batch 2000, loss[loss=0.2111, ctc_loss=0.09839, cr_loss=0.3154, attn_decoder_loss=0.2166, over 29319.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1248, cr_loss=0.3683, attn_decoder_loss=0.2458, over 5795025.13 frames. ], batch size: 67, lr: 3.96e-03, grad_scale: 16.0 2024-09-18 18:18:51,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=496740.0, ans=0.0 2024-09-18 18:18:55,933 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.651e+01 8.591e+01 9.006e+01 9.471e+01 1.475e+02, threshold=1.801e+02, percent-clipped=0.0 2024-09-18 18:18:57,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=496780.0, ans=0.125 2024-09-18 18:19:04,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=496780.0, ans=0.1 2024-09-18 18:19:08,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=496820.0, ans=0.125 2024-09-18 18:19:13,922 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.39 vs. 
limit=22.5 2024-09-18 18:19:14,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=496820.0, ans=0.0 2024-09-18 18:19:23,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=496860.0, ans=0.125 2024-09-18 18:19:35,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=496860.0, ans=0.1 2024-09-18 18:19:39,996 INFO [train.py:1198] (1/2) Epoch 28, batch 2050, loss[loss=0.2179, ctc_loss=0.1112, cr_loss=0.3489, attn_decoder_loss=0.222, over 29413.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1243, cr_loss=0.3671, attn_decoder_loss=0.2448, over 5787032.22 frames. ], batch size: 70, lr: 3.96e-03, grad_scale: 8.0 2024-09-18 18:19:40,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=496900.0, ans=0.1 2024-09-18 18:19:44,099 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.36 vs. limit=22.5 2024-09-18 18:19:55,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.97 vs. limit=15.0 2024-09-18 18:19:56,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=496940.0, ans=0.125 2024-09-18 18:20:01,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=496940.0, ans=0.0 2024-09-18 18:20:12,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=496980.0, ans=0.2 2024-09-18 18:20:36,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=497020.0, ans=0.125 2024-09-18 18:20:43,099 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.75 vs. limit=12.0 2024-09-18 18:20:44,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=497060.0, ans=0.0 2024-09-18 18:20:58,295 INFO [train.py:1198] (1/2) Epoch 28, batch 2100, loss[loss=0.2352, ctc_loss=0.1153, cr_loss=0.3585, attn_decoder_loss=0.2406, over 29777.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1235, cr_loss=0.366, attn_decoder_loss=0.2443, over 5799557.41 frames. ], batch size: 81, lr: 3.96e-03, grad_scale: 8.0 2024-09-18 18:21:20,348 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.55 vs. limit=15.0 2024-09-18 18:21:26,088 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=6.36 vs. 
limit=15.0 2024-09-18 18:21:29,766 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.088e+01 8.428e+01 8.818e+01 9.232e+01 1.075e+02, threshold=1.764e+02, percent-clipped=0.0 2024-09-18 18:21:39,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=497180.0, ans=0.0 2024-09-18 18:21:45,094 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 18:21:48,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=497220.0, ans=0.125 2024-09-18 18:21:58,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=497260.0, ans=0.2 2024-09-18 18:21:58,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=497260.0, ans=10.0 2024-09-18 18:22:01,907 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 18:22:13,507 INFO [train.py:1198] (1/2) Epoch 28, batch 2150, loss[loss=0.2338, ctc_loss=0.1287, cr_loss=0.3727, attn_decoder_loss=0.2372, over 29454.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1234, cr_loss=0.366, attn_decoder_loss=0.2438, over 5815110.88 frames. ], batch size: 78, lr: 3.96e-03, grad_scale: 8.0 2024-09-18 18:22:15,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=497300.0, ans=0.125 2024-09-18 18:22:23,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=497300.0, ans=0.125 2024-09-18 18:22:33,457 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.07 vs. limit=22.5 2024-09-18 18:22:55,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=497380.0, ans=0.0 2024-09-18 18:23:31,648 INFO [train.py:1198] (1/2) Epoch 28, batch 2200, loss[loss=0.2452, ctc_loss=0.1246, cr_loss=0.3676, attn_decoder_loss=0.2504, over 29630.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1239, cr_loss=0.3668, attn_decoder_loss=0.2441, over 5812322.45 frames. 
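For reference, the cr_loss term appearing in every loss header is the consistency-regularization part of CR-CTC: roughly, each utterance is encoded twice under different time masking, and the frame-level CTC posteriors of each copy are pulled toward the stop-gradient posteriors of the other. A hedged math sketch of that term, with illustrative notation rather than the recipe's exact code:

```latex
% Sketch of the consistency-regularization term behind cr_loss, following the
% CR-CTC recipe this branch trains; notation is illustrative.
% p^{(a)}_t, p^{(b)}_t : frame-level CTC posteriors of two augmented views,
% sg(.) : stop-gradient, T : number of frames.
\mathcal{L}_{cr} = \frac{1}{2T} \sum_{t=1}^{T}
  \Big[ \mathrm{KL}\big(\mathrm{sg}(p^{(a)}_t) \,\|\, p^{(b)}_t\big)
      + \mathrm{KL}\big(\mathrm{sg}(p^{(b)}_t) \,\|\, p^{(a)}_t\big) \Big]
```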
], batch size: 86, lr: 3.96e-03, grad_scale: 8.0 2024-09-18 18:23:39,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=497500.0, ans=0.125 2024-09-18 18:24:03,362 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.572e+01 8.974e+01 9.491e+01 1.804e+02, threshold=1.795e+02, percent-clipped=1.0 2024-09-18 18:24:03,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=497580.0, ans=0.125 2024-09-18 18:24:17,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=497620.0, ans=0.1 2024-09-18 18:24:28,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=497620.0, ans=0.0 2024-09-18 18:24:43,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=497660.0, ans=0.125 2024-09-18 18:24:47,970 INFO [train.py:1198] (1/2) Epoch 28, batch 2250, loss[loss=0.238, ctc_loss=0.119, cr_loss=0.3683, attn_decoder_loss=0.243, over 29689.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1238, cr_loss=0.3668, attn_decoder_loss=0.2441, over 5811907.94 frames. ], batch size: 82, lr: 3.96e-03, grad_scale: 8.0 2024-09-18 18:24:52,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=497700.0, ans=0.125 2024-09-18 18:24:53,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=497700.0, ans=0.0 2024-09-18 18:25:11,005 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.27 vs. limit=15.0 2024-09-18 18:25:26,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=497780.0, ans=0.025 2024-09-18 18:25:31,785 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.68 vs. limit=6.0 2024-09-18 18:25:52,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=497860.0, ans=0.125 2024-09-18 18:26:04,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=497900.0, ans=0.0 2024-09-18 18:26:04,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=497900.0, ans=0.125 2024-09-18 18:26:05,876 INFO [train.py:1198] (1/2) Epoch 28, batch 2300, loss[loss=0.2246, ctc_loss=0.1168, cr_loss=0.3539, attn_decoder_loss=0.2288, over 29367.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.123, cr_loss=0.3651, attn_decoder_loss=0.2431, over 5797830.95 frames. 
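The `scaling.py:1024` Whitening lines above (metric=... vs. limit=...) are diagnostics from modules that penalize activations whose channel covariance drifts too far from a multiple of the identity; the penalty only engages while the metric exceeds the limit. The metric below is an illustrative stand-in with the right qualitative behavior (>= 1, equal to 1 for perfectly white activations), not the exact formula from scaling.py:

```python
# Illustrative whiteness metric for the Whitening lines above. Assumption:
# metric >= 1 measures how far the channel covariance C is from a multiple
# of the identity, reaching 1.0 when activations are perfectly "white".
# The exact formula here is a stand-in, not copied from icefall's scaling.py.
import torch

def whiteness_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels) activations."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)  # eigenvalues of the channel covariance
    d = eigs.numel()
    # ratio of mean squared eigenvalue to squared mean eigenvalue
    return (d * (eigs ** 2).sum() / eigs.sum() ** 2).item()

x = torch.randn(20000, 64)   # near-white input
print(whiteness_metric(x))   # ~1.0, far below limits like 15.0 logged above
```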
], batch size: 71, lr: 3.96e-03, grad_scale: 8.0 2024-09-18 18:26:30,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=497940.0, ans=0.125 2024-09-18 18:26:30,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=497940.0, ans=0.125 2024-09-18 18:26:39,375 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.522e+01 8.383e+01 8.665e+01 9.441e+01 6.698e+02, threshold=1.733e+02, percent-clipped=3.0 2024-09-18 18:26:41,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=497980.0, ans=0.2 2024-09-18 18:26:47,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=497980.0, ans=0.2 2024-09-18 18:26:51,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=497980.0, ans=15.0 2024-09-18 18:26:59,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=498020.0, ans=0.09899494936611666 2024-09-18 18:27:23,866 INFO [train.py:1198] (1/2) Epoch 28, batch 2350, loss[loss=0.2505, ctc_loss=0.1286, cr_loss=0.3686, attn_decoder_loss=0.2558, over 29689.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1236, cr_loss=0.3663, attn_decoder_loss=0.2437, over 5803769.13 frames. ], batch size: 83, lr: 3.96e-03, grad_scale: 8.0 2024-09-18 18:27:37,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=498140.0, ans=10.0 2024-09-18 18:28:15,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=498220.0, ans=0.025 2024-09-18 18:28:17,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=498220.0, ans=0.125 2024-09-18 18:28:27,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=498260.0, ans=0.1 2024-09-18 18:28:39,737 INFO [train.py:1198] (1/2) Epoch 28, batch 2400, loss[loss=0.2398, ctc_loss=0.1326, cr_loss=0.4071, attn_decoder_loss=0.2426, over 29549.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1241, cr_loss=0.3673, attn_decoder_loss=0.2439, over 5808055.09 frames. ], batch size: 76, lr: 3.96e-03, grad_scale: 16.0 2024-09-18 18:28:46,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=498300.0, ans=0.0 2024-09-18 18:28:48,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=498300.0, ans=0.0 2024-09-18 18:28:51,542 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.56 vs. 
limit=15.0 2024-09-18 18:28:55,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=498340.0, ans=0.0 2024-09-18 18:29:08,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=498340.0, ans=0.125 2024-09-18 18:29:15,195 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.035e+01 8.714e+01 9.180e+01 9.673e+01 2.821e+02, threshold=1.836e+02, percent-clipped=1.0 2024-09-18 18:29:23,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=498380.0, ans=0.125 2024-09-18 18:29:40,969 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.96 vs. limit=15.0 2024-09-18 18:29:47,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=498460.0, ans=0.1 2024-09-18 18:29:57,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=498500.0, ans=0.0 2024-09-18 18:29:58,168 INFO [train.py:1198] (1/2) Epoch 28, batch 2450, loss[loss=0.2342, ctc_loss=0.1121, cr_loss=0.3482, attn_decoder_loss=0.24, over 29690.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1247, cr_loss=0.3684, attn_decoder_loss=0.245, over 5785716.44 frames. ], batch size: 82, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:30:10,811 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.41 vs. limit=10.0 2024-09-18 18:30:11,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=498540.0, ans=0.025 2024-09-18 18:30:36,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=498580.0, ans=0.0 2024-09-18 18:30:50,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=498620.0, ans=0.0 2024-09-18 18:31:09,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=498660.0, ans=0.05 2024-09-18 18:31:13,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=498660.0, ans=0.2 2024-09-18 18:31:13,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=498660.0, ans=0.125 2024-09-18 18:31:16,314 INFO [train.py:1198] (1/2) Epoch 28, batch 2500, loss[loss=0.2499, ctc_loss=0.1318, cr_loss=0.3893, attn_decoder_loss=0.2544, over 29648.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.125, cr_loss=0.3693, attn_decoder_loss=0.2454, over 5795987.86 frames. ], batch size: 86, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:31:22,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=498700.0, ans=0.0 2024-09-18 18:31:33,813 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.64 vs. 
limit=15.0 2024-09-18 18:31:40,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=498740.0, ans=0.125 2024-09-18 18:31:49,773 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.126e+01 8.525e+01 9.051e+01 9.521e+01 3.075e+02, threshold=1.810e+02, percent-clipped=1.0 2024-09-18 18:31:51,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=498780.0, ans=0.125 2024-09-18 18:31:57,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=498780.0, ans=0.125 2024-09-18 18:32:05,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=498820.0, ans=0.0 2024-09-18 18:32:05,882 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.37 vs. limit=15.0 2024-09-18 18:32:09,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=498820.0, ans=0.0 2024-09-18 18:32:11,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=498820.0, ans=0.09899494936611666 2024-09-18 18:32:27,341 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.90 vs. limit=10.0 2024-09-18 18:32:32,442 INFO [train.py:1198] (1/2) Epoch 28, batch 2550, loss[loss=0.2229, ctc_loss=0.1147, cr_loss=0.3584, attn_decoder_loss=0.227, over 29361.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1245, cr_loss=0.3686, attn_decoder_loss=0.2453, over 5799672.95 frames. ], batch size: 67, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:32:46,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=498900.0, ans=15.0 2024-09-18 18:32:48,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=498940.0, ans=0.1 2024-09-18 18:32:49,083 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.57 vs. limit=15.0 2024-09-18 18:32:54,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=498940.0, ans=0.1 2024-09-18 18:33:01,125 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.32 vs. limit=15.0 2024-09-18 18:33:06,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=498980.0, ans=0.125 2024-09-18 18:33:12,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=498980.0, ans=0.1 2024-09-18 18:33:33,181 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.35 vs. 
limit=15.0 2024-09-18 18:33:35,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=499060.0, ans=0.125 2024-09-18 18:33:40,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=499060.0, ans=0.125 2024-09-18 18:33:50,491 INFO [train.py:1198] (1/2) Epoch 28, batch 2600, loss[loss=0.2341, ctc_loss=0.117, cr_loss=0.3657, attn_decoder_loss=0.239, over 29460.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1248, cr_loss=0.3695, attn_decoder_loss=0.2456, over 5796640.35 frames. ], batch size: 78, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:34:05,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=499140.0, ans=0.025 2024-09-18 18:34:14,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=499140.0, ans=0.125 2024-09-18 18:34:25,551 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.747e+01 8.719e+01 9.111e+01 9.618e+01 2.208e+02, threshold=1.822e+02, percent-clipped=1.0 2024-09-18 18:35:01,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=499260.0, ans=12.0 2024-09-18 18:35:03,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=499260.0, ans=0.1 2024-09-18 18:35:07,774 INFO [train.py:1198] (1/2) Epoch 28, batch 2650, loss[loss=0.2497, ctc_loss=0.1302, cr_loss=0.3948, attn_decoder_loss=0.2542, over 29221.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1246, cr_loss=0.3691, attn_decoder_loss=0.2454, over 5802825.22 frames. ], batch size: 100, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:35:09,913 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.65 vs. limit=15.0 2024-09-18 18:35:18,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=499300.0, ans=0.1 2024-09-18 18:35:45,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=499380.0, ans=0.5 2024-09-18 18:35:57,278 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=5.20 vs. limit=5.0 2024-09-18 18:36:11,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=499460.0, ans=0.125 2024-09-18 18:36:20,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=499460.0, ans=0.95 2024-09-18 18:36:25,506 INFO [train.py:1198] (1/2) Epoch 28, batch 2700, loss[loss=0.2433, ctc_loss=0.1248, cr_loss=0.378, attn_decoder_loss=0.2481, over 29532.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1249, cr_loss=0.3688, attn_decoder_loss=0.2457, over 5797877.95 frames. ], batch size: 87, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:36:28,005 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.84 vs. 
limit=22.5 2024-09-18 18:36:33,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=499500.0, ans=0.5 2024-09-18 18:36:35,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=499500.0, ans=0.0 2024-09-18 18:36:58,799 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 8.414e+01 8.942e+01 9.601e+01 1.842e+02, threshold=1.788e+02, percent-clipped=1.0 2024-09-18 18:37:07,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=499580.0, ans=0.125 2024-09-18 18:37:08,738 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.72 vs. limit=15.0 2024-09-18 18:37:30,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=499660.0, ans=0.125 2024-09-18 18:37:34,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=499660.0, ans=12.0 2024-09-18 18:37:38,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=499660.0, ans=0.0 2024-09-18 18:37:41,523 INFO [train.py:1198] (1/2) Epoch 28, batch 2750, loss[loss=0.2243, ctc_loss=0.1129, cr_loss=0.3571, attn_decoder_loss=0.2288, over 29511.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1238, cr_loss=0.3665, attn_decoder_loss=0.2443, over 5795126.33 frames. ], batch size: 75, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:37:53,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=499700.0, ans=0.025 2024-09-18 18:37:56,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=499740.0, ans=0.125 2024-09-18 18:38:09,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=499740.0, ans=0.0 2024-09-18 18:38:21,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=499780.0, ans=0.125 2024-09-18 18:38:27,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=499820.0, ans=0.0 2024-09-18 18:38:27,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=499820.0, ans=0.1 2024-09-18 18:38:38,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=499820.0, ans=0.0 2024-09-18 18:38:55,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=499860.0, ans=0.2 2024-09-18 18:38:59,698 INFO [train.py:1198] (1/2) Epoch 28, batch 2800, loss[loss=0.2594, ctc_loss=0.1523, cr_loss=0.3722, attn_decoder_loss=0.263, over 20837.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1242, cr_loss=0.367, attn_decoder_loss=0.2446, over 5778017.09 frames. 
], batch size: 210, lr: 3.95e-03, grad_scale: 16.0 2024-09-18 18:39:13,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=499940.0, ans=0.1 2024-09-18 18:39:22,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=499940.0, ans=0.04949747468305833 2024-09-18 18:39:34,535 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.606e+01 8.662e+01 9.200e+01 9.823e+01 1.916e+02, threshold=1.840e+02, percent-clipped=1.0 2024-09-18 18:39:38,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=499980.0, ans=0.125 2024-09-18 18:39:47,630 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 18:39:56,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=500020.0, ans=0.0 2024-09-18 18:40:18,057 INFO [train.py:1198] (1/2) Epoch 28, batch 2850, loss[loss=0.2362, ctc_loss=0.1221, cr_loss=0.3577, attn_decoder_loss=0.2409, over 29509.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1247, cr_loss=0.3683, attn_decoder_loss=0.245, over 5762991.16 frames. ], batch size: 77, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:40:19,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=500100.0, ans=0.0 2024-09-18 18:40:19,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=500100.0, ans=0.0 2024-09-18 18:40:25,113 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.71 vs. limit=15.0 2024-09-18 18:40:34,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=500140.0, ans=0.0 2024-09-18 18:40:39,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=500140.0, ans=0.125 2024-09-18 18:40:41,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=500140.0, ans=0.0 2024-09-18 18:40:51,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=500180.0, ans=0.125 2024-09-18 18:40:57,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=500180.0, ans=0.1 2024-09-18 18:41:19,302 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 18:41:25,280 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 18:41:31,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=500260.0, ans=0.125 2024-09-18 18:41:34,004 INFO [train.py:1198] (1/2) Epoch 28, batch 2900, loss[loss=0.2276, ctc_loss=0.1064, cr_loss=0.3227, attn_decoder_loss=0.2338, over 29410.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.125, cr_loss=0.369, attn_decoder_loss=0.2458, over 5788043.91 frames. 
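The grad_scale value in the batch headers toggles between 8.0 and 16.0 (16.0 at batch 2800 above, back to 8.0 by batch 2850). That is the standard torch.cuda.amp.GradScaler pattern: the loss scale doubles after a run of overflow-free steps and halves when an inf/nan gradient is detected. A standard-usage sketch; the constructor arguments shown are PyTorch defaults, not values read from this run:

```python
# The grad_scale: 8.0 -> 16.0 -> 8.0 pattern in the batch headers is the usual
# torch.cuda.amp.GradScaler behaviour: the scale grows by growth_factor after
# growth_interval overflow-free steps and backs off when gradients overflow.
import torch

scaler = torch.cuda.amp.GradScaler(
    growth_factor=2.0, backoff_factor=0.5, growth_interval=2000  # library defaults
)

def train_step(model, batch, optimizer, criterion):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()
    scaler.step(optimizer)     # skips the optimizer step if gradients overflowed
    scaler.update()            # grows or backs off the scale
    return scaler.get_scale()  # the value that appears as grad_scale in the log
```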
], batch size: 79, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:41:50,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=500340.0, ans=0.0 2024-09-18 18:42:10,949 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 8.571e+01 8.982e+01 9.611e+01 1.691e+02, threshold=1.796e+02, percent-clipped=0.0 2024-09-18 18:42:34,674 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.46 vs. limit=22.5 2024-09-18 18:42:43,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=500460.0, ans=0.0 2024-09-18 18:42:51,877 INFO [train.py:1198] (1/2) Epoch 28, batch 2950, loss[loss=0.2283, ctc_loss=0.1149, cr_loss=0.3628, attn_decoder_loss=0.2329, over 29521.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1241, cr_loss=0.3669, attn_decoder_loss=0.2442, over 5783268.06 frames. ], batch size: 75, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:43:09,474 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.60 vs. limit=10.0 2024-09-18 18:43:18,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=500540.0, ans=0.025 2024-09-18 18:43:34,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=500580.0, ans=0.125 2024-09-18 18:43:39,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=500620.0, ans=0.07 2024-09-18 18:44:01,348 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.01 vs. limit=12.0 2024-09-18 18:44:02,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=500660.0, ans=0.125 2024-09-18 18:44:10,170 INFO [train.py:1198] (1/2) Epoch 28, batch 3000, loss[loss=0.2362, ctc_loss=0.1179, cr_loss=0.3651, attn_decoder_loss=0.2413, over 29766.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1245, cr_loss=0.3675, attn_decoder_loss=0.2445, over 5783513.05 frames. ], batch size: 81, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:44:10,171 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 18:44:28,709 INFO [train.py:1230] (1/2) Epoch 28, validation: loss=0.2115, ctc_loss=0.03821, cr_loss=5.852e-15, attn_decoder_loss=0.2307, over 944034.00 frames. 2024-09-18 18:44:28,709 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-18 18:44:32,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=500700.0, ans=15.0 2024-09-18 18:44:35,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=500700.0, ans=0.0 2024-09-18 18:44:55,455 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.47 vs. 
limit=15.0 2024-09-18 18:44:56,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=500740.0, ans=0.0 2024-09-18 18:45:03,652 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.528e+01 8.580e+01 9.034e+01 9.618e+01 2.130e+02, threshold=1.807e+02, percent-clipped=2.0 2024-09-18 18:45:03,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=500780.0, ans=0.2 2024-09-18 18:45:05,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=500780.0, ans=0.035 2024-09-18 18:45:05,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=500780.0, ans=0.125 2024-09-18 18:45:13,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=500820.0, ans=0.125 2024-09-18 18:45:37,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=500860.0, ans=0.125 2024-09-18 18:45:45,085 INFO [train.py:1198] (1/2) Epoch 28, batch 3050, loss[loss=0.2278, ctc_loss=0.12, cr_loss=0.3598, attn_decoder_loss=0.2318, over 29515.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1251, cr_loss=0.3686, attn_decoder_loss=0.2452, over 5776117.17 frames. ], batch size: 76, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:45:50,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=500900.0, ans=0.125 2024-09-18 18:46:01,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=500940.0, ans=0.1 2024-09-18 18:46:05,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=500940.0, ans=0.02 2024-09-18 18:46:22,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=500980.0, ans=0.0 2024-09-18 18:46:34,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=501020.0, ans=0.125 2024-09-18 18:46:41,252 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-09-18 18:46:48,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=501060.0, ans=0.125 2024-09-18 18:46:49,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=501060.0, ans=0.125 2024-09-18 18:46:52,976 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 18:47:02,937 INFO [train.py:1198] (1/2) Epoch 28, batch 3100, loss[loss=0.2504, ctc_loss=0.1322, cr_loss=0.3974, attn_decoder_loss=0.2547, over 29265.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1249, cr_loss=0.368, attn_decoder_loss=0.2449, over 5776181.88 frames. 
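The validation record above (loss=0.2115, ctc_loss=0.03821, cr_loss=5.852e-15, attn_decoder_loss=0.2307) satisfies the same weighted sum as the training totals, assuming the 0.1/0.9/0.02 weights sketched earlier. The cr_loss collapses to numerical zero because the consistency term compares two augmented views of each utterance, and with augmentation disabled at validation time the two views coincide. A quick check:

```python
# Validation record, same assumed weights as the training tot_loss sketch:
# 0.1 * 0.03821 + 0.9 * 0.2307 + 0.02 * 5.852e-15 = 0.2114 ~= 0.2115
print(0.1 * 0.03821 + 0.9 * 0.2307 + 0.02 * 5.852e-15)
```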
], batch size: 100, lr: 3.94e-03, grad_scale: 8.0 2024-09-18 18:47:03,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=501100.0, ans=0.2 2024-09-18 18:47:19,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=501140.0, ans=0.125 2024-09-18 18:47:25,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=501140.0, ans=0.0 2024-09-18 18:47:37,590 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 8.481e+01 8.983e+01 9.463e+01 1.324e+02, threshold=1.797e+02, percent-clipped=0.0 2024-09-18 18:48:16,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=501260.0, ans=0.0 2024-09-18 18:48:20,795 INFO [train.py:1198] (1/2) Epoch 28, batch 3150, loss[loss=0.2527, ctc_loss=0.1306, cr_loss=0.3752, attn_decoder_loss=0.2579, over 28812.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1248, cr_loss=0.3676, attn_decoder_loss=0.2451, over 5782972.90 frames. ], batch size: 104, lr: 3.94e-03, grad_scale: 8.0 2024-09-18 18:48:33,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=501300.0, ans=0.125 2024-09-18 18:48:42,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=501340.0, ans=0.0 2024-09-18 18:48:52,229 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.44 vs. limit=15.0 2024-09-18 18:49:15,664 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.87 vs. limit=15.0 2024-09-18 18:49:16,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=501420.0, ans=0.0 2024-09-18 18:49:20,526 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.63 vs. limit=22.5 2024-09-18 18:49:30,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=501460.0, ans=0.125 2024-09-18 18:49:36,029 INFO [train.py:1198] (1/2) Epoch 28, batch 3200, loss[loss=0.2332, ctc_loss=0.1152, cr_loss=0.358, attn_decoder_loss=0.2384, over 29405.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1239, cr_loss=0.3662, attn_decoder_loss=0.2443, over 5794577.09 frames. 
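One more recurring pattern worth decoding: the batch size in the loss headers swings widely (from the high 60s up to 210 in this section) while each batch covers a similar number of frames. That is characteristic of duration bucketing, where utterances of similar length are grouped and the batch size is simply however many cuts fit a fixed duration budget. A hypothetical sketch of the idea; the run itself uses lhotse's bucketing sampler, not this helper:

```python
# Why "batch size" varies from ~67 to 210 across the loss headers: batches
# are packed to a duration budget, so many short cuts or few long ones fit.
# Hypothetical illustration of duration bucketing (not lhotse's sampler).
def pack_by_duration(durations: list[float], max_duration: float) -> list[list[int]]:
    """Group utterance indices (sorted by length) into budget-limited batches."""
    order = sorted(range(len(durations)), key=lambda i: durations[i])
    batches, current, total = [], [], 0.0
    for i in order:
        if current and total + durations[i] > max_duration:
            batches.append(current)
            current, total = [], 0.0
        current.append(i)
        total += durations[i]
    if current:
        batches.append(current)
    return batches

# Short utterances yield large batches; long ones yield small batches.
print([len(b) for b in pack_by_duration([3.0] * 100 + [20.0] * 10, 60.0)])
```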
], batch size: 79, lr: 3.94e-03, grad_scale: 16.0 2024-09-18 18:49:40,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=501500.0, ans=0.125 2024-09-18 18:49:43,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=501500.0, ans=0.2 2024-09-18 18:49:49,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=501500.0, ans=0.125 2024-09-18 18:50:03,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=501540.0, ans=0.125 2024-09-18 18:50:10,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_abs, batch_count=501580.0, ans=0.5 2024-09-18 18:50:13,237 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.202e+01 8.510e+01 8.995e+01 9.300e+01 1.777e+02, threshold=1.799e+02, percent-clipped=0.0 2024-09-18 18:50:15,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=501580.0, ans=0.2 2024-09-18 18:50:21,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=501580.0, ans=0.1 2024-09-18 18:50:29,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=501620.0, ans=0.125 2024-09-18 18:50:46,646 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.60 vs. limit=5.0 2024-09-18 18:50:54,450 INFO [train.py:1198] (1/2) Epoch 28, batch 3250, loss[loss=0.2468, ctc_loss=0.1303, cr_loss=0.3751, attn_decoder_loss=0.2514, over 29696.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1241, cr_loss=0.3667, attn_decoder_loss=0.2447, over 5799617.10 frames. ], batch size: 84, lr: 3.94e-03, grad_scale: 16.0 2024-09-18 18:50:59,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=501700.0, ans=0.125 2024-09-18 18:51:05,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=501700.0, ans=0.025 2024-09-18 18:51:19,150 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.63 vs. 
limit=15.0 2024-09-18 18:51:33,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=501780.0, ans=0.125 2024-09-18 18:51:36,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=501780.0, ans=10.0 2024-09-18 18:51:38,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=501820.0, ans=0.0 2024-09-18 18:51:50,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=501820.0, ans=0.125 2024-09-18 18:51:52,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=501820.0, ans=0.125 2024-09-18 18:52:02,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=501860.0, ans=0.035 2024-09-18 18:52:08,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=501860.0, ans=0.0 2024-09-18 18:52:10,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=501900.0, ans=0.125 2024-09-18 18:52:10,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=501900.0, ans=0.04949747468305833 2024-09-18 18:52:11,769 INFO [train.py:1198] (1/2) Epoch 28, batch 3300, loss[loss=0.25, ctc_loss=0.1257, cr_loss=0.3617, attn_decoder_loss=0.2558, over 28134.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1234, cr_loss=0.3648, attn_decoder_loss=0.2438, over 5795912.93 frames. ], batch size: 111, lr: 3.94e-03, grad_scale: 8.0 2024-09-18 18:52:19,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=501900.0, ans=0.125 2024-09-18 18:52:48,162 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.472e+01 9.021e+01 9.788e+01 2.409e+02, threshold=1.804e+02, percent-clipped=2.0 2024-09-18 18:53:21,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=502060.0, ans=0.05 2024-09-18 18:53:27,376 INFO [train.py:1198] (1/2) Epoch 28, batch 3350, loss[loss=0.2515, ctc_loss=0.1286, cr_loss=0.3686, attn_decoder_loss=0.257, over 28867.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.124, cr_loss=0.3658, attn_decoder_loss=0.2446, over 5772854.75 frames. 
], batch size: 104, lr: 3.94e-03, grad_scale: 8.0 2024-09-18 18:53:30,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=502100.0, ans=0.125 2024-09-18 18:53:30,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=502100.0, ans=10.0 2024-09-18 18:53:35,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=502100.0, ans=0.125 2024-09-18 18:53:55,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=502140.0, ans=0.0 2024-09-18 18:54:03,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=502180.0, ans=0.0 2024-09-18 18:54:13,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=502220.0, ans=0.125 2024-09-18 18:54:35,747 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.20 vs. limit=15.0 2024-09-18 18:54:38,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=502260.0, ans=0.125 2024-09-18 18:54:45,497 INFO [train.py:1198] (1/2) Epoch 28, batch 3400, loss[loss=0.2174, ctc_loss=0.1074, cr_loss=0.3431, attn_decoder_loss=0.2219, over 29320.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1244, cr_loss=0.3664, attn_decoder_loss=0.2446, over 5766049.52 frames. ], batch size: 67, lr: 3.94e-03, grad_scale: 8.0 2024-09-18 18:54:48,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=502300.0, ans=0.0 2024-09-18 18:55:11,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=502340.0, ans=0.0 2024-09-18 18:55:18,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=502380.0, ans=0.2 2024-09-18 18:55:21,607 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.459e+01 8.977e+01 9.782e+01 2.197e+02, threshold=1.795e+02, percent-clipped=1.0 2024-09-18 18:55:56,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=502460.0, ans=0.0 2024-09-18 18:56:03,376 INFO [train.py:1198] (1/2) Epoch 28, batch 3450, loss[loss=0.2444, ctc_loss=0.1236, cr_loss=0.3519, attn_decoder_loss=0.25, over 28351.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1244, cr_loss=0.3664, attn_decoder_loss=0.2448, over 5773875.15 frames. ], batch size: 111, lr: 3.94e-03, grad_scale: 8.0 2024-09-18 18:56:11,946 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.49 vs. limit=15.0 2024-09-18 18:56:21,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=502540.0, ans=0.05 2024-09-18 18:56:25,138 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. 
limit=15.0 2024-09-18 18:56:30,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=502540.0, ans=0.125 2024-09-18 18:56:51,740 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 18:57:17,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=502700.0, ans=0.0 2024-09-18 18:57:18,867 INFO [train.py:1198] (1/2) Epoch 28, batch 3500, loss[loss=0.2195, ctc_loss=0.1116, cr_loss=0.3483, attn_decoder_loss=0.2238, over 29308.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1242, cr_loss=0.3664, attn_decoder_loss=0.2443, over 5774866.36 frames. ], batch size: 71, lr: 3.94e-03, grad_scale: 8.0 2024-09-18 18:57:19,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=502700.0, ans=0.2 2024-09-18 18:57:20,007 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.21 vs. limit=15.0 2024-09-18 18:57:23,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=502700.0, ans=0.125 2024-09-18 18:57:51,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=502780.0, ans=0.1 2024-09-18 18:57:57,152 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.602e+01 9.014e+01 9.488e+01 1.440e+02, threshold=1.803e+02, percent-clipped=0.0 2024-09-18 18:57:58,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=502780.0, ans=0.1 2024-09-18 18:58:01,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=502780.0, ans=0.0 2024-09-18 18:58:24,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=502860.0, ans=0.025 2024-09-18 18:58:27,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=502860.0, ans=0.125 2024-09-18 18:58:35,925 INFO [train.py:1198] (1/2) Epoch 28, batch 3550, loss[loss=0.2545, ctc_loss=0.1263, cr_loss=0.3725, attn_decoder_loss=0.2605, over 29705.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1238, cr_loss=0.3656, attn_decoder_loss=0.2442, over 5780797.96 frames. 
], batch size: 89, lr: 3.94e-03, grad_scale: 8.0 2024-09-18 18:58:37,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=502900.0, ans=0.1 2024-09-18 18:58:43,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=502900.0, ans=0.125 2024-09-18 18:58:53,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=502940.0, ans=0.0 2024-09-18 18:59:13,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=502980.0, ans=0.125 2024-09-18 18:59:14,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=502980.0, ans=0.125 2024-09-18 18:59:37,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=503060.0, ans=0.125 2024-09-18 18:59:50,237 INFO [train.py:1198] (1/2) Epoch 28, batch 3600, loss[loss=0.2289, ctc_loss=0.1235, cr_loss=0.3634, attn_decoder_loss=0.2326, over 29496.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.124, cr_loss=0.3664, attn_decoder_loss=0.2445, over 5790468.06 frames. ], batch size: 77, lr: 3.94e-03, grad_scale: 16.0 2024-09-18 18:59:50,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=503100.0, ans=0.1 2024-09-18 18:59:53,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=503100.0, ans=0.0 2024-09-18 19:00:07,389 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.22 vs. limit=15.0 2024-09-18 19:00:20,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=503180.0, ans=0.125 2024-09-18 19:00:24,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=503180.0, ans=0.2 2024-09-18 19:00:26,000 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.400e+01 8.358e+01 8.868e+01 9.352e+01 4.010e+02, threshold=1.774e+02, percent-clipped=1.0 2024-09-18 19:00:35,589 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.37 vs. limit=15.0 2024-09-18 19:00:38,228 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.18 vs. limit=22.5 2024-09-18 19:00:39,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=503220.0, ans=6.0 2024-09-18 19:00:51,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=503260.0, ans=0.125 2024-09-18 19:01:05,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=503300.0, ans=0.0 2024-09-18 19:01:07,134 INFO [train.py:1198] (1/2) Epoch 28, batch 3650, loss[loss=0.2584, ctc_loss=0.1446, cr_loss=0.4188, attn_decoder_loss=0.2618, over 29531.00 frames. 
], tot_loss[loss=0.2389, ctc_loss=0.1232, cr_loss=0.3646, attn_decoder_loss=0.2436, over 5793261.80 frames. ], batch size: 90, lr: 3.94e-03, grad_scale: 16.0 2024-09-18 19:01:25,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=503340.0, ans=0.125 2024-09-18 19:01:37,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=503380.0, ans=0.1 2024-09-18 19:01:43,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=503380.0, ans=0.125 2024-09-18 19:01:44,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=503380.0, ans=0.125 2024-09-18 19:01:49,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=503380.0, ans=0.2 2024-09-18 19:01:50,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=503420.0, ans=0.125 2024-09-18 19:01:53,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=503420.0, ans=0.125 2024-09-18 19:02:01,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=503420.0, ans=0.1 2024-09-18 19:02:04,659 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.78 vs. limit=22.5 2024-09-18 19:02:13,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=503460.0, ans=0.025 2024-09-18 19:02:14,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=503460.0, ans=0.125 2024-09-18 19:02:15,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=503460.0, ans=0.2 2024-09-18 19:02:21,715 INFO [train.py:1198] (1/2) Epoch 28, batch 3700, loss[loss=0.2541, ctc_loss=0.1333, cr_loss=0.3971, attn_decoder_loss=0.2587, over 29706.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1234, cr_loss=0.3656, attn_decoder_loss=0.2438, over 5804097.45 frames. 
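The recurring optim.py WARNING lines report the recent distribution of gradient norms as five values (minimum, 25th, 50th, and 75th percentile, maximum) together with the clipping threshold and the share of recently clipped batches. In each warning above, the threshold equals Clipping_scale times the logged median, e.g. 2.0 * 8.868e+01 ≈ 1.774e+02. A generic sketch of those statistics (an illustration of the logged quantities, not icefall's ScaledAdam optimizer):

```python
# Hedged sketch: compute the quartile summary and clipping threshold that
# the optim.py warnings report. The threshold tracks clipping_scale times
# the median of recent gradient norms; "percent-clipped" is the share of
# batches whose norm exceeded it. Generic illustration only.
import torch

def clip_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]                     # 2.0 * median
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
    return q, threshold, percent_clipped

norms = 90.0 + 10.0 * torch.randn(200)                    # synthetic grad norms
quartiles, threshold, pct = clip_stats(norms)
print(quartiles.tolist(), threshold.item(), pct.item())
```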
], batch size: 84, lr: 3.93e-03, grad_scale: 16.0 2024-09-18 19:02:53,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=503580.0, ans=0.0 2024-09-18 19:02:57,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=503580.0, ans=0.125 2024-09-18 19:02:58,673 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 8.604e+01 9.187e+01 9.989e+01 2.860e+02, threshold=1.837e+02, percent-clipped=1.0 2024-09-18 19:03:00,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=503580.0, ans=0.1 2024-09-18 19:03:06,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=503620.0, ans=0.0 2024-09-18 19:03:14,656 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=22.5 2024-09-18 19:03:22,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=503660.0, ans=0.125 2024-09-18 19:03:27,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=503660.0, ans=0.125 2024-09-18 19:03:35,360 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=15.0 2024-09-18 19:03:36,039 INFO [train.py:1198] (1/2) Epoch 28, batch 3750, loss[loss=0.2134, ctc_loss=0.1057, cr_loss=0.3278, attn_decoder_loss=0.218, over 29314.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1235, cr_loss=0.3664, attn_decoder_loss=0.2437, over 5807841.41 frames. ], batch size: 67, lr: 3.93e-03, grad_scale: 8.0 2024-09-18 19:03:45,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=503700.0, ans=0.125 2024-09-18 19:04:52,049 INFO [train.py:1198] (1/2) Epoch 28, batch 3800, loss[loss=0.2479, ctc_loss=0.1271, cr_loss=0.3849, attn_decoder_loss=0.2527, over 29633.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.123, cr_loss=0.3647, attn_decoder_loss=0.2431, over 5799284.49 frames. ], batch size: 86, lr: 3.93e-03, grad_scale: 8.0 2024-09-18 19:04:53,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=503900.0, ans=0.125 2024-09-18 19:05:14,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=503940.0, ans=0.0 2024-09-18 19:05:28,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=503980.0, ans=0.125 2024-09-18 19:05:29,966 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.413e+01 8.933e+01 9.626e+01 3.409e+02, threshold=1.787e+02, percent-clipped=1.0 2024-09-18 19:05:30,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=503980.0, ans=0.125 2024-09-18 19:05:39,726 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.37 vs. 
limit=22.5 2024-09-18 19:05:39,786 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.88 vs. limit=22.5 2024-09-18 19:05:45,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=504020.0, ans=0.1 2024-09-18 19:05:45,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.86 vs. limit=12.0 2024-09-18 19:05:51,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.21 vs. limit=10.0 2024-09-18 19:06:07,076 INFO [train.py:1198] (1/2) Epoch 28, batch 3850, loss[loss=0.2586, ctc_loss=0.1447, cr_loss=0.4031, attn_decoder_loss=0.2623, over 29223.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.123, cr_loss=0.3645, attn_decoder_loss=0.2433, over 5813708.20 frames. ], batch size: 100, lr: 3.93e-03, grad_scale: 8.0 2024-09-18 19:06:07,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=504100.0, ans=0.125 2024-09-18 19:06:27,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=504140.0, ans=0.0 2024-09-18 19:06:31,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=504140.0, ans=0.2 2024-09-18 19:07:09,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=504260.0, ans=0.0 2024-09-18 19:07:15,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=504260.0, ans=0.1 2024-09-18 19:07:23,008 INFO [train.py:1198] (1/2) Epoch 28, batch 3900, loss[loss=0.2464, ctc_loss=0.1307, cr_loss=0.369, attn_decoder_loss=0.2511, over 29637.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1232, cr_loss=0.3651, attn_decoder_loss=0.2438, over 5817987.83 frames. ], batch size: 86, lr: 3.93e-03, grad_scale: 8.0 2024-09-18 19:07:58,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=504380.0, ans=0.0 2024-09-18 19:07:59,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=504380.0, ans=0.125 2024-09-18 19:08:00,190 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.164e+01 8.446e+01 8.921e+01 9.410e+01 1.233e+02, threshold=1.784e+02, percent-clipped=0.0 2024-09-18 19:08:20,195 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=8.81 vs. limit=15.0 2024-09-18 19:08:32,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=504460.0, ans=0.125 2024-09-18 19:08:36,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=504500.0, ans=0.125 2024-09-18 19:08:37,160 INFO [train.py:1198] (1/2) Epoch 28, batch 3950, loss[loss=0.2624, ctc_loss=0.1376, cr_loss=0.3905, attn_decoder_loss=0.2676, over 29438.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1226, cr_loss=0.3645, attn_decoder_loss=0.2435, over 5837027.72 frames. 
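The `ScheduledFloat` entries that dominate this log each pair a parameter name with the current `batch_count` and the value (`ans`) in effect at that step: quantities such as skip rates, balancer probabilities, and whitening limits are annealed over training rather than held fixed. A minimal sketch of a batch-count-indexed schedule, assuming piecewise-linear interpolation between (batch_count, value) knots (an illustration, not icefall's ScheduledFloat class):

```python
# Hedged sketch: a scheduled scalar looked up by batch_count and linearly
# interpolated between knots. The knot values below are invented for
# illustration; the log only shows the value ("ans") in effect.
class PiecewiseLinearSchedule:
    def __init__(self, *knots):
        self.knots = sorted(knots)          # (batch_count, value) pairs

    def __call__(self, batch_count: float) -> float:
        ks = self.knots
        if batch_count <= ks[0][0]:
            return ks[0][1]
        if batch_count >= ks[-1][0]:
            return ks[-1][1]
        for (x0, y0), (x1, y1) in zip(ks, ks[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

skip_rate = PiecewiseLinearSchedule((0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate(2000.0))    # 0.125, halfway between the first two knots
print(skip_rate(505000.0))  # 0.0: by this point most schedules have flattened
```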
], batch size: 97, lr: 3.93e-03, grad_scale: 8.0 2024-09-18 19:08:43,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=504500.0, ans=0.025 2024-09-18 19:09:11,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=504580.0, ans=0.125 2024-09-18 19:09:52,201 INFO [train.py:1198] (1/2) Epoch 28, batch 4000, loss[loss=0.2223, ctc_loss=0.1123, cr_loss=0.3413, attn_decoder_loss=0.2269, over 29510.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1227, cr_loss=0.3641, attn_decoder_loss=0.2435, over 5812478.16 frames. ], batch size: 74, lr: 3.93e-03, grad_scale: 16.0 2024-09-18 19:10:09,117 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.38 vs. limit=10.0 2024-09-18 19:10:23,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=504780.0, ans=0.0 2024-09-18 19:10:24,957 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:10:29,483 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.096e+01 8.633e+01 9.036e+01 9.608e+01 3.784e+02, threshold=1.807e+02, percent-clipped=1.0 2024-09-18 19:10:29,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=504780.0, ans=0.125 2024-09-18 19:10:38,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=504820.0, ans=0.125 2024-09-18 19:11:08,050 INFO [train.py:1198] (1/2) Epoch 28, batch 4050, loss[loss=0.2563, ctc_loss=0.1441, cr_loss=0.3738, attn_decoder_loss=0.2605, over 20036.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1224, cr_loss=0.363, attn_decoder_loss=0.2432, over 5796491.62 frames. ], batch size: 209, lr: 3.93e-03, grad_scale: 16.0 2024-09-18 19:11:08,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=504900.0, ans=0.035 2024-09-18 19:11:10,225 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.44 vs. limit=12.0 2024-09-18 19:11:14,944 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.08 vs. limit=10.0 2024-09-18 19:11:23,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=504940.0, ans=0.95 2024-09-18 19:11:46,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=504980.0, ans=0.2 2024-09-18 19:11:51,525 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.96 vs. 
limit=22.5 2024-09-18 19:11:56,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=505020.0, ans=0.0 2024-09-18 19:12:02,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=505020.0, ans=0.125 2024-09-18 19:12:19,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=505060.0, ans=0.5 2024-09-18 19:12:21,989 INFO [train.py:1198] (1/2) Epoch 28, batch 4100, loss[loss=0.2519, ctc_loss=0.1323, cr_loss=0.3864, attn_decoder_loss=0.2566, over 29504.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1229, cr_loss=0.3638, attn_decoder_loss=0.2434, over 5790646.35 frames. ], batch size: 90, lr: 3.93e-03, grad_scale: 8.0 2024-09-18 19:12:41,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=505140.0, ans=0.2 2024-09-18 19:12:42,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=505140.0, ans=0.07 2024-09-18 19:12:50,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=505180.0, ans=22.5 2024-09-18 19:13:00,182 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.509e+01 9.171e+01 9.842e+01 2.303e+02, threshold=1.834e+02, percent-clipped=2.0 2024-09-18 19:13:24,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=505260.0, ans=0.125 2024-09-18 19:13:34,720 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:13:35,877 INFO [train.py:1198] (1/2) Epoch 28, batch 4150, loss[loss=0.2385, ctc_loss=0.1383, cr_loss=0.3911, attn_decoder_loss=0.241, over 29502.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.123, cr_loss=0.3638, attn_decoder_loss=0.2431, over 5796199.78 frames. ], batch size: 77, lr: 3.93e-03, grad_scale: 8.0 2024-09-18 19:13:46,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=505300.0, ans=0.1 2024-09-18 19:14:04,376 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.02 vs. limit=15.0 2024-09-18 19:14:12,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=505380.0, ans=0.0 2024-09-18 19:14:17,556 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.95 vs. limit=22.5 2024-09-18 19:14:26,374 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.78 vs. 
limit=15.0 2024-09-18 19:14:27,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=505420.0, ans=0.125 2024-09-18 19:14:33,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=505420.0, ans=0.0 2024-09-18 19:14:34,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=505460.0, ans=0.2 2024-09-18 19:14:39,768 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.14 vs. limit=15.0 2024-09-18 19:14:47,005 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.29 vs. limit=22.5 2024-09-18 19:14:49,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=505500.0, ans=0.125 2024-09-18 19:14:50,776 INFO [train.py:1198] (1/2) Epoch 28, batch 4200, loss[loss=0.2422, ctc_loss=0.1224, cr_loss=0.3449, attn_decoder_loss=0.2479, over 29541.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1232, cr_loss=0.3644, attn_decoder_loss=0.2437, over 5798770.80 frames. ], batch size: 90, lr: 3.93e-03, grad_scale: 8.0 2024-09-18 19:14:55,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=505500.0, ans=0.125 2024-09-18 19:15:30,608 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.561e+01 9.045e+01 9.717e+01 1.244e+02, threshold=1.809e+02, percent-clipped=0.0 2024-09-18 19:15:32,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=505580.0, ans=0.125 2024-09-18 19:15:33,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=505580.0, ans=0.0 2024-09-18 19:15:36,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=505620.0, ans=0.1 2024-09-18 19:15:47,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=505620.0, ans=0.1 2024-09-18 19:15:51,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=505660.0, ans=0.125 2024-09-18 19:15:55,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=505660.0, ans=0.04949747468305833 2024-09-18 19:16:06,008 INFO [train.py:1198] (1/2) Epoch 28, batch 4250, loss[loss=0.2155, ctc_loss=0.1003, cr_loss=0.3096, attn_decoder_loss=0.2214, over 29523.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1231, cr_loss=0.3642, attn_decoder_loss=0.2438, over 5805381.32 frames. 
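The `Whitening` entries (e.g. `metric=11.78 vs. limit=15.0` just above) compare a per-module whiteness statistic against a scheduled limit: the statistic is near 1.0 when the activations' covariance is close to a multiple of the identity and grows as the eigenvalue spectrum becomes less uniform, with a penalty applied only when the metric exceeds the limit. A hedged sketch of one such statistic (the exact formula in scaling.py may differ):

```python
# Hedged sketch: a whiteness metric on a layer's activations. It equals
# 1.0 for an isotropic covariance and grows with anisotropy, mirroring
# the "metric=... vs. limit=..." comparisons in the log. The formula is
# an illustrative choice, not necessarily scaling.py's exact one.
import torch

def whiteness_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)

x = torch.randn(2000, 512)
print(whiteness_metric(x))                                  # close to 1.0
print(whiteness_metric(x * torch.linspace(0.1, 3.0, 512)))  # noticeably larger
```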
], batch size: 74, lr: 3.93e-03, grad_scale: 8.0 2024-09-18 19:16:12,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=505700.0, ans=0.125 2024-09-18 19:16:15,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=505700.0, ans=0.125 2024-09-18 19:17:03,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=505860.0, ans=0.125 2024-09-18 19:17:03,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=505860.0, ans=0.0 2024-09-18 19:17:19,606 INFO [train.py:1198] (1/2) Epoch 28, batch 4300, loss[loss=0.2406, ctc_loss=0.1207, cr_loss=0.347, attn_decoder_loss=0.2463, over 29556.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1231, cr_loss=0.3642, attn_decoder_loss=0.244, over 5794703.08 frames. ], batch size: 87, lr: 3.93e-03, grad_scale: 8.0 2024-09-18 19:17:44,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=505940.0, ans=0.125 2024-09-18 19:17:55,472 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.13 vs. limit=15.0 2024-09-18 19:17:57,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=505980.0, ans=0.2 2024-09-18 19:17:58,958 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 8.600e+01 9.054e+01 9.453e+01 1.609e+02, threshold=1.811e+02, percent-clipped=0.0 2024-09-18 19:18:08,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=506020.0, ans=0.0 2024-09-18 19:18:22,380 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:18:29,828 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2024-09-18 19:18:35,187 INFO [train.py:1198] (1/2) Epoch 28, batch 4350, loss[loss=0.2556, ctc_loss=0.133, cr_loss=0.3794, attn_decoder_loss=0.2607, over 29492.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1262, cr_loss=0.3706, attn_decoder_loss=0.2475, over 5797219.96 frames. ], batch size: 97, lr: 3.92e-03, grad_scale: 8.0 2024-09-18 19:18:41,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=506100.0, ans=0.1 2024-09-18 19:18:59,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=506140.0, ans=0.125 2024-09-18 19:19:07,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=506180.0, ans=0.1 2024-09-18 19:19:18,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=506220.0, ans=0.125 2024-09-18 19:19:18,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.72 vs. 
limit=15.0 2024-09-18 19:19:21,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=506220.0, ans=0.1 2024-09-18 19:19:25,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=506220.0, ans=0.0 2024-09-18 19:19:27,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=506220.0, ans=0.1 2024-09-18 19:19:38,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=506260.0, ans=0.125 2024-09-18 19:19:40,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=506260.0, ans=0.125 2024-09-18 19:19:48,761 INFO [train.py:1198] (1/2) Epoch 28, batch 4400, loss[loss=0.2571, ctc_loss=0.1418, cr_loss=0.4118, attn_decoder_loss=0.2607, over 27428.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1276, cr_loss=0.373, attn_decoder_loss=0.2498, over 5768052.90 frames. ], batch size: 124, lr: 3.92e-03, grad_scale: 16.0 2024-09-18 19:20:20,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=506380.0, ans=0.125 2024-09-18 19:20:28,788 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.311e+01 8.874e+01 9.241e+01 9.772e+01 1.532e+02, threshold=1.848e+02, percent-clipped=0.0 2024-09-18 19:20:50,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=506460.0, ans=0.125 2024-09-18 19:21:03,588 INFO [train.py:1198] (1/2) Epoch 28, batch 4450, loss[loss=0.2583, ctc_loss=0.1515, cr_loss=0.3881, attn_decoder_loss=0.2616, over 19700.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1316, cr_loss=0.3783, attn_decoder_loss=0.2522, over 5577560.91 frames. ], batch size: 209, lr: 3.92e-03, grad_scale: 8.0 2024-09-18 19:21:25,120 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:21:45,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=506580.0, ans=0.1 2024-09-18 19:21:58,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=506620.0, ans=0.2 2024-09-18 19:22:18,777 INFO [train.py:1198] (1/2) Epoch 28, batch 4500, loss[loss=0.2595, ctc_loss=0.1529, cr_loss=0.3724, attn_decoder_loss=0.2631, over 20547.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.1357, cr_loss=0.3813, attn_decoder_loss=0.2544, over 5238401.56 frames. ], batch size: 210, lr: 3.92e-03, grad_scale: 8.0 2024-09-18 19:22:19,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=506700.0, ans=0.125 2024-09-18 19:22:23,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=506700.0, ans=0.0 2024-09-18 19:22:27,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=506700.0, ans=0.0 2024-09-18 19:22:37,286 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.95 vs. 
limit=22.5 2024-09-18 19:22:38,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=506740.0, ans=0.0 2024-09-18 19:23:47,632 INFO [train.py:1198] (1/2) Epoch 29, batch 0, loss[loss=0.2216, ctc_loss=0.1073, cr_loss=0.3398, attn_decoder_loss=0.2268, over 29599.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1073, cr_loss=0.3398, attn_decoder_loss=0.2268, over 29599.00 frames. ], batch size: 73, lr: 3.85e-03, grad_scale: 16.0 2024-09-18 19:23:47,633 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 19:23:52,463 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.6832, 4.5922, 4.4039, 4.1298], device='cuda:1') 2024-09-18 19:24:06,127 INFO [train.py:1230] (1/2) Epoch 29, validation: loss=0.2126, ctc_loss=0.03746, cr_loss=5.58e-15, attn_decoder_loss=0.2321, over 944034.00 frames. 2024-09-18 19:24:06,128 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-18 19:24:09,037 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.797e+01 1.050e+02 1.169e+02 1.299e+02 2.763e+02, threshold=2.337e+02, percent-clipped=3.0 2024-09-18 19:24:15,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=506800.0, ans=0.125 2024-09-18 19:24:54,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=506920.0, ans=0.5 2024-09-18 19:25:03,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=506920.0, ans=0.1 2024-09-18 19:25:08,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=506960.0, ans=0.0 2024-09-18 19:25:12,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=506960.0, ans=0.1 2024-09-18 19:25:16,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=506960.0, ans=0.125 2024-09-18 19:25:21,689 INFO [train.py:1198] (1/2) Epoch 29, batch 50, loss[loss=0.2115, ctc_loss=0.1051, cr_loss=0.3289, attn_decoder_loss=0.216, over 29455.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1272, cr_loss=0.3745, attn_decoder_loss=0.2463, over 1267295.76 frames. ], batch size: 70, lr: 3.85e-03, grad_scale: 8.0 2024-09-18 19:25:28,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=507000.0, ans=0.09899494936611666 2024-09-18 19:25:34,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=507000.0, ans=0.1 2024-09-18 19:26:02,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=507080.0, ans=0.125 2024-09-18 19:26:29,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=507160.0, ans=0.125 2024-09-18 19:26:41,678 INFO [train.py:1198] (1/2) Epoch 29, batch 100, loss[loss=0.2417, ctc_loss=0.1347, cr_loss=0.4107, attn_decoder_loss=0.2445, over 29540.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1275, cr_loss=0.3741, attn_decoder_loss=0.2478, over 2252168.36 frames. 
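Two things are visible at the epoch boundary above. First, the validation pass at the start of epoch 29 logs cr_loss=5.58e-15, an essentially zero consistency-regularization term, as expected if the paired augmented views that drive that loss are produced only in training. Second, the `grad_scale` field is the AMP loss scale for this fp16 run; it is raised after long runs of overflow-free steps and halved when gradients overflow, which is why it oscillates between 8.0 and 16.0 throughout the log. A generic sketch of that loss-scaling loop (a toy model, not the training script):

```python
# Hedged sketch of dynamic loss scaling with torch.cuda.amp, matching the
# behaviour implied by the logged grad_scale values. Model, data, and
# hyperparameters here are placeholders.
import torch

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=3.9e-3)
scaler = torch.cuda.amp.GradScaler(init_scale=8.0)

for step in range(100):
    x = torch.randn(16, 80, device="cuda")
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(x).square().mean()
    optimizer.zero_grad()
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped internally if the grads overflowed
    scaler.update()          # raises or halves the scale accordingly
    # scaler.get_scale() corresponds to "grad_scale" in the log
```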
], batch size: 76, lr: 3.85e-03, grad_scale: 8.0 2024-09-18 19:26:46,192 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.422e+01 8.735e+01 9.318e+01 1.000e+02 1.586e+02, threshold=1.864e+02, percent-clipped=0.0 2024-09-18 19:26:46,555 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:27:06,826 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.20 vs. limit=15.0 2024-09-18 19:27:12,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=507280.0, ans=0.125 2024-09-18 19:27:46,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=507360.0, ans=0.125 2024-09-18 19:27:56,473 INFO [train.py:1198] (1/2) Epoch 29, batch 150, loss[loss=0.211, ctc_loss=0.1042, cr_loss=0.3284, attn_decoder_loss=0.2156, over 29424.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.125, cr_loss=0.37, attn_decoder_loss=0.2455, over 3045972.17 frames. ], batch size: 70, lr: 3.85e-03, grad_scale: 8.0 2024-09-18 19:27:56,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=507400.0, ans=0.1 2024-09-18 19:27:59,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=507400.0, ans=0.125 2024-09-18 19:28:25,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=507480.0, ans=0.2 2024-09-18 19:28:25,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=507480.0, ans=10.0 2024-09-18 19:28:28,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=507480.0, ans=0.125 2024-09-18 19:28:32,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=507480.0, ans=0.2 2024-09-18 19:28:44,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=507520.0, ans=0.0 2024-09-18 19:28:59,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=507560.0, ans=0.1 2024-09-18 19:29:11,259 INFO [train.py:1198] (1/2) Epoch 29, batch 200, loss[loss=0.2424, ctc_loss=0.1174, cr_loss=0.344, attn_decoder_loss=0.2486, over 27605.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1245, cr_loss=0.3686, attn_decoder_loss=0.2447, over 3658529.29 frames. 
], batch size: 125, lr: 3.85e-03, grad_scale: 8.0 2024-09-18 19:29:11,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=507600.0, ans=0.0 2024-09-18 19:29:11,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=507600.0, ans=0.125 2024-09-18 19:29:14,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=507600.0, ans=0.09899494936611666 2024-09-18 19:29:15,708 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.478e+01 8.328e+01 8.818e+01 9.310e+01 1.091e+02, threshold=1.764e+02, percent-clipped=0.0 2024-09-18 19:29:37,351 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.54 vs. limit=22.5 2024-09-18 19:29:50,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=507680.0, ans=0.2 2024-09-18 19:29:54,777 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:30:01,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=507720.0, ans=0.0 2024-09-18 19:30:05,395 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.40 vs. limit=10.0 2024-09-18 19:30:10,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=507720.0, ans=0.0 2024-09-18 19:30:14,239 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.78 vs. limit=15.0 2024-09-18 19:30:25,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.51 vs. limit=6.0 2024-09-18 19:30:29,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=507760.0, ans=0.125 2024-09-18 19:30:31,824 INFO [train.py:1198] (1/2) Epoch 29, batch 250, loss[loss=0.2517, ctc_loss=0.141, cr_loss=0.3938, attn_decoder_loss=0.2552, over 29187.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1239, cr_loss=0.3672, attn_decoder_loss=0.2443, over 4141477.92 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 8.0 2024-09-18 19:30:53,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=507840.0, ans=0.125 2024-09-18 19:31:12,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=507880.0, ans=0.0 2024-09-18 19:31:12,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=507880.0, ans=0.125 2024-09-18 19:31:47,694 INFO [train.py:1198] (1/2) Epoch 29, batch 300, loss[loss=0.2568, ctc_loss=0.1307, cr_loss=0.374, attn_decoder_loss=0.2625, over 29514.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1231, cr_loss=0.3654, attn_decoder_loss=0.2437, over 4510823.58 frames. 
], batch size: 92, lr: 3.85e-03, grad_scale: 8.0 2024-09-18 19:31:50,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=508000.0, ans=0.0 2024-09-18 19:31:52,194 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 8.405e+01 8.844e+01 9.472e+01 2.622e+02, threshold=1.769e+02, percent-clipped=1.0 2024-09-18 19:31:56,048 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.89 vs. limit=22.5 2024-09-18 19:32:26,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=508080.0, ans=0.0 2024-09-18 19:32:31,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=508120.0, ans=0.125 2024-09-18 19:33:00,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=508160.0, ans=0.125 2024-09-18 19:33:03,340 INFO [train.py:1198] (1/2) Epoch 29, batch 350, loss[loss=0.2227, ctc_loss=0.1168, cr_loss=0.3476, attn_decoder_loss=0.2267, over 29305.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1231, cr_loss=0.3654, attn_decoder_loss=0.244, over 4794848.54 frames. ], batch size: 71, lr: 3.85e-03, grad_scale: 8.0 2024-09-18 19:33:23,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=508240.0, ans=0.0 2024-09-18 19:33:28,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=508240.0, ans=0.2 2024-09-18 19:33:37,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=508280.0, ans=0.025 2024-09-18 19:33:47,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.92 vs. limit=15.0 2024-09-18 19:34:00,796 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:34:22,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=508400.0, ans=0.125 2024-09-18 19:34:23,115 INFO [train.py:1198] (1/2) Epoch 29, batch 400, loss[loss=0.2422, ctc_loss=0.1215, cr_loss=0.3721, attn_decoder_loss=0.2473, over 29722.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1225, cr_loss=0.3639, attn_decoder_loss=0.2434, over 5023338.67 frames. ], batch size: 82, lr: 3.85e-03, grad_scale: 16.0 2024-09-18 19:34:27,753 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.140e+01 8.478e+01 8.916e+01 9.451e+01 2.866e+02, threshold=1.783e+02, percent-clipped=2.0 2024-09-18 19:34:31,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=508400.0, ans=0.0 2024-09-18 19:34:43,758 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.74 vs. 
limit=15.0 2024-09-18 19:34:56,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=508480.0, ans=0.125 2024-09-18 19:35:04,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=508480.0, ans=0.0 2024-09-18 19:35:39,007 INFO [train.py:1198] (1/2) Epoch 29, batch 450, loss[loss=0.2459, ctc_loss=0.1291, cr_loss=0.3781, attn_decoder_loss=0.2505, over 29699.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1226, cr_loss=0.3644, attn_decoder_loss=0.2437, over 5186334.77 frames. ], batch size: 83, lr: 3.85e-03, grad_scale: 16.0 2024-09-18 19:35:45,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=508600.0, ans=0.0 2024-09-18 19:36:00,810 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:36:08,841 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.68 vs. limit=22.5 2024-09-18 19:36:24,594 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-09-18 19:36:26,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=508720.0, ans=0.0 2024-09-18 19:36:32,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=508720.0, ans=0.0 2024-09-18 19:36:52,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=508760.0, ans=0.0 2024-09-18 19:36:55,957 INFO [train.py:1198] (1/2) Epoch 29, batch 500, loss[loss=0.2486, ctc_loss=0.1282, cr_loss=0.3794, attn_decoder_loss=0.2536, over 29485.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1222, cr_loss=0.3637, attn_decoder_loss=0.243, over 5329516.04 frames. ], batch size: 94, lr: 3.84e-03, grad_scale: 16.0 2024-09-18 19:37:00,504 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.214e+01 8.526e+01 8.926e+01 9.589e+01 3.622e+02, threshold=1.785e+02, percent-clipped=3.0 2024-09-18 19:37:15,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=508840.0, ans=0.1 2024-09-18 19:37:19,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=508840.0, ans=0.1 2024-09-18 19:37:24,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=508840.0, ans=0.125 2024-09-18 19:37:24,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=508840.0, ans=0.125 2024-09-18 19:38:05,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=508960.0, ans=0.125 2024-09-18 19:38:15,952 INFO [train.py:1198] (1/2) Epoch 29, batch 550, loss[loss=0.2477, ctc_loss=0.1283, cr_loss=0.3798, attn_decoder_loss=0.2525, over 28865.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1219, cr_loss=0.363, attn_decoder_loss=0.2431, over 5422124.06 frames. 
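Many of the scheduled names above end in balancer fields such as `prob`, `min_positive`, `max_positive`, `min_abs`, and `max_abs` (e.g. `conv_module2.balancer2.min_abs ... ans=0.5` earlier in this log). These describe constraints on per-channel activation statistics: roughly, the fraction of positive values and the RMS magnitude of each channel are kept inside a target range, with `prob` the probability that the correction is applied on a given batch. A toy diagnostic in that spirit, measuring how far a layer's activations stray from such ranges (illustrative only, not icefall's Balancer module):

```python
# Hedged sketch: per-channel statistics checked against balancer-style
# ranges. This only measures violations; the real module also shapes
# gradients to push the statistics back into range.
import torch

def balancer_violations(x, min_positive=0.05, max_positive=0.95,
                        min_abs=0.2, max_abs=10.0):
    # x: (num_frames, num_channels) activations
    frac_pos = (x > 0).float().mean(dim=0)   # fraction of positive values
    rms = x.pow(2).mean(dim=0).sqrt()        # per-channel magnitude
    return {
        "too_negative": (frac_pos < min_positive).float().mean().item(),
        "too_positive": (frac_pos > max_positive).float().mean().item(),
        "too_small":    (rms < min_abs).float().mean().item(),
        "too_large":    (rms > max_abs).float().mean().item(),
    }

x = torch.relu(torch.randn(1000, 512))       # ReLU output: ~50% positive
print(balancer_violations(x))
```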
], batch size: 104, lr: 3.84e-03, grad_scale: 8.0 2024-09-18 19:38:16,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=509000.0, ans=0.2 2024-09-18 19:38:44,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=509080.0, ans=0.0 2024-09-18 19:38:47,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=509080.0, ans=0.125 2024-09-18 19:39:31,424 INFO [train.py:1198] (1/2) Epoch 29, batch 600, loss[loss=0.2584, ctc_loss=0.1385, cr_loss=0.41, attn_decoder_loss=0.2626, over 29256.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1221, cr_loss=0.3635, attn_decoder_loss=0.2434, over 5509825.95 frames. ], batch size: 100, lr: 3.84e-03, grad_scale: 8.0 2024-09-18 19:39:37,601 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.349e+01 8.468e+01 8.932e+01 9.529e+01 2.879e+02, threshold=1.786e+02, percent-clipped=3.0 2024-09-18 19:39:43,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.08 vs. limit=15.0 2024-09-18 19:40:06,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=509280.0, ans=0.125 2024-09-18 19:40:07,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=509280.0, ans=0.125 2024-09-18 19:40:09,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=509280.0, ans=0.0 2024-09-18 19:40:13,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=509280.0, ans=0.125 2024-09-18 19:40:14,431 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.36 vs. limit=15.0 2024-09-18 19:40:16,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=509320.0, ans=0.07 2024-09-18 19:40:29,263 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.37 vs. limit=15.0 2024-09-18 19:40:37,796 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:40:45,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=509400.0, ans=0.0 2024-09-18 19:40:46,514 INFO [train.py:1198] (1/2) Epoch 29, batch 650, loss[loss=0.2417, ctc_loss=0.1299, cr_loss=0.3924, attn_decoder_loss=0.2454, over 29754.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1219, cr_loss=0.3633, attn_decoder_loss=0.2431, over 5587455.01 frames. 
], batch size: 81, lr: 3.84e-03, grad_scale: 8.0 2024-09-18 19:41:01,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=509440.0, ans=0.125 2024-09-18 19:41:13,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=509440.0, ans=0.125 2024-09-18 19:41:34,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=509520.0, ans=0.0 2024-09-18 19:41:35,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=509520.0, ans=0.125 2024-09-18 19:41:37,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=509520.0, ans=0.2 2024-09-18 19:41:37,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=509520.0, ans=0.125 2024-09-18 19:41:43,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=509520.0, ans=0.2 2024-09-18 19:41:47,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=509560.0, ans=0.125 2024-09-18 19:41:56,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=509560.0, ans=0.0 2024-09-18 19:42:05,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.95 vs. limit=22.5 2024-09-18 19:42:06,941 INFO [train.py:1198] (1/2) Epoch 29, batch 700, loss[loss=0.2362, ctc_loss=0.1264, cr_loss=0.3866, attn_decoder_loss=0.2399, over 29539.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1226, cr_loss=0.365, attn_decoder_loss=0.2438, over 5637733.69 frames. ], batch size: 76, lr: 3.84e-03, grad_scale: 8.0 2024-09-18 19:42:11,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=509600.0, ans=0.0 2024-09-18 19:42:12,943 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.515e+01 8.488e+01 8.956e+01 9.496e+01 1.572e+02, threshold=1.791e+02, percent-clipped=0.0 2024-09-18 19:42:13,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=509600.0, ans=0.125 2024-09-18 19:42:13,746 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.32 vs. 
limit=15.0 2024-09-18 19:42:26,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=509640.0, ans=0.125 2024-09-18 19:42:42,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=509680.0, ans=0.1 2024-09-18 19:42:43,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=509680.0, ans=0.125 2024-09-18 19:42:48,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=509680.0, ans=0.125 2024-09-18 19:42:49,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=509680.0, ans=0.1 2024-09-18 19:42:59,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=509720.0, ans=0.5 2024-09-18 19:43:14,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=509760.0, ans=0.125 2024-09-18 19:43:23,117 INFO [train.py:1198] (1/2) Epoch 29, batch 750, loss[loss=0.2493, ctc_loss=0.1337, cr_loss=0.384, attn_decoder_loss=0.2536, over 29697.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.122, cr_loss=0.3639, attn_decoder_loss=0.2434, over 5676958.21 frames. ], batch size: 82, lr: 3.84e-03, grad_scale: 8.0 2024-09-18 19:43:32,705 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.61 vs. limit=15.0 2024-09-18 19:43:44,826 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.29 vs. limit=22.5 2024-09-18 19:44:01,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=509880.0, ans=0.125 2024-09-18 19:44:07,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=509920.0, ans=0.0 2024-09-18 19:44:28,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=509960.0, ans=0.125 2024-09-18 19:44:32,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=509960.0, ans=0.125 2024-09-18 19:44:38,406 INFO [train.py:1198] (1/2) Epoch 29, batch 800, loss[loss=0.2258, ctc_loss=0.1124, cr_loss=0.3557, attn_decoder_loss=0.2305, over 29603.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1218, cr_loss=0.3636, attn_decoder_loss=0.2433, over 5707962.83 frames. 
], batch size: 73, lr: 3.84e-03, grad_scale: 16.0 2024-09-18 19:44:44,456 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 8.377e+01 8.861e+01 9.386e+01 4.532e+02, threshold=1.772e+02, percent-clipped=1.0 2024-09-18 19:45:14,037 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:45:19,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=510080.0, ans=0.125 2024-09-18 19:45:55,652 INFO [train.py:1198] (1/2) Epoch 29, batch 850, loss[loss=0.2458, ctc_loss=0.1228, cr_loss=0.3648, attn_decoder_loss=0.2514, over 29734.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1217, cr_loss=0.3636, attn_decoder_loss=0.2431, over 5736666.42 frames. ], batch size: 89, lr: 3.84e-03, grad_scale: 8.0 2024-09-18 19:46:07,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=510200.0, ans=0.0 2024-09-18 19:46:13,037 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:46:43,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=510320.0, ans=0.2 2024-09-18 19:46:50,281 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.64 vs. limit=15.0 2024-09-18 19:46:57,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=510360.0, ans=0.125 2024-09-18 19:47:13,949 INFO [train.py:1198] (1/2) Epoch 29, batch 900, loss[loss=0.2227, ctc_loss=0.1148, cr_loss=0.3378, attn_decoder_loss=0.2272, over 29593.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1221, cr_loss=0.3641, attn_decoder_loss=0.2433, over 5741270.45 frames. ], batch size: 73, lr: 3.84e-03, grad_scale: 8.0 2024-09-18 19:47:21,305 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.540e+01 9.030e+01 9.336e+01 1.932e+02, threshold=1.806e+02, percent-clipped=1.0 2024-09-18 19:48:02,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=510520.0, ans=0.1 2024-09-18 19:48:07,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=510520.0, ans=0.1 2024-09-18 19:48:29,447 INFO [train.py:1198] (1/2) Epoch 29, batch 950, loss[loss=0.2305, ctc_loss=0.1128, cr_loss=0.3611, attn_decoder_loss=0.2355, over 29523.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1226, cr_loss=0.365, attn_decoder_loss=0.2439, over 5742088.56 frames. 
2024-09-18 19:48:29,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=510600.0, ans=0.125 2024-09-18 19:48:29,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=510600.0, ans=0.07 2024-09-18 19:48:32,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=510600.0, ans=0.0 2024-09-18 19:48:43,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=510640.0, ans=0.5 2024-09-18 19:48:59,970 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2024-09-18 19:49:03,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=510680.0, ans=0.125 2024-09-18 19:49:18,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=510720.0, ans=0.04949747468305833 2024-09-18 19:49:36,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=510760.0, ans=0.0 2024-09-18 19:49:39,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=510760.0, ans=0.125 2024-09-18 19:49:46,873 INFO [train.py:1198] (1/2) Epoch 29, batch 1000, loss[loss=0.2273, ctc_loss=0.1161, cr_loss=0.3533, attn_decoder_loss=0.2318, over 29475.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1233, cr_loss=0.3661, attn_decoder_loss=0.2444, over 5734608.49 frames. ], batch size: 77, lr: 3.84e-03, grad_scale: 8.0 2024-09-18 19:49:56,635 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.627e+01 9.386e+01 1.009e+02 2.634e+02, threshold=1.877e+02, percent-clipped=2.0 2024-09-18 19:50:15,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=510840.0, ans=0.2 2024-09-18 19:50:33,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=510920.0, ans=0.125 2024-09-18 19:50:55,154 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.58 vs. limit=22.5 2024-09-18 19:51:03,865 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.20 vs. limit=22.5 2024-09-18 19:51:04,704 INFO [train.py:1198] (1/2) Epoch 29, batch 1050, loss[loss=0.2406, ctc_loss=0.1161, cr_loss=0.3624, attn_decoder_loss=0.2464, over 29674.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1225, cr_loss=0.3648, attn_decoder_loss=0.2435, over 5742866.70 frames. ], batch size: 85, lr: 3.84e-03, grad_scale: 8.0 2024-09-18 19:51:17,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=511000.0, ans=0.1 2024-09-18 19:51:17,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.46 vs.
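limit=15.0

NOTE: The Whitening entries (scaling.py:1024) track, per module, how far the activation covariance is from isotropic ("white"); the module only applies pressure once the metric exceeds its limit, which is why most lines report a metric well under the limit. Below is one way such a metric can be computed: it equals 1.0 for a perfectly white covariance and grows with the eigenvalue spread. The exact normalization in scaling.py may differ; treat this as a hedged sketch.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels are split into num_groups groups.
    n, c = x.shape
    cpg = c // num_groups
    xg = x.reshape(n, num_groups, cpg).transpose(0, 1)      # (groups, frames, cpg)
    covar = torch.matmul(xg.transpose(1, 2), xg) / n        # per-group covariance
    diag_mean = covar.diagonal(dim1=1, dim2=2).mean(dim=1)  # mean variance
    sq_mean = (covar ** 2).sum(dim=(1, 2)) / cpg
    # 1.0 iff covar is a multiple of the identity; larger when less isotropic.
    return (sq_mean / (diag_mean ** 2 + 1e-20)).mean().item()

feats = torch.randn(1000, 512)   # near-white features
print(whitening_metric(feats))   # small (near 1), far below a limit of 15.0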
2024-09-18 19:51:34,655 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.01 vs. limit=15.0 2024-09-18 19:51:39,341 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.77 vs. limit=15.0 2024-09-18 19:51:58,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=511120.0, ans=0.1 2024-09-18 19:51:59,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=511120.0, ans=0.125 2024-09-18 19:52:19,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=511200.0, ans=0.0 2024-09-18 19:52:21,229 INFO [train.py:1198] (1/2) Epoch 29, batch 1100, loss[loss=0.2391, ctc_loss=0.129, cr_loss=0.379, attn_decoder_loss=0.2429, over 29436.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1225, cr_loss=0.3651, attn_decoder_loss=0.2433, over 5755460.49 frames. ], batch size: 78, lr: 3.84e-03, grad_scale: 8.0 2024-09-18 19:52:28,711 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.758e+01 8.572e+01 8.922e+01 9.420e+01 4.206e+02, threshold=1.784e+02, percent-clipped=1.0 2024-09-18 19:53:13,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=511320.0, ans=0.1 2024-09-18 19:53:37,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=511400.0, ans=0.125 2024-09-18 19:53:38,727 INFO [train.py:1198] (1/2) Epoch 29, batch 1150, loss[loss=0.2374, ctc_loss=0.1258, cr_loss=0.3697, attn_decoder_loss=0.2416, over 29467.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1227, cr_loss=0.3655, attn_decoder_loss=0.2431, over 5755279.31 frames. ], batch size: 78, lr: 3.84e-03, grad_scale: 8.0 2024-09-18 19:53:39,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=511400.0, ans=0.0 2024-09-18 19:53:48,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=511400.0, ans=0.125 2024-09-18 19:53:52,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=511400.0, ans=0.0 2024-09-18 19:53:56,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=511440.0, ans=0.0 2024-09-18 19:53:56,673 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:53:59,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=511440.0, ans=0.125 2024-09-18 19:54:42,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=511560.0, ans=0.2 2024-09-18 19:54:50,410 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.39 vs.
limit=22.5 2024-09-18 19:54:55,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=511600.0, ans=0.0 2024-09-18 19:54:56,976 INFO [train.py:1198] (1/2) Epoch 29, batch 1200, loss[loss=0.2381, ctc_loss=0.1113, cr_loss=0.3379, attn_decoder_loss=0.2447, over 29685.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1234, cr_loss=0.3669, attn_decoder_loss=0.244, over 5748348.98 frames. ], batch size: 85, lr: 3.83e-03, grad_scale: 16.0 2024-09-18 19:55:04,482 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 8.543e+01 9.016e+01 9.683e+01 2.653e+02, threshold=1.803e+02, percent-clipped=3.0 2024-09-18 19:55:12,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=511640.0, ans=0.1 2024-09-18 19:55:26,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=511680.0, ans=0.125 2024-09-18 19:55:36,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=511680.0, ans=0.125 2024-09-18 19:55:47,836 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.39 vs. limit=15.0 2024-09-18 19:56:04,491 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.27 vs. limit=12.0 2024-09-18 19:56:12,579 INFO [train.py:1198] (1/2) Epoch 29, batch 1250, loss[loss=0.2497, ctc_loss=0.1365, cr_loss=0.3929, attn_decoder_loss=0.2536, over 29522.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1235, cr_loss=0.3666, attn_decoder_loss=0.2443, over 5774568.03 frames. ], batch size: 92, lr: 3.83e-03, grad_scale: 8.0 2024-09-18 19:56:51,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=511880.0, ans=0.1 2024-09-18 19:57:07,453 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.11 vs. limit=15.0 2024-09-18 19:57:08,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=511920.0, ans=0.0 2024-09-18 19:57:38,302 INFO [train.py:1198] (1/2) Epoch 29, batch 1300, loss[loss=0.2442, ctc_loss=0.1259, cr_loss=0.3665, attn_decoder_loss=0.2492, over 28327.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1229, cr_loss=0.3645, attn_decoder_loss=0.2435, over 5780100.07 frames. ], batch size: 111, lr: 3.83e-03, grad_scale: 8.0 2024-09-18 19:57:38,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=512000.0, ans=0.125 2024-09-18 19:57:42,502 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.66 vs. 
limit=22.5 2024-09-18 19:57:47,484 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.688e+01 8.526e+01 8.940e+01 9.401e+01 4.173e+02, threshold=1.788e+02, percent-clipped=2.0 2024-09-18 19:57:53,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=512040.0, ans=0.0 2024-09-18 19:58:02,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=512040.0, ans=0.1 2024-09-18 19:58:03,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=512040.0, ans=0.125 2024-09-18 19:58:09,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=512080.0, ans=0.0 2024-09-18 19:58:24,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=512120.0, ans=0.125 2024-09-18 19:58:37,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=512120.0, ans=0.025 2024-09-18 19:58:49,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=512160.0, ans=0.125 2024-09-18 19:58:49,901 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.56 vs. limit=22.5 2024-09-18 19:58:56,559 INFO [train.py:1198] (1/2) Epoch 29, batch 1350, loss[loss=0.2366, ctc_loss=0.1215, cr_loss=0.3568, attn_decoder_loss=0.2414, over 29749.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1222, cr_loss=0.3634, attn_decoder_loss=0.243, over 5796200.89 frames. ], batch size: 81, lr: 3.83e-03, grad_scale: 8.0 2024-09-18 19:58:56,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=512200.0, ans=0.125 2024-09-18 19:59:01,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=512200.0, ans=0.125 2024-09-18 19:59:04,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=512200.0, ans=0.125 2024-09-18 19:59:10,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=512240.0, ans=0.025 2024-09-18 19:59:16,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=512240.0, ans=0.0 2024-09-18 19:59:22,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=512240.0, ans=0.025 2024-09-18 19:59:23,012 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.92 vs. limit=22.5 2024-09-18 19:59:31,463 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.44 vs. 
limit=15.0 2024-09-18 19:59:35,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=512280.0, ans=0.125 2024-09-18 19:59:41,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=512320.0, ans=0.125 2024-09-18 19:59:55,918 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.58 vs. limit=12.0 2024-09-18 20:00:07,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=512360.0, ans=0.125 2024-09-18 20:00:11,671 INFO [train.py:1198] (1/2) Epoch 29, batch 1400, loss[loss=0.2099, ctc_loss=0.1022, cr_loss=0.3144, attn_decoder_loss=0.2149, over 29576.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1223, cr_loss=0.364, attn_decoder_loss=0.2432, over 5807230.39 frames. ], batch size: 69, lr: 3.83e-03, grad_scale: 8.0 2024-09-18 20:00:19,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=512400.0, ans=0.0 2024-09-18 20:00:20,750 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.135e+01 8.361e+01 8.836e+01 9.387e+01 1.190e+02, threshold=1.767e+02, percent-clipped=0.0 2024-09-18 20:00:25,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=512440.0, ans=0.125 2024-09-18 20:00:27,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=512440.0, ans=0.2 2024-09-18 20:00:41,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=512440.0, ans=0.0 2024-09-18 20:01:02,529 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:01:06,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=512520.0, ans=0.0 2024-09-18 20:01:24,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=512560.0, ans=0.125 2024-09-18 20:01:29,200 INFO [train.py:1198] (1/2) Epoch 29, batch 1450, loss[loss=0.2538, ctc_loss=0.1364, cr_loss=0.4007, attn_decoder_loss=0.2579, over 29465.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1225, cr_loss=0.3644, attn_decoder_loss=0.2437, over 5803945.00 frames. 
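], batch size: 94, lr: 3.83e-03, grad_scale: 8.0

NOTE: In the train.py:1198 records, each "loss[...]" block is one batch and "tot_loss[...]" is a frame-weighted running aggregate. Across the records above, the headline loss is consistent with a fixed weighted sum of the components, loss = 0.1*ctc_loss + 0.02*cr_loss + 0.9*attn_decoder_loss (e.g. for batch 750: 0.1*0.1337 + 0.02*0.384 + 0.9*0.2536 = 0.2493). The sketch below reproduces that arithmetic; the weights are inferred from the logged numbers, and the aggregation window for tot_loss is an assumption.

from dataclasses import dataclass

@dataclass
class BatchLoss:
    ctc_loss: float
    cr_loss: float             # consistency-regularization term
    attn_decoder_loss: float
    num_frames: float          # the "over N frames" field

    def loss(self, lam_ctc=0.1, lam_cr=0.02, lam_attn=0.9) -> float:
        # Weighted combination printed as the leading "loss=" field.
        return (lam_ctc * self.ctc_loss
                + lam_cr * self.cr_loss
                + lam_attn * self.attn_decoder_loss)

def tot_loss(batches):
    # Frame-weighted average of each component over the aggregation window.
    frames = sum(b.num_frames for b in batches)
    avg = lambda f: sum(f(b) * b.num_frames for b in batches) / frames
    return {
        "loss": avg(BatchLoss.loss),
        "ctc_loss": avg(lambda b: b.ctc_loss),
        "cr_loss": avg(lambda b: b.cr_loss),
        "attn_decoder_loss": avg(lambda b: b.attn_decoder_loss),
    }

print(BatchLoss(0.1337, 0.384, 0.2536, 29697.0).loss())  # ~0.2493, as logged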
2024-09-18 20:01:40,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=512600.0, ans=0.2 2024-09-18 20:02:09,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=512680.0, ans=0.125 2024-09-18 20:02:10,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=512680.0, ans=0.0 2024-09-18 20:02:17,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=512720.0, ans=0.125 2024-09-18 20:02:31,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=512760.0, ans=15.0 2024-09-18 20:02:38,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=512760.0, ans=0.125 2024-09-18 20:02:47,668 INFO [train.py:1198] (1/2) Epoch 29, batch 1500, loss[loss=0.243, ctc_loss=0.1213, cr_loss=0.372, attn_decoder_loss=0.2483, over 29618.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1227, cr_loss=0.3649, attn_decoder_loss=0.244, over 5806395.89 frames. ], batch size: 86, lr: 3.83e-03, grad_scale: 8.0 2024-09-18 20:02:54,543 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.19 vs. limit=10.0 2024-09-18 20:02:58,363 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.696e+01 9.136e+01 9.651e+01 1.564e+02, threshold=1.827e+02, percent-clipped=0.0 2024-09-18 20:03:03,646 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.75 vs. limit=15.0 2024-09-18 20:03:06,414 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:03:08,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=512840.0, ans=0.0 2024-09-18 20:03:20,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=512880.0, ans=0.125 2024-09-18 20:03:21,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=512880.0, ans=0.2 2024-09-18 20:03:26,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=512880.0, ans=0.025 2024-09-18 20:03:31,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=512880.0, ans=0.125 2024-09-18 20:03:34,952 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.52 vs. limit=15.0 2024-09-18 20:03:58,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=512960.0, ans=0.025 2024-09-18 20:04:03,291 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs.
limit=6.0 2024-09-18 20:04:03,792 INFO [train.py:1198] (1/2) Epoch 29, batch 1550, loss[loss=0.2663, ctc_loss=0.1468, cr_loss=0.4307, attn_decoder_loss=0.27, over 29494.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1232, cr_loss=0.366, attn_decoder_loss=0.2443, over 5783638.65 frames. ], batch size: 90, lr: 3.83e-03, grad_scale: 8.0 2024-09-18 20:04:03,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=513000.0, ans=0.015 2024-09-18 20:04:08,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=513000.0, ans=0.0 2024-09-18 20:04:33,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=513040.0, ans=0.125 2024-09-18 20:04:40,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=513080.0, ans=0.125 2024-09-18 20:04:42,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=513080.0, ans=0.125 2024-09-18 20:05:21,274 INFO [train.py:1198] (1/2) Epoch 29, batch 1600, loss[loss=0.2496, ctc_loss=0.1282, cr_loss=0.3836, attn_decoder_loss=0.2545, over 29666.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1232, cr_loss=0.3651, attn_decoder_loss=0.2441, over 5766603.69 frames. ], batch size: 85, lr: 3.83e-03, grad_scale: 16.0 2024-09-18 20:05:21,625 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:05:24,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=513200.0, ans=0.125 2024-09-18 20:05:25,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=513200.0, ans=0.0 2024-09-18 20:05:31,637 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.318e+01 8.586e+01 9.089e+01 9.783e+01 2.042e+02, threshold=1.818e+02, percent-clipped=1.0 2024-09-18 20:05:59,356 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.99 vs. limit=15.0 2024-09-18 20:06:00,625 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0 2024-09-18 20:06:01,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=513280.0, ans=0.125 2024-09-18 20:06:21,943 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.44 vs. limit=10.0 2024-09-18 20:06:39,180 INFO [train.py:1198] (1/2) Epoch 29, batch 1650, loss[loss=0.2444, ctc_loss=0.1247, cr_loss=0.3708, attn_decoder_loss=0.2495, over 29687.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1228, cr_loss=0.3641, attn_decoder_loss=0.2437, over 5761817.36 frames. ], batch size: 89, lr: 3.83e-03, grad_scale: 8.0 2024-09-18 20:07:01,228 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.94 vs. 
limit=22.5 2024-09-18 20:07:11,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=513480.0, ans=0.0 2024-09-18 20:07:14,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=513480.0, ans=10.0 2024-09-18 20:07:20,975 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.36 vs. limit=12.0 2024-09-18 20:07:26,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=513520.0, ans=0.0 2024-09-18 20:07:28,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=513520.0, ans=0.2 2024-09-18 20:07:28,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=513520.0, ans=0.125 2024-09-18 20:07:40,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=513560.0, ans=0.1 2024-09-18 20:07:55,037 INFO [train.py:1198] (1/2) Epoch 29, batch 1700, loss[loss=0.2142, ctc_loss=0.1003, cr_loss=0.3069, attn_decoder_loss=0.22, over 29560.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1224, cr_loss=0.3636, attn_decoder_loss=0.2436, over 5781851.35 frames. ], batch size: 69, lr: 3.83e-03, grad_scale: 8.0 2024-09-18 20:07:59,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=513600.0, ans=0.1 2024-09-18 20:08:07,210 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.650e+01 8.371e+01 8.901e+01 9.499e+01 1.304e+02, threshold=1.780e+02, percent-clipped=0.0 2024-09-18 20:08:12,685 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.28 vs. limit=15.0 2024-09-18 20:08:34,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=513680.0, ans=0.125 2024-09-18 20:08:45,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=513720.0, ans=0.1 2024-09-18 20:08:47,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=513720.0, ans=0.125 2024-09-18 20:09:12,833 INFO [train.py:1198] (1/2) Epoch 29, batch 1750, loss[loss=0.2099, ctc_loss=0.09885, cr_loss=0.3092, attn_decoder_loss=0.2154, over 29405.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.122, cr_loss=0.3627, attn_decoder_loss=0.2433, over 5789966.00 frames. 
], batch size: 67, lr: 3.83e-03, grad_scale: 8.0 2024-09-18 20:09:26,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=513840.0, ans=0.1 2024-09-18 20:10:00,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=513920.0, ans=0.1 2024-09-18 20:10:20,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=513960.0, ans=0.125 2024-09-18 20:10:30,207 INFO [train.py:1198] (1/2) Epoch 29, batch 1800, loss[loss=0.2439, ctc_loss=0.1249, cr_loss=0.3564, attn_decoder_loss=0.2492, over 29690.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1226, cr_loss=0.3643, attn_decoder_loss=0.2437, over 5793627.95 frames. ], batch size: 83, lr: 3.83e-03, grad_scale: 8.0 2024-09-18 20:10:31,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=514000.0, ans=0.0 2024-09-18 20:10:42,243 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.437e+01 8.472e+01 8.834e+01 9.561e+01 3.303e+02, threshold=1.767e+02, percent-clipped=1.0 2024-09-18 20:10:48,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=514040.0, ans=0.125 2024-09-18 20:10:53,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=514040.0, ans=0.025 2024-09-18 20:11:00,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=514080.0, ans=0.125 2024-09-18 20:11:14,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=514120.0, ans=0.125 2024-09-18 20:11:19,365 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.63 vs. limit=12.0 2024-09-18 20:11:43,747 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.16 vs. limit=15.0 2024-09-18 20:11:46,032 INFO [train.py:1198] (1/2) Epoch 29, batch 1850, loss[loss=0.2496, ctc_loss=0.129, cr_loss=0.3815, attn_decoder_loss=0.2545, over 29623.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1225, cr_loss=0.3647, attn_decoder_loss=0.2434, over 5798305.50 frames. ], batch size: 86, lr: 3.82e-03, grad_scale: 8.0 2024-09-18 20:11:46,769 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. 
limit=6.0 2024-09-18 20:11:49,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=514200.0, ans=0.125 2024-09-18 20:12:24,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=514280.0, ans=0.0 2024-09-18 20:12:29,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=514280.0, ans=0.2 2024-09-18 20:12:33,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=514320.0, ans=0.2 2024-09-18 20:12:35,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=514320.0, ans=0.125 2024-09-18 20:12:51,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=514360.0, ans=0.125 2024-09-18 20:12:54,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=514360.0, ans=0.2 2024-09-18 20:12:55,671 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.30 vs. limit=15.0 2024-09-18 20:13:03,715 INFO [train.py:1198] (1/2) Epoch 29, batch 1900, loss[loss=0.2513, ctc_loss=0.1269, cr_loss=0.3808, attn_decoder_loss=0.2566, over 29688.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1227, cr_loss=0.3652, attn_decoder_loss=0.2438, over 5805299.76 frames. ], batch size: 89, lr: 3.82e-03, grad_scale: 8.0 2024-09-18 20:13:08,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=514400.0, ans=0.0 2024-09-18 20:13:15,864 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.747e+01 8.630e+01 9.084e+01 9.711e+01 2.750e+02, threshold=1.817e+02, percent-clipped=3.0 2024-09-18 20:14:01,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=514520.0, ans=0.125 2024-09-18 20:14:14,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=514560.0, ans=0.2 2024-09-18 20:14:22,092 INFO [train.py:1198] (1/2) Epoch 29, batch 1950, loss[loss=0.2386, ctc_loss=0.1265, cr_loss=0.3854, attn_decoder_loss=0.2425, over 29448.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1231, cr_loss=0.3662, attn_decoder_loss=0.2449, over 5820303.01 frames. ], batch size: 78, lr: 3.82e-03, grad_scale: 8.0 2024-09-18 20:14:25,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=514600.0, ans=0.025 2024-09-18 20:14:40,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.63 vs. limit=15.0 2024-09-18 20:14:59,425 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.35 vs. limit=10.0 2024-09-18 20:15:20,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=514760.0, ans=0.0 2024-09-18 20:15:37,425 INFO [train.py:1198] (1/2) Epoch 29, batch 2000, loss[loss=0.2155, ctc_loss=0.1117, cr_loss=0.3365, attn_decoder_loss=0.2196, over 29350.00 frames. 
], tot_loss[loss=0.2402, ctc_loss=0.1234, cr_loss=0.3667, attn_decoder_loss=0.245, over 5797385.27 frames. ], batch size: 67, lr: 3.82e-03, grad_scale: 16.0 2024-09-18 20:15:46,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=514800.0, ans=0.0 2024-09-18 20:15:49,646 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.501e+01 8.639e+01 9.197e+01 9.637e+01 2.415e+02, threshold=1.839e+02, percent-clipped=1.0 2024-09-18 20:15:53,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=514840.0, ans=0.125 2024-09-18 20:16:11,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=514880.0, ans=0.0 2024-09-18 20:16:22,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=514880.0, ans=0.125 2024-09-18 20:16:35,092 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.02 vs. limit=15.0 2024-09-18 20:16:46,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=514960.0, ans=0.0 2024-09-18 20:16:48,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=514960.0, ans=0.2 2024-09-18 20:16:55,174 INFO [train.py:1198] (1/2) Epoch 29, batch 2050, loss[loss=0.2172, ctc_loss=0.1095, cr_loss=0.3341, attn_decoder_loss=0.2217, over 29411.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1233, cr_loss=0.3664, attn_decoder_loss=0.2445, over 5788684.50 frames. ], batch size: 70, lr: 3.82e-03, grad_scale: 16.0 2024-09-18 20:17:00,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=515000.0, ans=0.035 2024-09-18 20:17:00,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=515000.0, ans=0.2 2024-09-18 20:17:04,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=515000.0, ans=0.0 2024-09-18 20:17:23,252 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.70 vs. 
2024-09-18 20:17:36,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=515080.0, ans=0.125 2024-09-18 20:17:37,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=515080.0, ans=0.2 2024-09-18 20:17:46,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=515120.0, ans=0.1 2024-09-18 20:17:47,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=515120.0, ans=0.125 2024-09-18 20:17:49,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=515120.0, ans=0.025 2024-09-18 20:17:56,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=515160.0, ans=0.125 2024-09-18 20:17:58,899 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.54 vs. limit=22.5 2024-09-18 20:18:13,431 INFO [train.py:1198] (1/2) Epoch 29, batch 2100, loss[loss=0.235, ctc_loss=0.1169, cr_loss=0.3567, attn_decoder_loss=0.2402, over 29763.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1224, cr_loss=0.3643, attn_decoder_loss=0.2435, over 5801386.76 frames. ], batch size: 81, lr: 3.82e-03, grad_scale: 16.0 2024-09-18 20:18:25,547 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.305e+01 8.420e+01 8.993e+01 9.361e+01 1.152e+02, threshold=1.799e+02, percent-clipped=0.0 2024-09-18 20:18:25,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=515200.0, ans=0.0 2024-09-18 20:18:28,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=515240.0, ans=0.0 2024-09-18 20:18:39,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=515240.0, ans=0.0 2024-09-18 20:18:43,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=515280.0, ans=0.0 2024-09-18 20:18:52,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=515280.0, ans=0.2 2024-09-18 20:19:00,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=515320.0, ans=0.09899494936611666 2024-09-18 20:19:28,179 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.69 vs. limit=15.0 2024-09-18 20:19:28,672 INFO [train.py:1198] (1/2) Epoch 29, batch 2150, loss[loss=0.2381, ctc_loss=0.1201, cr_loss=0.3761, attn_decoder_loss=0.2429, over 29429.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1217, cr_loss=0.3628, attn_decoder_loss=0.2429, over 5815279.40 frames.
], batch size: 78, lr: 3.82e-03, grad_scale: 16.0 2024-09-18 20:19:36,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=515400.0, ans=0.125 2024-09-18 20:19:36,866 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:19:36,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=515400.0, ans=0.125 2024-09-18 20:19:41,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=515400.0, ans=0.1 2024-09-18 20:19:50,803 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.36 vs. limit=15.0 2024-09-18 20:20:06,629 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.75 vs. limit=6.0 2024-09-18 20:20:35,356 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.82 vs. limit=10.0 2024-09-18 20:20:46,532 INFO [train.py:1198] (1/2) Epoch 29, batch 2200, loss[loss=0.2452, ctc_loss=0.1233, cr_loss=0.3677, attn_decoder_loss=0.2506, over 29621.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1225, cr_loss=0.3636, attn_decoder_loss=0.2434, over 5811791.41 frames. ], batch size: 86, lr: 3.82e-03, grad_scale: 16.0 2024-09-18 20:20:52,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=515600.0, ans=0.125 2024-09-18 20:20:55,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=515600.0, ans=0.2 2024-09-18 20:20:58,446 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.349e+01 8.970e+01 9.403e+01 1.511e+02, threshold=1.794e+02, percent-clipped=0.0 2024-09-18 20:21:17,728 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.73 vs. limit=8.0 2024-09-18 20:21:25,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=515680.0, ans=0.125 2024-09-18 20:21:33,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=515720.0, ans=0.125 2024-09-18 20:21:41,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=515720.0, ans=0.0 2024-09-18 20:22:02,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=515800.0, ans=0.0 2024-09-18 20:22:04,183 INFO [train.py:1198] (1/2) Epoch 29, batch 2250, loss[loss=0.2512, ctc_loss=0.1267, cr_loss=0.3766, attn_decoder_loss=0.2567, over 29716.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1224, cr_loss=0.3641, attn_decoder_loss=0.2435, over 5810438.88 frames. 
], batch size: 82, lr: 3.82e-03, grad_scale: 16.0 2024-09-18 20:22:26,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=515840.0, ans=0.09899494936611666 2024-09-18 20:22:29,135 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.73 vs. limit=15.0 2024-09-18 20:22:31,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=515840.0, ans=0.125 2024-09-18 20:23:19,799 INFO [train.py:1198] (1/2) Epoch 29, batch 2300, loss[loss=0.2114, ctc_loss=0.09794, cr_loss=0.3234, attn_decoder_loss=0.2168, over 29335.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1217, cr_loss=0.3629, attn_decoder_loss=0.2424, over 5798571.72 frames. ], batch size: 71, lr: 3.82e-03, grad_scale: 16.0 2024-09-18 20:23:30,544 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:23:31,707 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.422e+01 8.460e+01 8.964e+01 9.608e+01 5.700e+02, threshold=1.793e+02, percent-clipped=2.0 2024-09-18 20:24:04,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=516080.0, ans=0.2 2024-09-18 20:24:08,171 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0 2024-09-18 20:24:37,597 INFO [train.py:1198] (1/2) Epoch 29, batch 2350, loss[loss=0.2511, ctc_loss=0.1309, cr_loss=0.3785, attn_decoder_loss=0.256, over 29688.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.122, cr_loss=0.3637, attn_decoder_loss=0.2427, over 5803745.84 frames. ], batch size: 83, lr: 3.82e-03, grad_scale: 8.0 2024-09-18 20:25:09,375 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:25:12,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=516280.0, ans=0.2 2024-09-18 20:25:23,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=516320.0, ans=0.125 2024-09-18 20:25:32,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=516320.0, ans=0.125 2024-09-18 20:25:55,371 INFO [train.py:1198] (1/2) Epoch 29, batch 2400, loss[loss=0.2286, ctc_loss=0.1199, cr_loss=0.3775, attn_decoder_loss=0.2323, over 29545.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1226, cr_loss=0.3647, attn_decoder_loss=0.2434, over 5807093.14 frames. 
], batch size: 76, lr: 3.82e-03, grad_scale: 16.0 2024-09-18 20:26:08,939 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.530e+01 8.500e+01 8.937e+01 9.634e+01 2.540e+02, threshold=1.787e+02, percent-clipped=1.0 2024-09-18 20:26:15,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=516440.0, ans=0.0 2024-09-18 20:26:16,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=516440.0, ans=0.1 2024-09-18 20:26:43,159 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.32 vs. limit=15.0 2024-09-18 20:26:45,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=516520.0, ans=0.95 2024-09-18 20:27:11,181 INFO [train.py:1198] (1/2) Epoch 29, batch 2450, loss[loss=0.2412, ctc_loss=0.1204, cr_loss=0.3598, attn_decoder_loss=0.2467, over 29701.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1235, cr_loss=0.3662, attn_decoder_loss=0.2444, over 5784843.04 frames. ], batch size: 82, lr: 3.82e-03, grad_scale: 8.0 2024-09-18 20:27:13,419 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.03 vs. limit=22.5 2024-09-18 20:27:20,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=516600.0, ans=0.1 2024-09-18 20:27:20,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=516600.0, ans=0.0 2024-09-18 20:27:24,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=516640.0, ans=0.2 2024-09-18 20:27:36,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=516640.0, ans=0.1 2024-09-18 20:27:38,040 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.23 vs. limit=22.5 2024-09-18 20:27:49,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=516680.0, ans=0.025 2024-09-18 20:27:49,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=516680.0, ans=0.2 2024-09-18 20:27:50,396 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.38 vs. limit=6.0 2024-09-18 20:27:57,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=516720.0, ans=0.0 2024-09-18 20:27:57,807 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. 
limit=6.0 2024-09-18 20:28:09,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=516720.0, ans=0.2 2024-09-18 20:28:26,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=516760.0, ans=0.1 2024-09-18 20:28:29,440 INFO [train.py:1198] (1/2) Epoch 29, batch 2500, loss[loss=0.2423, ctc_loss=0.1212, cr_loss=0.3692, attn_decoder_loss=0.2475, over 29643.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1234, cr_loss=0.3663, attn_decoder_loss=0.2444, over 5794783.58 frames. ], batch size: 86, lr: 3.82e-03, grad_scale: 8.0 2024-09-18 20:28:44,586 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.409e+01 8.372e+01 8.869e+01 9.573e+01 2.936e+02, threshold=1.774e+02, percent-clipped=2.0 2024-09-18 20:29:00,539 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.60 vs. limit=15.0 2024-09-18 20:29:20,179 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.05 vs. limit=15.0 2024-09-18 20:29:21,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=516920.0, ans=0.125 2024-09-18 20:29:47,333 INFO [train.py:1198] (1/2) Epoch 29, batch 2550, loss[loss=0.2115, ctc_loss=0.107, cr_loss=0.3342, attn_decoder_loss=0.2157, over 29380.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1234, cr_loss=0.3669, attn_decoder_loss=0.2443, over 5798385.67 frames. ], batch size: 67, lr: 3.81e-03, grad_scale: 8.0 2024-09-18 20:30:02,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=517040.0, ans=0.0 2024-09-18 20:30:04,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=517040.0, ans=0.1 2024-09-18 20:30:07,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=517040.0, ans=0.125 2024-09-18 20:30:11,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=517040.0, ans=0.125 2024-09-18 20:30:49,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=517160.0, ans=0.125 2024-09-18 20:31:02,915 INFO [train.py:1198] (1/2) Epoch 29, batch 2600, loss[loss=0.242, ctc_loss=0.1224, cr_loss=0.3697, attn_decoder_loss=0.2471, over 29453.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.1235, cr_loss=0.3667, attn_decoder_loss=0.2446, over 5793768.66 frames. 
], batch size: 78, lr: 3.81e-03, grad_scale: 8.0 2024-09-18 20:31:17,751 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.549e+01 8.951e+01 9.409e+01 2.372e+02, threshold=1.790e+02, percent-clipped=2.0 2024-09-18 20:31:18,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=517240.0, ans=0.0 2024-09-18 20:31:33,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=517280.0, ans=0.0 2024-09-18 20:31:36,866 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:31:47,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=517280.0, ans=0.0 2024-09-18 20:31:48,221 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.77 vs. limit=12.0 2024-09-18 20:32:07,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=517360.0, ans=0.0 2024-09-18 20:32:20,505 INFO [train.py:1198] (1/2) Epoch 29, batch 2650, loss[loss=0.2453, ctc_loss=0.1286, cr_loss=0.3735, attn_decoder_loss=0.2499, over 29210.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1234, cr_loss=0.3664, attn_decoder_loss=0.2447, over 5800481.06 frames. ], batch size: 100, lr: 3.81e-03, grad_scale: 8.0 2024-09-18 20:32:35,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=517440.0, ans=0.125 2024-09-18 20:32:37,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0 2024-09-18 20:32:55,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=517480.0, ans=0.125 2024-09-18 20:33:01,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=517480.0, ans=0.05 2024-09-18 20:33:09,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=517520.0, ans=0.2 2024-09-18 20:33:15,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=517520.0, ans=0.125 2024-09-18 20:33:15,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=517520.0, ans=0.125 2024-09-18 20:33:38,534 INFO [train.py:1198] (1/2) Epoch 29, batch 2700, loss[loss=0.235, ctc_loss=0.1085, cr_loss=0.3277, attn_decoder_loss=0.2417, over 29534.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1237, cr_loss=0.3669, attn_decoder_loss=0.2448, over 5796305.23 frames. 
], batch size: 87, lr: 3.81e-03, grad_scale: 8.0 2024-09-18 20:33:41,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=517600.0, ans=0.125 2024-09-18 20:33:47,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=517600.0, ans=0.125 2024-09-18 20:33:53,537 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.043e+01 8.653e+01 9.179e+01 9.808e+01 2.021e+02, threshold=1.836e+02, percent-clipped=2.0 2024-09-18 20:34:21,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=517680.0, ans=0.125 2024-09-18 20:34:23,404 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.87 vs. limit=22.5 2024-09-18 20:34:33,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=517720.0, ans=0.125 2024-09-18 20:34:54,588 INFO [train.py:1198] (1/2) Epoch 29, batch 2750, loss[loss=0.2164, ctc_loss=0.1056, cr_loss=0.3426, attn_decoder_loss=0.2211, over 29510.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1227, cr_loss=0.365, attn_decoder_loss=0.2435, over 5795661.01 frames. ], batch size: 75, lr: 3.81e-03, grad_scale: 8.0 2024-09-18 20:35:13,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=517840.0, ans=0.125 2024-09-18 20:35:24,678 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.75 vs. limit=22.5 2024-09-18 20:35:27,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=517880.0, ans=0.125 2024-09-18 20:35:28,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=517880.0, ans=0.1 2024-09-18 20:35:29,149 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.20 vs. limit=15.0 2024-09-18 20:35:43,948 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.93 vs. limit=15.0 2024-09-18 20:36:08,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=517960.0, ans=0.125 2024-09-18 20:36:11,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=518000.0, ans=0.125 2024-09-18 20:36:12,343 INFO [train.py:1198] (1/2) Epoch 29, batch 2800, loss[loss=0.2482, ctc_loss=0.1429, cr_loss=0.3638, attn_decoder_loss=0.2519, over 20029.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1228, cr_loss=0.3658, attn_decoder_loss=0.2435, over 5776812.59 frames. ], batch size: 210, lr: 3.81e-03, grad_scale: 16.0 2024-09-18 20:36:28,867 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 8.371e+01 8.942e+01 9.579e+01 2.215e+02, threshold=1.788e+02, percent-clipped=1.0 2024-09-18 20:36:37,130 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.46 vs. 
limit=22.5 2024-09-18 20:37:05,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=518120.0, ans=0.125 2024-09-18 20:37:05,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=518120.0, ans=0.025 2024-09-18 20:37:10,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=518120.0, ans=0.05 2024-09-18 20:37:30,067 INFO [train.py:1198] (1/2) Epoch 29, batch 2850, loss[loss=0.221, ctc_loss=0.1073, cr_loss=0.3351, attn_decoder_loss=0.2262, over 29515.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1231, cr_loss=0.3658, attn_decoder_loss=0.2438, over 5761879.30 frames. ], batch size: 77, lr: 3.81e-03, grad_scale: 8.0 2024-09-18 20:37:51,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=518240.0, ans=0.125 2024-09-18 20:37:56,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=518240.0, ans=0.125 2024-09-18 20:38:23,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=518320.0, ans=0.5 2024-09-18 20:38:32,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=518360.0, ans=0.0 2024-09-18 20:38:35,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=518360.0, ans=0.5 2024-09-18 20:38:46,510 INFO [train.py:1198] (1/2) Epoch 29, batch 2900, loss[loss=0.2386, ctc_loss=0.1213, cr_loss=0.3573, attn_decoder_loss=0.2437, over 29404.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1237, cr_loss=0.3676, attn_decoder_loss=0.245, over 5787573.80 frames. ], batch size: 79, lr: 3.81e-03, grad_scale: 8.0 2024-09-18 20:38:55,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=518400.0, ans=0.2 2024-09-18 20:39:05,308 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.605e+01 8.499e+01 8.947e+01 9.458e+01 2.522e+02, threshold=1.789e+02, percent-clipped=1.0 2024-09-18 20:39:37,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=518520.0, ans=0.125 2024-09-18 20:40:04,382 INFO [train.py:1198] (1/2) Epoch 29, batch 2950, loss[loss=0.23, ctc_loss=0.1162, cr_loss=0.3622, attn_decoder_loss=0.2346, over 29528.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1229, cr_loss=0.3658, attn_decoder_loss=0.2437, over 5783949.43 frames. ], batch size: 75, lr: 3.81e-03, grad_scale: 8.0 2024-09-18 20:40:16,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=518600.0, ans=0.0 2024-09-18 20:40:22,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=518640.0, ans=0.125 2024-09-18 20:40:29,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.49 vs. 
2024-09-18 20:40:32,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=518640.0, ans=0.125
2024-09-18 20:40:45,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=518680.0, ans=0.125
2024-09-18 20:40:47,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=518680.0, ans=0.2
2024-09-18 20:40:59,152 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.82 vs. limit=22.5
2024-09-18 20:41:01,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=518720.0, ans=0.1
2024-09-18 20:41:19,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=518800.0, ans=0.125
2024-09-18 20:41:22,997 INFO [train.py:1198] (1/2) Epoch 29, batch 3000, loss[loss=0.2419, ctc_loss=0.1248, cr_loss=0.3759, attn_decoder_loss=0.2465, over 29769.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1225, cr_loss=0.3645, attn_decoder_loss=0.2436, over 5782300.45 frames. ], batch size: 81, lr: 3.81e-03, grad_scale: 8.0
2024-09-18 20:41:22,997 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-18 20:41:34,884 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.6141, 3.6614, 3.4227, 3.1690, 3.7522, 3.1724, 3.7708, 2.8082], device='cuda:1')
2024-09-18 20:41:41,476 INFO [train.py:1230] (1/2) Epoch 29, validation: loss=0.2115, ctc_loss=0.03752, cr_loss=5.604e-15, attn_decoder_loss=0.2309, over 944034.00 frames.
2024-09-18 20:41:41,476 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB
2024-09-18 20:41:58,310 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.278e+01 8.694e+01 9.323e+01 9.820e+01 2.000e+02, threshold=1.865e+02, percent-clipped=1.0
2024-09-18 20:41:59,265 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.72 vs. limit=15.0
2024-09-18 20:42:09,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=518840.0, ans=0.125
2024-09-18 20:42:13,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=518880.0, ans=0.125
2024-09-18 20:42:24,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=518880.0, ans=0.125
2024-09-18 20:42:37,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=518920.0, ans=0.125
2024-09-18 20:42:59,583 INFO [train.py:1198] (1/2) Epoch 29, batch 3050, loss[loss=0.2288, ctc_loss=0.1174, cr_loss=0.364, attn_decoder_loss=0.2331, over 29503.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1231, cr_loss=0.3659, attn_decoder_loss=0.2443, over 5777107.52 frames. ], batch size: 76, lr: 3.81e-03, grad_scale: 8.0
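
At batch 3000 the trainer pauses to compute a validation loss (train.py:1221/1230) and then reports peak memory. Note that cr_loss is about 5.6e-15 on the validation set, i.e. effectively zero, consistent with a consistency-regularization term that is only active on augmented training inputs. The general pattern, sketched with placeholder names (compute_loss and valid_dl are not the actual icefall API):

    # Sketch of the periodic-validation pattern suggested by the log lines
    # ("Computing validation loss" followed by "Epoch N, validation: ...").
    import torch

    def run_validation(model, valid_dl, compute_loss, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_dl:
                loss, num_frames = compute_loss(model, batch, device)
                tot_loss += loss.item() * num_frames
                tot_frames += num_frames
        model.train()  # resume training mode afterwards
        return tot_loss / tot_frames  # frame-weighted average, as in the log
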
2024-09-18 20:43:28,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=519080.0, ans=0.125
2024-09-18 20:43:34,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=519080.0, ans=0.125
2024-09-18 20:43:57,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=12.0
2024-09-18 20:43:58,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=519160.0, ans=0.0
2024-09-18 20:44:02,292 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.62 vs. limit=12.0
2024-09-18 20:44:03,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=519160.0, ans=0.025
2024-09-18 20:44:09,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=519160.0, ans=0.125
2024-09-18 20:44:15,289 INFO [train.py:1198] (1/2) Epoch 29, batch 3100, loss[loss=0.2548, ctc_loss=0.1325, cr_loss=0.3755, attn_decoder_loss=0.2601, over 29220.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1231, cr_loss=0.3658, attn_decoder_loss=0.244, over 5776497.20 frames. ], batch size: 100, lr: 3.81e-03, grad_scale: 8.0
2024-09-18 20:44:26,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=519200.0, ans=0.125
2024-09-18 20:44:31,780 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.551e+01 8.578e+01 9.222e+01 9.783e+01 2.939e+02, threshold=1.844e+02, percent-clipped=1.0
2024-09-18 20:45:02,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=519320.0, ans=0.125
2024-09-18 20:45:29,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=519360.0, ans=0.125
2024-09-18 20:45:33,295 INFO [train.py:1198] (1/2) Epoch 29, batch 3150, loss[loss=0.2471, ctc_loss=0.1285, cr_loss=0.3759, attn_decoder_loss=0.252, over 28852.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1227, cr_loss=0.3653, attn_decoder_loss=0.2437, over 5781857.37 frames. ], batch size: 104, lr: 3.81e-03, grad_scale: 8.0
2024-09-18 20:45:44,697 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.98 vs.
limit=10.0 2024-09-18 20:45:56,381 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:46:00,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=519440.0, ans=0.1 2024-09-18 20:46:24,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=519520.0, ans=0.2 2024-09-18 20:46:32,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=519560.0, ans=0.1 2024-09-18 20:46:44,118 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.57 vs. limit=12.0 2024-09-18 20:46:50,882 INFO [train.py:1198] (1/2) Epoch 29, batch 3200, loss[loss=0.2367, ctc_loss=0.1191, cr_loss=0.3546, attn_decoder_loss=0.2418, over 29399.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1225, cr_loss=0.3649, attn_decoder_loss=0.2432, over 5791811.39 frames. ], batch size: 79, lr: 3.80e-03, grad_scale: 16.0 2024-09-18 20:47:07,588 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.418e+01 8.919e+01 9.479e+01 2.582e+02, threshold=1.784e+02, percent-clipped=1.0 2024-09-18 20:47:25,584 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.09 vs. limit=15.0 2024-09-18 20:48:07,105 INFO [train.py:1198] (1/2) Epoch 29, batch 3250, loss[loss=0.2459, ctc_loss=0.126, cr_loss=0.3699, attn_decoder_loss=0.251, over 29702.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1224, cr_loss=0.3651, attn_decoder_loss=0.2436, over 5798110.04 frames. ], batch size: 84, lr: 3.80e-03, grad_scale: 16.0 2024-09-18 20:48:57,983 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.49 vs. limit=15.0 2024-09-18 20:49:25,273 INFO [train.py:1198] (1/2) Epoch 29, batch 3300, loss[loss=0.2513, ctc_loss=0.1279, cr_loss=0.3716, attn_decoder_loss=0.2568, over 28331.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1218, cr_loss=0.3639, attn_decoder_loss=0.2427, over 5796286.30 frames. 
], batch size: 111, lr: 3.80e-03, grad_scale: 16.0 2024-09-18 20:49:34,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=520000.0, ans=0.125 2024-09-18 20:49:42,287 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 8.577e+01 8.993e+01 9.559e+01 2.414e+02, threshold=1.799e+02, percent-clipped=3.0 2024-09-18 20:49:59,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=520080.0, ans=0.0 2024-09-18 20:50:02,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=520080.0, ans=0.125 2024-09-18 20:50:03,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=520080.0, ans=0.025 2024-09-18 20:50:15,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=520120.0, ans=0.0 2024-09-18 20:50:21,749 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:50:34,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=520160.0, ans=0.025 2024-09-18 20:50:43,269 INFO [train.py:1198] (1/2) Epoch 29, batch 3350, loss[loss=0.2495, ctc_loss=0.127, cr_loss=0.3745, attn_decoder_loss=0.2548, over 28872.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1227, cr_loss=0.3651, attn_decoder_loss=0.2434, over 5773273.16 frames. ], batch size: 104, lr: 3.80e-03, grad_scale: 16.0 2024-09-18 20:50:51,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=520200.0, ans=0.125 2024-09-18 20:51:01,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=520240.0, ans=0.0 2024-09-18 20:51:05,404 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.94 vs. limit=15.0 2024-09-18 20:51:26,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=520280.0, ans=0.125 2024-09-18 20:51:27,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=520320.0, ans=0.0 2024-09-18 20:51:35,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=520320.0, ans=0.125 2024-09-18 20:51:38,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=520320.0, ans=0.125 2024-09-18 20:51:44,624 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.74 vs. limit=6.0 2024-09-18 20:51:45,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=520360.0, ans=0.125 2024-09-18 20:51:50,808 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. 
limit=6.0 2024-09-18 20:51:51,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=520360.0, ans=0.0 2024-09-18 20:51:54,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=520360.0, ans=0.125 2024-09-18 20:51:59,172 INFO [train.py:1198] (1/2) Epoch 29, batch 3400, loss[loss=0.2139, ctc_loss=0.1053, cr_loss=0.3304, attn_decoder_loss=0.2186, over 29342.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1231, cr_loss=0.366, attn_decoder_loss=0.2438, over 5766801.83 frames. ], batch size: 67, lr: 3.80e-03, grad_scale: 16.0 2024-09-18 20:52:00,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=520400.0, ans=0.125 2024-09-18 20:52:08,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=520400.0, ans=0.1 2024-09-18 20:52:15,975 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.635e+01 9.188e+01 1.005e+02 1.629e+02, threshold=1.838e+02, percent-clipped=0.0 2024-09-18 20:52:25,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=520440.0, ans=0.0 2024-09-18 20:52:54,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=520520.0, ans=0.125 2024-09-18 20:52:57,432 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.39 vs. limit=15.0 2024-09-18 20:53:08,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=520560.0, ans=0.125 2024-09-18 20:53:16,960 INFO [train.py:1198] (1/2) Epoch 29, batch 3450, loss[loss=0.2514, ctc_loss=0.1266, cr_loss=0.3824, attn_decoder_loss=0.2568, over 28192.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1231, cr_loss=0.3669, attn_decoder_loss=0.2441, over 5774304.51 frames. ], batch size: 111, lr: 3.80e-03, grad_scale: 8.0 2024-09-18 20:53:34,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=520640.0, ans=0.1 2024-09-18 20:53:50,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=520680.0, ans=0.125 2024-09-18 20:53:58,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=520680.0, ans=0.0 2024-09-18 20:54:01,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=520720.0, ans=0.125 2024-09-18 20:54:05,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=520720.0, ans=0.0 2024-09-18 20:54:10,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=520720.0, ans=0.125 2024-09-18 20:54:35,153 INFO [train.py:1198] (1/2) Epoch 29, batch 3500, loss[loss=0.2149, ctc_loss=0.1095, cr_loss=0.3385, attn_decoder_loss=0.2191, over 29334.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1227, cr_loss=0.3656, attn_decoder_loss=0.2436, over 5777093.72 frames. 
], batch size: 71, lr: 3.80e-03, grad_scale: 8.0 2024-09-18 20:54:35,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=520800.0, ans=0.025 2024-09-18 20:54:35,808 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2024-09-18 20:54:39,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=520800.0, ans=0.0 2024-09-18 20:54:53,387 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.549e+01 8.236e+01 8.769e+01 9.566e+01 1.320e+02, threshold=1.754e+02, percent-clipped=0.0 2024-09-18 20:54:54,196 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.51 vs. limit=15.0 2024-09-18 20:54:58,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=520840.0, ans=0.025 2024-09-18 20:55:01,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=520840.0, ans=0.1 2024-09-18 20:55:07,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=520880.0, ans=0.125 2024-09-18 20:55:24,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=520920.0, ans=0.125 2024-09-18 20:55:26,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=520920.0, ans=0.125 2024-09-18 20:55:31,849 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.20 vs. limit=15.0 2024-09-18 20:55:34,035 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:55:50,176 INFO [train.py:1198] (1/2) Epoch 29, batch 3550, loss[loss=0.2398, ctc_loss=0.1186, cr_loss=0.3426, attn_decoder_loss=0.2457, over 29690.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1227, cr_loss=0.3654, attn_decoder_loss=0.2436, over 5784555.59 frames. ], batch size: 89, lr: 3.80e-03, grad_scale: 8.0 2024-09-18 20:55:53,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=521000.0, ans=0.125 2024-09-18 20:55:54,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=521000.0, ans=0.1 2024-09-18 20:55:58,431 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.55 vs. 
limit=22.5 2024-09-18 20:56:27,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=521080.0, ans=0.125 2024-09-18 20:56:32,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=521080.0, ans=0.125 2024-09-18 20:56:35,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=521120.0, ans=0.1 2024-09-18 20:56:45,726 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=15.0 2024-09-18 20:56:54,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=521160.0, ans=0.125 2024-09-18 20:57:00,793 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=15.0 2024-09-18 20:57:04,190 INFO [train.py:1198] (1/2) Epoch 29, batch 3600, loss[loss=0.2274, ctc_loss=0.1099, cr_loss=0.3308, attn_decoder_loss=0.2331, over 29498.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1226, cr_loss=0.3653, attn_decoder_loss=0.2436, over 5792569.74 frames. ], batch size: 77, lr: 3.80e-03, grad_scale: 16.0 2024-09-18 20:57:06,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=521200.0, ans=0.025 2024-09-18 20:57:08,310 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.49 vs. limit=10.0 2024-09-18 20:57:12,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=521200.0, ans=0.0 2024-09-18 20:57:19,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=521240.0, ans=0.0 2024-09-18 20:57:22,269 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.410e+01 8.552e+01 8.984e+01 9.474e+01 4.897e+02, threshold=1.797e+02, percent-clipped=1.0 2024-09-18 20:57:35,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=521280.0, ans=0.125 2024-09-18 20:57:36,501 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. 
limit=6.0 2024-09-18 20:57:40,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=521280.0, ans=0.125 2024-09-18 20:57:47,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=521320.0, ans=0.0 2024-09-18 20:57:47,830 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:57:48,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=521320.0, ans=0.125 2024-09-18 20:58:05,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=521360.0, ans=0.0 2024-09-18 20:58:07,574 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.69 vs. limit=15.0 2024-09-18 20:58:07,809 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.49 vs. limit=15.0 2024-09-18 20:58:18,781 INFO [train.py:1198] (1/2) Epoch 29, batch 3650, loss[loss=0.2501, ctc_loss=0.1369, cr_loss=0.4036, attn_decoder_loss=0.2537, over 29498.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1222, cr_loss=0.3643, attn_decoder_loss=0.2432, over 5793761.72 frames. ], batch size: 90, lr: 3.80e-03, grad_scale: 16.0 2024-09-18 20:58:23,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=521400.0, ans=0.125 2024-09-18 20:58:25,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=521400.0, ans=0.0 2024-09-18 20:58:25,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=521400.0, ans=0.0 2024-09-18 20:58:25,712 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.67 vs. limit=15.0 2024-09-18 20:58:51,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=521480.0, ans=0.125 2024-09-18 20:59:03,561 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.55 vs. limit=22.5 2024-09-18 20:59:05,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=521520.0, ans=0.0 2024-09-18 20:59:10,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=521520.0, ans=0.125 2024-09-18 20:59:36,072 INFO [train.py:1198] (1/2) Epoch 29, batch 3700, loss[loss=0.2372, ctc_loss=0.1128, cr_loss=0.351, attn_decoder_loss=0.2432, over 29692.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.122, cr_loss=0.3644, attn_decoder_loss=0.2433, over 5803990.18 frames. ], batch size: 84, lr: 3.80e-03, grad_scale: 8.0 2024-09-18 20:59:48,955 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.57 vs. 
limit=15.0 2024-09-18 20:59:55,283 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.566e+01 8.397e+01 8.875e+01 9.405e+01 1.712e+02, threshold=1.775e+02, percent-clipped=0.0 2024-09-18 21:00:22,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=521720.0, ans=0.0 2024-09-18 21:00:29,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=521720.0, ans=0.125 2024-09-18 21:00:37,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=521760.0, ans=0.1 2024-09-18 21:00:38,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=521760.0, ans=0.125 2024-09-18 21:00:43,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=521760.0, ans=0.2 2024-09-18 21:00:46,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=521760.0, ans=0.1 2024-09-18 21:00:49,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=521760.0, ans=0.125 2024-09-18 21:00:50,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=521800.0, ans=0.0 2024-09-18 21:00:51,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.10 vs. limit=22.5 2024-09-18 21:00:52,041 INFO [train.py:1198] (1/2) Epoch 29, batch 3750, loss[loss=0.2119, ctc_loss=0.1011, cr_loss=0.3215, attn_decoder_loss=0.217, over 29318.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1222, cr_loss=0.3649, attn_decoder_loss=0.2433, over 5807842.76 frames. ], batch size: 67, lr: 3.80e-03, grad_scale: 8.0 2024-09-18 21:01:08,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=521840.0, ans=0.0 2024-09-18 21:01:37,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=521920.0, ans=0.125 2024-09-18 21:01:49,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=521920.0, ans=0.125 2024-09-18 21:01:58,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=521960.0, ans=0.125 2024-09-18 21:02:05,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=522000.0, ans=0.125 2024-09-18 21:02:06,642 INFO [train.py:1198] (1/2) Epoch 29, batch 3800, loss[loss=0.2499, ctc_loss=0.1242, cr_loss=0.3657, attn_decoder_loss=0.2557, over 29651.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1219, cr_loss=0.3641, attn_decoder_loss=0.2429, over 5799208.46 frames. 
], batch size: 86, lr: 3.80e-03, grad_scale: 8.0 2024-09-18 21:02:14,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=522000.0, ans=0.025 2024-09-18 21:02:18,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=522000.0, ans=0.125 2024-09-18 21:02:25,918 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 8.472e+01 8.859e+01 9.703e+01 1.383e+02, threshold=1.772e+02, percent-clipped=0.0 2024-09-18 21:02:33,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=522040.0, ans=0.1 2024-09-18 21:02:40,232 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.93 vs. limit=12.0 2024-09-18 21:02:45,754 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:02:50,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=522120.0, ans=0.2 2024-09-18 21:02:54,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=522120.0, ans=0.1 2024-09-18 21:03:01,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=522120.0, ans=0.2 2024-09-18 21:03:20,793 INFO [train.py:1198] (1/2) Epoch 29, batch 3850, loss[loss=0.2517, ctc_loss=0.1322, cr_loss=0.3854, attn_decoder_loss=0.2565, over 29249.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1219, cr_loss=0.3642, attn_decoder_loss=0.2428, over 5812424.21 frames. ], batch size: 100, lr: 3.80e-03, grad_scale: 8.0 2024-09-18 21:03:52,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=522280.0, ans=0.0 2024-09-18 21:03:53,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=522280.0, ans=0.0 2024-09-18 21:03:56,218 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.69 vs. limit=15.0 2024-09-18 21:04:19,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=522360.0, ans=0.1 2024-09-18 21:04:37,290 INFO [train.py:1198] (1/2) Epoch 29, batch 3900, loss[loss=0.2449, ctc_loss=0.1179, cr_loss=0.3476, attn_decoder_loss=0.2513, over 29625.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1223, cr_loss=0.3645, attn_decoder_loss=0.2434, over 5816619.43 frames. 
], batch size: 86, lr: 3.79e-03, grad_scale: 8.0 2024-09-18 21:04:42,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=522400.0, ans=0.2 2024-09-18 21:04:50,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=522440.0, ans=0.07 2024-09-18 21:04:56,262 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 8.663e+01 9.050e+01 9.576e+01 3.697e+02, threshold=1.810e+02, percent-clipped=2.0 2024-09-18 21:04:56,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=522440.0, ans=0.125 2024-09-18 21:04:58,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=522440.0, ans=0.0 2024-09-18 21:05:04,733 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.21 vs. limit=15.0 2024-09-18 21:05:06,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=522480.0, ans=0.0 2024-09-18 21:05:18,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=522480.0, ans=0.2 2024-09-18 21:05:42,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=522560.0, ans=0.2 2024-09-18 21:05:52,782 INFO [train.py:1198] (1/2) Epoch 29, batch 3950, loss[loss=0.2587, ctc_loss=0.1389, cr_loss=0.398, attn_decoder_loss=0.2631, over 29491.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1227, cr_loss=0.3658, attn_decoder_loss=0.2438, over 5836097.34 frames. ], batch size: 97, lr: 3.79e-03, grad_scale: 8.0 2024-09-18 21:06:02,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=522600.0, ans=0.125 2024-09-18 21:06:04,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=522600.0, ans=0.125 2024-09-18 21:06:07,427 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.31 vs. limit=8.0 2024-09-18 21:06:07,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=522640.0, ans=0.0 2024-09-18 21:06:15,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=522640.0, ans=0.125 2024-09-18 21:06:19,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=522640.0, ans=0.125 2024-09-18 21:06:30,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.54 vs. limit=22.5 2024-09-18 21:06:33,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. 
limit=6.0
2024-09-18 21:06:37,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=522720.0, ans=0.1
2024-09-18 21:06:41,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=522720.0, ans=0.5
2024-09-18 21:07:00,307 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.40 vs. limit=22.5
2024-09-18 21:07:06,729 INFO [train.py:1198] (1/2) Epoch 29, batch 4000, loss[loss=0.2171, ctc_loss=0.0954, cr_loss=0.3056, attn_decoder_loss=0.2238, over 29513.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1229, cr_loss=0.3658, attn_decoder_loss=0.244, over 5813547.02 frames. ], batch size: 74, lr: 3.79e-03, grad_scale: 16.0
2024-09-18 21:07:11,877 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.50 vs. limit=15.0
2024-09-18 21:07:14,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=522800.0, ans=0.125
2024-09-18 21:07:25,858 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.254e+01 8.544e+01 9.038e+01 9.843e+01 4.905e+02, threshold=1.808e+02, percent-clipped=2.0
2024-09-18 21:07:34,257 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=5.90 vs. limit=15.0
2024-09-18 21:07:59,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=522920.0, ans=0.0
2024-09-18 21:08:05,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=522960.0, ans=0.125
2024-09-18 21:08:10,069 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0
2024-09-18 21:08:16,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=522960.0, ans=0.0
2024-09-18 21:08:21,176 INFO [train.py:1198] (1/2) Epoch 29, batch 4050, loss[loss=0.2641, ctc_loss=0.1535, cr_loss=0.3875, attn_decoder_loss=0.2677, over 20721.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1229, cr_loss=0.3657, attn_decoder_loss=0.2438, over 5796716.83 frames. ], batch size: 209, lr: 3.79e-03, grad_scale: 16.0
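
The grad_scale field in the batch lines is the dynamic loss scale used for mixed-precision training; it doubles (here 8.0 to 16.0) whenever enough consecutive steps complete without overflow, and shrinks on inf/nan gradients. This is the standard torch.cuda.amp pattern; the model, optimizer, and shapes below are placeholders, not the training script's actual setup:

    # Standard PyTorch AMP loop illustrating where grad_scale comes from;
    # the model and data here are placeholders.
    import torch

    model = torch.nn.Linear(80, 500).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=3.79e-3)
    scaler = torch.cuda.amp.GradScaler()  # grows/shrinks the scale automatically

    for step in range(100):
        x = torch.randn(8, 80, device="cuda")
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        scaler.scale(loss).backward()   # scaled backward pass
        scaler.step(optimizer)          # unscales, skips the step on inf/nan
        scaler.update()                 # doubles the scale after enough good steps
        # scaler.get_scale() shows jumps such as 8.0 -> 16.0, as in the log
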
2024-09-18 21:08:21,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=523000.0, ans=0.1
2024-09-18 21:08:21,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=523000.0, ans=0.125
2024-09-18 21:08:43,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=523040.0, ans=0.1
2024-09-18 21:08:48,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=523080.0, ans=0.125
2024-09-18 21:08:50,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=523080.0, ans=0.125
2024-09-18 21:08:50,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=523080.0, ans=0.07
2024-09-18 21:09:05,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=523120.0, ans=0.125
2024-09-18 21:09:18,481 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-18 21:09:27,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=523160.0, ans=0.95
2024-09-18 21:09:30,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=523160.0, ans=0.125
2024-09-18 21:09:36,357 INFO [train.py:1198] (1/2) Epoch 29, batch 4100, loss[loss=0.2576, ctc_loss=0.1415, cr_loss=0.4181, attn_decoder_loss=0.2613, over 29520.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1233, cr_loss=0.3661, attn_decoder_loss=0.2441, over 5791037.28 frames. ], batch size: 90, lr: 3.79e-03, grad_scale: 16.0
2024-09-18 21:09:45,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=523200.0, ans=0.125
2024-09-18 21:09:47,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=523200.0, ans=0.0
2024-09-18 21:09:56,914 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.631e+01 9.111e+01 9.616e+01 2.001e+02, threshold=1.822e+02, percent-clipped=1.0
2024-09-18 21:10:00,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=523240.0, ans=0.0
2024-09-18 21:10:01,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=523240.0, ans=0.1
2024-09-18 21:10:21,115 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. limit=6.0
2024-09-18 21:10:21,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=523320.0, ans=0.025
2024-09-18 21:10:31,371 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=15.0
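
The Whitening lines compare a per-module statistic against a limit (metric=... vs. limit=...), flagging activations whose covariance is far from isotropic. One plausible form of such a metric is the ratio between the mean squared eigenvalue and the squared mean eigenvalue of the feature covariance, which equals 1.0 for a perfectly white signal; the sketch below uses that definition, which may differ in detail from the statistic in scaling.py:

    # Hedged sketch of a whitening-style diagnostic: how far a covariance is
    # from a multiple of the identity. One plausible metric, not necessarily
    # the exact scaling.py formula.
    import torch

    def whitening_metric(x, num_groups=1):
        # x: (num_frames, num_channels); channels split into groups as logged
        n, c = x.shape
        metrics = []
        for g in x.chunk(num_groups, dim=1):
            g = g - g.mean(dim=0)
            cov = (g.T @ g) / n
            eigs = torch.linalg.eigvalsh(cov)
            # mean squared eigenvalue over squared mean eigenvalue:
            # equals 1.0 for a perfectly "white" (isotropic) covariance
            metrics.append((eigs.pow(2).mean() / eigs.mean().pow(2)).item())
        return max(metrics)

    x = torch.randn(1000, 256)
    m = whitening_metric(x)  # near 1.0; the log flags values vs. a limit
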
2024-09-18 21:10:43,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=523360.0, ans=0.125
2024-09-18 21:10:51,014 INFO [train.py:1198] (1/2) Epoch 29, batch 4150, loss[loss=0.2332, ctc_loss=0.1188, cr_loss=0.36, attn_decoder_loss=0.238, over 29504.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1229, cr_loss=0.3656, attn_decoder_loss=0.2435, over 5797260.69 frames. ], batch size: 77, lr: 3.79e-03, grad_scale: 8.0
2024-09-18 21:11:06,884 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.35 vs. limit=15.0
2024-09-18 21:11:19,841 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.21 vs. limit=15.0
2024-09-18 21:11:31,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=523480.0, ans=0.1
2024-09-18 21:11:33,163 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.46 vs. limit=12.0
2024-09-18 21:11:44,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.88 vs. limit=15.0
2024-09-18 21:11:48,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=523560.0, ans=0.07
2024-09-18 21:11:50,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=523560.0, ans=0.0
2024-09-18 21:11:59,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=523560.0, ans=0.2
2024-09-18 21:12:04,658 INFO [train.py:1198] (1/2) Epoch 29, batch 4200, loss[loss=0.2502, ctc_loss=0.1336, cr_loss=0.4008, attn_decoder_loss=0.2543, over 29490.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1232, cr_loss=0.3668, attn_decoder_loss=0.244, over 5799243.91 frames. ], batch size: 90, lr: 3.79e-03, grad_scale: 8.0
2024-09-18 21:12:04,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=523600.0, ans=0.125
2024-09-18 21:12:09,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=523600.0, ans=0.09899494936611666
2024-09-18 21:12:25,444 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.320e+01 8.488e+01 8.959e+01 9.406e+01 1.586e+02, threshold=1.792e+02, percent-clipped=0.0
2024-09-18 21:12:40,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=523680.0, ans=0.125
2024-09-18 21:13:03,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=523760.0, ans=0.1
2024-09-18 21:13:19,316 INFO [train.py:1198] (1/2) Epoch 29, batch 4250, loss[loss=0.2184, ctc_loss=0.1024, cr_loss=0.3201, attn_decoder_loss=0.2242, over 29526.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1228, cr_loss=0.3657, attn_decoder_loss=0.244, over 5805434.25 frames.
], batch size: 74, lr: 3.79e-03, grad_scale: 8.0 2024-09-18 21:13:25,933 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.55 vs. limit=10.0 2024-09-18 21:13:34,489 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.48 vs. limit=15.0 2024-09-18 21:13:38,952 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.37 vs. limit=22.5 2024-09-18 21:13:54,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=523880.0, ans=0.125 2024-09-18 21:14:06,148 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:14:12,514 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.23 vs. limit=15.0 2024-09-18 21:14:23,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=523960.0, ans=0.0 2024-09-18 21:14:33,932 INFO [train.py:1198] (1/2) Epoch 29, batch 4300, loss[loss=0.237, ctc_loss=0.1177, cr_loss=0.3507, attn_decoder_loss=0.2424, over 29518.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1224, cr_loss=0.365, attn_decoder_loss=0.244, over 5794558.48 frames. ], batch size: 87, lr: 3.79e-03, grad_scale: 8.0 2024-09-18 21:14:49,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=524040.0, ans=0.125 2024-09-18 21:14:54,741 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.701e+01 8.566e+01 9.093e+01 9.563e+01 1.622e+02, threshold=1.819e+02, percent-clipped=0.0 2024-09-18 21:15:00,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=524040.0, ans=0.1 2024-09-18 21:15:09,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=524080.0, ans=0.125 2024-09-18 21:15:11,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=524080.0, ans=0.0 2024-09-18 21:15:20,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=524120.0, ans=0.2 2024-09-18 21:15:23,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=524120.0, ans=0.125 2024-09-18 21:15:26,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=524120.0, ans=0.0 2024-09-18 21:15:48,232 INFO [train.py:1198] (1/2) Epoch 29, batch 4350, loss[loss=0.254, ctc_loss=0.1308, cr_loss=0.3824, attn_decoder_loss=0.2591, over 29446.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1253, cr_loss=0.3706, attn_decoder_loss=0.2472, over 5796462.84 frames. 
], batch size: 97, lr: 3.79e-03, grad_scale: 8.0 2024-09-18 21:15:53,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=524200.0, ans=0.125 2024-09-18 21:16:13,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=524240.0, ans=0.125 2024-09-18 21:16:32,332 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0 2024-09-18 21:16:46,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=524360.0, ans=0.0 2024-09-18 21:16:52,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=524360.0, ans=0.2 2024-09-18 21:17:02,836 INFO [train.py:1198] (1/2) Epoch 29, batch 4400, loss[loss=0.2588, ctc_loss=0.1443, cr_loss=0.3954, attn_decoder_loss=0.2627, over 27285.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.127, cr_loss=0.3742, attn_decoder_loss=0.2496, over 5767331.29 frames. ], batch size: 124, lr: 3.79e-03, grad_scale: 16.0 2024-09-18 21:17:06,685 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.74 vs. limit=22.5 2024-09-18 21:17:14,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=524400.0, ans=0.0 2024-09-18 21:17:23,260 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.178e+01 8.977e+01 9.367e+01 9.862e+01 3.705e+02, threshold=1.873e+02, percent-clipped=1.0 2024-09-18 21:17:31,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=524480.0, ans=0.125 2024-09-18 21:17:59,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=524520.0, ans=0.2 2024-09-18 21:17:59,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=524520.0, ans=0.1 2024-09-18 21:18:12,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=524560.0, ans=0.1 2024-09-18 21:18:16,792 INFO [train.py:1198] (1/2) Epoch 29, batch 4450, loss[loss=0.2625, ctc_loss=0.1587, cr_loss=0.4126, attn_decoder_loss=0.2649, over 20399.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1307, cr_loss=0.3786, attn_decoder_loss=0.2519, over 5574456.43 frames. 
], batch size: 209, lr: 3.79e-03, grad_scale: 16.0 2024-09-18 21:18:18,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=524600.0, ans=0.025 2024-09-18 21:18:20,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=524600.0, ans=0.125 2024-09-18 21:18:30,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=524600.0, ans=0.125 2024-09-18 21:18:31,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=524640.0, ans=0.1 2024-09-18 21:18:47,511 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.32 vs. limit=22.5 2024-09-18 21:19:10,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=524720.0, ans=0.07 2024-09-18 21:19:12,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=524720.0, ans=0.125 2024-09-18 21:19:18,866 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.28 vs. limit=15.0 2024-09-18 21:19:28,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=524760.0, ans=0.125 2024-09-18 21:19:33,077 INFO [train.py:1198] (1/2) Epoch 29, batch 4500, loss[loss=0.2547, ctc_loss=0.1384, cr_loss=0.3638, attn_decoder_loss=0.2595, over 20395.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1342, cr_loss=0.3812, attn_decoder_loss=0.2539, over 5235346.35 frames. ], batch size: 209, lr: 3.79e-03, grad_scale: 8.0 2024-09-18 21:19:55,853 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.478e+01 1.036e+02 1.116e+02 1.208e+02 3.141e+02, threshold=2.233e+02, percent-clipped=1.0 2024-09-18 21:21:05,422 INFO [train.py:1198] (1/2) Epoch 30, batch 0, loss[loss=0.2251, ctc_loss=0.1143, cr_loss=0.352, attn_decoder_loss=0.2296, over 29627.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1143, cr_loss=0.352, attn_decoder_loss=0.2296, over 29627.00 frames. ], batch size: 73, lr: 3.72e-03, grad_scale: 16.0 2024-09-18 21:21:05,422 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 21:21:23,762 INFO [train.py:1230] (1/2) Epoch 30, validation: loss=0.2119, ctc_loss=0.03754, cr_loss=5.775e-15, attn_decoder_loss=0.2313, over 944034.00 frames. 2024-09-18 21:21:23,762 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-18 21:21:27,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=524900.0, ans=0.2 2024-09-18 21:21:53,235 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.49 vs. 
limit=15.0 2024-09-18 21:21:55,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=524980.0, ans=0.05 2024-09-18 21:22:10,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=525020.0, ans=0.0 2024-09-18 21:22:12,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=525020.0, ans=0.125 2024-09-18 21:22:17,039 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=8.90 vs. limit=15.0 2024-09-18 21:22:37,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=525060.0, ans=0.125 2024-09-18 21:22:40,141 INFO [train.py:1198] (1/2) Epoch 30, batch 50, loss[loss=0.2195, ctc_loss=0.1123, cr_loss=0.3279, attn_decoder_loss=0.2241, over 29436.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1246, cr_loss=0.3698, attn_decoder_loss=0.2453, over 1267680.18 frames. ], batch size: 70, lr: 3.72e-03, grad_scale: 16.0 2024-09-18 21:22:42,432 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.44 vs. limit=15.0 2024-09-18 21:22:42,778 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.94 vs. limit=15.0 2024-09-18 21:22:49,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=525100.0, ans=0.125 2024-09-18 21:22:57,933 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.60 vs. limit=22.5 2024-09-18 21:23:36,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=525220.0, ans=0.0 2024-09-18 21:23:39,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=525260.0, ans=0.125 2024-09-18 21:23:42,503 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.716e+01 8.848e+01 9.545e+01 1.010e+02 1.497e+02, threshold=1.909e+02, percent-clipped=0.0 2024-09-18 21:23:54,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=525300.0, ans=0.0 2024-09-18 21:23:56,174 INFO [train.py:1198] (1/2) Epoch 30, batch 100, loss[loss=0.2284, ctc_loss=0.1143, cr_loss=0.348, attn_decoder_loss=0.2334, over 29516.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.125, cr_loss=0.3688, attn_decoder_loss=0.2463, over 2251336.72 frames. 
], batch size: 76, lr: 3.72e-03, grad_scale: 8.0 2024-09-18 21:24:09,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=525340.0, ans=0.125 2024-09-18 21:24:14,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=525340.0, ans=0.07 2024-09-18 21:24:21,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=525340.0, ans=0.125 2024-09-18 21:24:29,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=525380.0, ans=0.0 2024-09-18 21:24:42,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=525420.0, ans=0.125 2024-09-18 21:24:42,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=525420.0, ans=0.09899494936611666 2024-09-18 21:24:44,483 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.61 vs. limit=15.0 2024-09-18 21:24:49,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=525420.0, ans=0.125 2024-09-18 21:24:49,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=525420.0, ans=0.0 2024-09-18 21:24:51,544 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.89 vs. limit=22.5 2024-09-18 21:25:13,043 INFO [train.py:1198] (1/2) Epoch 30, batch 150, loss[loss=0.2232, ctc_loss=0.1089, cr_loss=0.3545, attn_decoder_loss=0.228, over 29424.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.122, cr_loss=0.3635, attn_decoder_loss=0.2437, over 3046267.96 frames. ], batch size: 70, lr: 3.72e-03, grad_scale: 8.0 2024-09-18 21:25:22,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=525500.0, ans=0.0 2024-09-18 21:26:02,905 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2024-09-18 21:26:06,476 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.62 vs. limit=15.0 2024-09-18 21:26:10,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.max_positive, batch_count=525620.0, ans=0.95 2024-09-18 21:26:17,348 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.425e+01 8.976e+01 9.725e+01 1.408e+02, threshold=1.795e+02, percent-clipped=0.0 2024-09-18 21:26:23,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=525660.0, ans=0.1 2024-09-18 21:26:28,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=525660.0, ans=0.95 2024-09-18 21:26:28,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.67 vs. 
limit=15.0 2024-09-18 21:26:30,927 INFO [train.py:1198] (1/2) Epoch 30, batch 200, loss[loss=0.2491, ctc_loss=0.1255, cr_loss=0.3661, attn_decoder_loss=0.2547, over 27491.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1213, cr_loss=0.3628, attn_decoder_loss=0.243, over 3657753.43 frames. ], batch size: 125, lr: 3.72e-03, grad_scale: 8.0 2024-09-18 21:26:31,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=525700.0, ans=0.025 2024-09-18 21:26:58,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=525740.0, ans=0.0 2024-09-18 21:27:01,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=525780.0, ans=0.2 2024-09-18 21:27:07,920 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2024-09-18 21:27:16,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=525820.0, ans=0.125 2024-09-18 21:27:19,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=525820.0, ans=0.2 2024-09-18 21:27:41,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=525860.0, ans=0.125 2024-09-18 21:27:46,550 INFO [train.py:1198] (1/2) Epoch 30, batch 250, loss[loss=0.2518, ctc_loss=0.1305, cr_loss=0.3897, attn_decoder_loss=0.2566, over 29233.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1212, cr_loss=0.363, attn_decoder_loss=0.2428, over 4138179.76 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 8.0 2024-09-18 21:27:54,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=525900.0, ans=0.1 2024-09-18 21:28:02,723 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.36 vs. limit=15.0 2024-09-18 21:28:16,050 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.45 vs. limit=10.0 2024-09-18 21:28:17,526 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.74 vs. limit=15.0 2024-09-18 21:28:24,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=525980.0, ans=0.0 2024-09-18 21:28:29,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=525980.0, ans=0.0 2024-09-18 21:28:50,799 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.169e+01 8.439e+01 8.914e+01 9.350e+01 1.362e+02, threshold=1.783e+02, percent-clipped=0.0 2024-09-18 21:28:57,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=526060.0, ans=0.125 2024-09-18 21:29:04,750 INFO [train.py:1198] (1/2) Epoch 30, batch 300, loss[loss=0.2538, ctc_loss=0.1315, cr_loss=0.3778, attn_decoder_loss=0.259, over 29553.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1211, cr_loss=0.3628, attn_decoder_loss=0.2425, over 4507189.15 frames. 
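
The tot_loss figures are running averages weighted by frame count, which is why the "over N frames" totals grow monotonically within the epoch (4507189.15 frames by batch 300 above) and why tot_loss moves more smoothly than the single-batch loss. A sketch of that bookkeeping, with illustrative names:

    class RunningLoss:
        """Frame-weighted running average, one plausible way to produce
        the tot_loss ... over N frames figures in the log."""

        def __init__(self) -> None:
            self.weighted_sum = 0.0   # sum of loss * num_frames
            self.num_frames = 0.0     # total frames accumulated so far

        def update(self, loss: float, num_frames: float) -> None:
            self.weighted_sum += loss * num_frames
            self.num_frames += num_frames

        @property
        def value(self) -> float:
            return self.weighted_sum / max(self.num_frames, 1.0)
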
], batch size: 92, lr: 3.72e-03, grad_scale: 8.0 2024-09-18 21:29:24,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=526140.0, ans=0.125 2024-09-18 21:29:39,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=526180.0, ans=0.0 2024-09-18 21:29:54,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=526220.0, ans=0.1 2024-09-18 21:30:00,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=526220.0, ans=0.125 2024-09-18 21:30:15,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=526260.0, ans=0.125 2024-09-18 21:30:22,667 INFO [train.py:1198] (1/2) Epoch 30, batch 350, loss[loss=0.2175, ctc_loss=0.1158, cr_loss=0.3606, attn_decoder_loss=0.2208, over 29349.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1217, cr_loss=0.3643, attn_decoder_loss=0.2432, over 4794330.41 frames. ], batch size: 71, lr: 3.72e-03, grad_scale: 8.0 2024-09-18 21:30:58,263 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.51 vs. limit=15.0 2024-09-18 21:30:59,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=526380.0, ans=0.125 2024-09-18 21:31:07,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=526420.0, ans=0.125 2024-09-18 21:31:07,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=526420.0, ans=0.125 2024-09-18 21:31:19,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=526420.0, ans=0.125 2024-09-18 21:31:24,895 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 8.639e+01 9.253e+01 9.920e+01 3.039e+02, threshold=1.851e+02, percent-clipped=1.0 2024-09-18 21:31:38,412 INFO [train.py:1198] (1/2) Epoch 30, batch 400, loss[loss=0.2489, ctc_loss=0.127, cr_loss=0.3836, attn_decoder_loss=0.2539, over 29702.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1215, cr_loss=0.3632, attn_decoder_loss=0.243, over 5024184.53 frames. ], batch size: 82, lr: 3.72e-03, grad_scale: 16.0 2024-09-18 21:31:54,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=526540.0, ans=0.0 2024-09-18 21:31:57,536 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.49 vs. 
limit=15.0 2024-09-18 21:32:33,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=526620.0, ans=0.0 2024-09-18 21:32:39,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=526660.0, ans=0.025 2024-09-18 21:32:42,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=526660.0, ans=0.5 2024-09-18 21:32:56,035 INFO [train.py:1198] (1/2) Epoch 30, batch 450, loss[loss=0.2463, ctc_loss=0.1277, cr_loss=0.3638, attn_decoder_loss=0.2514, over 29697.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1213, cr_loss=0.363, attn_decoder_loss=0.2429, over 5184654.23 frames. ], batch size: 83, lr: 3.71e-03, grad_scale: 8.0 2024-09-18 21:33:00,169 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.33 vs. limit=22.5 2024-09-18 21:33:13,607 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.17 vs. limit=10.0 2024-09-18 21:33:35,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=526780.0, ans=0.125 2024-09-18 21:33:48,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=526820.0, ans=0.0 2024-09-18 21:33:49,345 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.51 vs. limit=15.0 2024-09-18 21:33:54,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=526820.0, ans=0.125 2024-09-18 21:33:57,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=526860.0, ans=0.125 2024-09-18 21:34:01,944 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.512e+01 8.936e+01 9.488e+01 1.864e+02, threshold=1.787e+02, percent-clipped=1.0 2024-09-18 21:34:13,945 INFO [train.py:1198] (1/2) Epoch 30, batch 500, loss[loss=0.2664, ctc_loss=0.1386, cr_loss=0.4079, attn_decoder_loss=0.2715, over 29431.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1209, cr_loss=0.3619, attn_decoder_loss=0.2423, over 5329203.37 frames. ], batch size: 94, lr: 3.71e-03, grad_scale: 8.0 2024-09-18 21:34:44,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=526980.0, ans=0.0 2024-09-18 21:34:46,463 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.75 vs. limit=22.5 2024-09-18 21:34:54,087 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.79 vs. limit=22.5 2024-09-18 21:34:54,605 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.31 vs. limit=15.0 2024-09-18 21:34:59,110 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.86 vs. 
limit=22.5 2024-09-18 21:35:05,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.84 vs. limit=15.0 2024-09-18 21:35:07,754 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2024-09-18 21:35:30,013 INFO [train.py:1198] (1/2) Epoch 30, batch 550, loss[loss=0.2537, ctc_loss=0.132, cr_loss=0.3811, attn_decoder_loss=0.2588, over 28805.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1211, cr_loss=0.3619, attn_decoder_loss=0.2426, over 5421609.90 frames. ], batch size: 104, lr: 3.71e-03, grad_scale: 8.0 2024-09-18 21:35:38,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=527100.0, ans=0.025 2024-09-18 21:35:41,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=527100.0, ans=0.0 2024-09-18 21:35:42,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=527100.0, ans=0.025 2024-09-18 21:36:10,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=527180.0, ans=0.0 2024-09-18 21:36:23,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=527220.0, ans=0.2 2024-09-18 21:36:31,165 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.98 vs. limit=22.5 2024-09-18 21:36:33,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=527260.0, ans=0.1 2024-09-18 21:36:35,935 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.20 vs. limit=22.5 2024-09-18 21:36:36,365 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.495e+01 8.634e+01 8.972e+01 9.427e+01 2.186e+02, threshold=1.794e+02, percent-clipped=1.0 2024-09-18 21:36:48,692 INFO [train.py:1198] (1/2) Epoch 30, batch 600, loss[loss=0.2538, ctc_loss=0.1251, cr_loss=0.377, attn_decoder_loss=0.2597, over 29286.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1215, cr_loss=0.3631, attn_decoder_loss=0.2429, over 5509463.65 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 8.0 2024-09-18 21:36:58,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=527300.0, ans=0.2 2024-09-18 21:36:59,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=527300.0, ans=0.125 2024-09-18 21:37:03,280 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.49 vs. 
limit=15.0 2024-09-18 21:37:14,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=527340.0, ans=0.125 2024-09-18 21:37:19,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=527380.0, ans=0.1 2024-09-18 21:37:20,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=527380.0, ans=0.125 2024-09-18 21:37:40,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=527420.0, ans=0.2 2024-09-18 21:37:42,897 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=15.0 2024-09-18 21:38:00,077 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.22 vs. limit=8.0 2024-09-18 21:38:06,366 INFO [train.py:1198] (1/2) Epoch 30, batch 650, loss[loss=0.2438, ctc_loss=0.1333, cr_loss=0.3849, attn_decoder_loss=0.2475, over 29767.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1209, cr_loss=0.3616, attn_decoder_loss=0.2423, over 5586540.67 frames. ], batch size: 81, lr: 3.71e-03, grad_scale: 8.0 2024-09-18 21:38:13,488 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.77 vs. limit=12.0 2024-09-18 21:38:34,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=527540.0, ans=10.0 2024-09-18 21:39:09,635 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.381e+01 8.386e+01 8.897e+01 9.302e+01 1.225e+02, threshold=1.779e+02, percent-clipped=0.0 2024-09-18 21:39:21,804 INFO [train.py:1198] (1/2) Epoch 30, batch 700, loss[loss=0.2333, ctc_loss=0.1191, cr_loss=0.3591, attn_decoder_loss=0.238, over 29530.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1216, cr_loss=0.3629, attn_decoder_loss=0.243, over 5638247.31 frames. ], batch size: 76, lr: 3.71e-03, grad_scale: 8.0 2024-09-18 21:39:36,344 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.37 vs. limit=12.0 2024-09-18 21:39:41,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=527740.0, ans=0.125 2024-09-18 21:39:55,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=527780.0, ans=0.04949747468305833 2024-09-18 21:40:04,987 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=6.06 vs. limit=15.0 2024-09-18 21:40:22,591 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.14 vs. limit=10.0 2024-09-18 21:40:26,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=527860.0, ans=0.125 2024-09-18 21:40:39,931 INFO [train.py:1198] (1/2) Epoch 30, batch 750, loss[loss=0.2421, ctc_loss=0.1255, cr_loss=0.382, attn_decoder_loss=0.2466, over 29732.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1218, cr_loss=0.3638, attn_decoder_loss=0.2429, over 5676724.93 frames. 
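
The [optim.py:487] WARNING lines above summarize the recent history of gradient norms as five quantiles (min, 25%, median, 75%, max), the clipping threshold in force, and the percentage of recent batches that were actually clipped. The exact rule optim.py uses to turn those statistics into a threshold is not visible in the log; the sketch below assumes a simple scaled-median rule for illustration:

    import torch

    def clip_with_stats(parameters, norm_history, clipping_scale: float = 2.0):
        # Summarize recent gradient norms, mirroring the quartile printout.
        norms = torch.tensor(norm_history, dtype=torch.float32)
        quantiles = torch.quantile(
            norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        # Hypothetical rule: clip at clipping_scale times the median norm.
        threshold = clipping_scale * quantiles[2].item()
        total_norm = torch.nn.utils.clip_grad_norm_(parameters, max_norm=threshold)
        was_clipped = total_norm.item() > threshold
        return quantiles, threshold, was_clipped
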
], batch size: 82, lr: 3.71e-03, grad_scale: 8.0 2024-09-18 21:40:52,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=527900.0, ans=0.2 2024-09-18 21:40:53,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=527940.0, ans=0.2 2024-09-18 21:41:05,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=527940.0, ans=0.0 2024-09-18 21:41:47,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=528020.0, ans=0.125 2024-09-18 21:41:52,808 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.182e+01 8.561e+01 8.909e+01 9.515e+01 3.316e+02, threshold=1.782e+02, percent-clipped=2.0 2024-09-18 21:42:04,963 INFO [train.py:1198] (1/2) Epoch 30, batch 800, loss[loss=0.2237, ctc_loss=0.1106, cr_loss=0.3514, attn_decoder_loss=0.2285, over 29595.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1214, cr_loss=0.3633, attn_decoder_loss=0.2428, over 5706231.41 frames. ], batch size: 73, lr: 3.71e-03, grad_scale: 16.0 2024-09-18 21:42:14,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=528100.0, ans=0.0 2024-09-18 21:42:15,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=528100.0, ans=0.125 2024-09-18 21:42:29,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=528140.0, ans=0.05 2024-09-18 21:42:31,411 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.03 vs. limit=10.0 2024-09-18 21:42:32,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=528140.0, ans=0.125 2024-09-18 21:42:35,676 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.57 vs. limit=15.0 2024-09-18 21:42:39,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=528180.0, ans=0.1 2024-09-18 21:42:52,537 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.87 vs. limit=10.0 2024-09-18 21:42:57,353 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.19 vs. limit=12.0 2024-09-18 21:43:06,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=528260.0, ans=0.125 2024-09-18 21:43:20,043 INFO [train.py:1198] (1/2) Epoch 30, batch 850, loss[loss=0.252, ctc_loss=0.1352, cr_loss=0.4062, attn_decoder_loss=0.256, over 29699.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1212, cr_loss=0.3629, attn_decoder_loss=0.2425, over 5735070.66 frames. 
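
The [scaling.py:214] lines report the current value (ans=...) of a ScheduledFloat, a hyperparameter that is a function of batch_count rather than a constant, which is why the same name can print different values at different batch counts. A sketch of a piecewise-linear schedule in that spirit; the breakpoints and the class itself are illustrative, the real schedules live in scaling.py:

    import bisect

    class ScheduledValue:
        """A float interpolated between (batch_count, value) breakpoints."""

        def __init__(self, *points: tuple[float, float]) -> None:
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            xs = [x for x, _ in self.points]
            i = bisect.bisect_right(xs, batch_count)
            if i == 0:                     # before the first breakpoint
                return self.points[0][1]
            if i == len(self.points):      # past the last breakpoint
                return self.points[-1][1]
            (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

For example, a skip rate annealed to zero early in training would read ans=0.0 at the batch counts above:

    conv_skip_rate = ScheduledValue((0.0, 0.2), (20000.0, 0.0))
    print(conv_skip_rate(527940.0))   # -> 0.0, past the final breakpoint
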
], batch size: 89, lr: 3.71e-03, grad_scale: 8.0 2024-09-18 21:43:20,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=528300.0, ans=0.125 2024-09-18 21:43:25,294 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.60 vs. limit=10.0 2024-09-18 21:43:27,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.57 vs. limit=15.0 2024-09-18 21:43:29,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=528300.0, ans=0.125 2024-09-18 21:43:39,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=528340.0, ans=0.0 2024-09-18 21:44:04,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=528420.0, ans=0.125 2024-09-18 21:44:06,263 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.45 vs. limit=15.0 2024-09-18 21:44:14,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=528420.0, ans=0.125 2024-09-18 21:44:27,503 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.710e+01 8.440e+01 8.960e+01 9.629e+01 1.513e+02, threshold=1.792e+02, percent-clipped=0.0 2024-09-18 21:44:32,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=528460.0, ans=0.2 2024-09-18 21:44:33,001 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.70 vs. limit=6.0 2024-09-18 21:44:37,925 INFO [train.py:1198] (1/2) Epoch 30, batch 900, loss[loss=0.218, ctc_loss=0.09342, cr_loss=0.3088, attn_decoder_loss=0.225, over 29613.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1213, cr_loss=0.3633, attn_decoder_loss=0.2427, over 5740393.95 frames. ], batch size: 73, lr: 3.71e-03, grad_scale: 8.0 2024-09-18 21:44:40,246 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.38 vs. limit=15.0 2024-09-18 21:44:44,097 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:44:44,853 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0 2024-09-18 21:44:45,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=528500.0, ans=0.125 2024-09-18 21:44:59,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=528540.0, ans=0.025 2024-09-18 21:45:09,016 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.89 vs. 
limit=15.0 2024-09-18 21:45:12,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=528580.0, ans=0.125 2024-09-18 21:45:14,483 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:45:16,564 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.06 vs. limit=22.5 2024-09-18 21:45:29,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=528620.0, ans=0.125 2024-09-18 21:45:29,949 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.56 vs. limit=12.0 2024-09-18 21:45:32,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=528620.0, ans=0.05 2024-09-18 21:45:34,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=528620.0, ans=0.125 2024-09-18 21:45:42,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=528660.0, ans=0.0 2024-09-18 21:45:55,576 INFO [train.py:1198] (1/2) Epoch 30, batch 950, loss[loss=0.2227, ctc_loss=0.1041, cr_loss=0.3317, attn_decoder_loss=0.2285, over 29511.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1211, cr_loss=0.3626, attn_decoder_loss=0.2428, over 5742400.31 frames. ], batch size: 74, lr: 3.71e-03, grad_scale: 8.0 2024-09-18 21:46:05,117 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.59 vs. limit=12.0 2024-09-18 21:46:29,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=528780.0, ans=0.125 2024-09-18 21:46:38,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=528780.0, ans=0.125 2024-09-18 21:46:42,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=528820.0, ans=0.125 2024-09-18 21:47:00,630 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 8.476e+01 9.054e+01 9.594e+01 4.825e+02, threshold=1.811e+02, percent-clipped=1.0 2024-09-18 21:47:07,534 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.00 vs. limit=15.0 2024-09-18 21:47:08,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=528860.0, ans=0.1 2024-09-18 21:47:08,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=528860.0, ans=0.125 2024-09-18 21:47:09,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=528900.0, ans=0.125 2024-09-18 21:47:11,027 INFO [train.py:1198] (1/2) Epoch 30, batch 1000, loss[loss=0.234, ctc_loss=0.117, cr_loss=0.3595, attn_decoder_loss=0.239, over 29508.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1223, cr_loss=0.3645, attn_decoder_loss=0.2438, over 5736566.78 frames. 
], batch size: 77, lr: 3.71e-03, grad_scale: 8.0 2024-09-18 21:47:11,965 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.69 vs. limit=15.0 2024-09-18 21:47:15,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=528900.0, ans=0.0 2024-09-18 21:47:29,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=528940.0, ans=0.125 2024-09-18 21:47:44,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=528980.0, ans=0.125 2024-09-18 21:48:09,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=529020.0, ans=0.0 2024-09-18 21:48:28,828 INFO [train.py:1198] (1/2) Epoch 30, batch 1050, loss[loss=0.2509, ctc_loss=0.1225, cr_loss=0.3755, attn_decoder_loss=0.2568, over 29673.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1219, cr_loss=0.3642, attn_decoder_loss=0.243, over 5745340.70 frames. ], batch size: 85, lr: 3.71e-03, grad_scale: 8.0 2024-09-18 21:48:43,367 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.22 vs. limit=10.0 2024-09-18 21:49:11,039 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.37 vs. limit=15.0 2024-09-18 21:49:20,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=529220.0, ans=0.1 2024-09-18 21:49:25,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=529220.0, ans=0.125 2024-09-18 21:49:29,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=529260.0, ans=0.125 2024-09-18 21:49:36,319 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.411e+01 8.824e+01 9.446e+01 1.337e+02, threshold=1.765e+02, percent-clipped=0.0 2024-09-18 21:49:46,959 INFO [train.py:1198] (1/2) Epoch 30, batch 1100, loss[loss=0.2425, ctc_loss=0.1232, cr_loss=0.3848, attn_decoder_loss=0.2473, over 29447.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1214, cr_loss=0.3635, attn_decoder_loss=0.2429, over 5757235.76 frames. ], batch size: 78, lr: 3.71e-03, grad_scale: 8.0 2024-09-18 21:49:48,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=529300.0, ans=0.2 2024-09-18 21:50:00,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=529340.0, ans=0.1 2024-09-18 21:50:07,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=529340.0, ans=0.0 2024-09-18 21:50:14,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.17 vs. 
limit=15.0 2024-09-18 21:50:15,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=529380.0, ans=0.125 2024-09-18 21:51:02,521 INFO [train.py:1198] (1/2) Epoch 30, batch 1150, loss[loss=0.2241, ctc_loss=0.1063, cr_loss=0.3191, attn_decoder_loss=0.2301, over 29436.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1216, cr_loss=0.364, attn_decoder_loss=0.243, over 5755091.72 frames. ], batch size: 78, lr: 3.70e-03, grad_scale: 8.0 2024-09-18 21:51:07,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=529500.0, ans=0.0 2024-09-18 21:51:21,676 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.93 vs. limit=15.0 2024-09-18 21:51:33,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=529580.0, ans=0.0 2024-09-18 21:51:44,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=529580.0, ans=0.125 2024-09-18 21:51:55,979 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.22 vs. limit=22.5 2024-09-18 21:52:06,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=529660.0, ans=0.09899494936611666 2024-09-18 21:52:10,739 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.491e+01 9.020e+01 1.005e+02 1.994e+02, threshold=1.804e+02, percent-clipped=1.0 2024-09-18 21:52:21,428 INFO [train.py:1198] (1/2) Epoch 30, batch 1200, loss[loss=0.2446, ctc_loss=0.1167, cr_loss=0.3614, attn_decoder_loss=0.2508, over 29670.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1221, cr_loss=0.3648, attn_decoder_loss=0.2438, over 5747435.67 frames. ], batch size: 85, lr: 3.70e-03, grad_scale: 16.0 2024-09-18 21:52:28,251 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.53 vs. limit=15.0 2024-09-18 21:52:38,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=529740.0, ans=0.125 2024-09-18 21:52:52,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=529780.0, ans=0.2 2024-09-18 21:52:55,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=529780.0, ans=0.1 2024-09-18 21:52:59,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=529780.0, ans=0.0 2024-09-18 21:53:09,721 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.09 vs. 
limit=15.0 2024-09-18 21:53:15,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=529820.0, ans=0.125 2024-09-18 21:53:15,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=529820.0, ans=10.0 2024-09-18 21:53:33,509 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.24 vs. limit=15.0 2024-09-18 21:53:37,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=529860.0, ans=0.125 2024-09-18 21:53:40,008 INFO [train.py:1198] (1/2) Epoch 30, batch 1250, loss[loss=0.266, ctc_loss=0.1551, cr_loss=0.4598, attn_decoder_loss=0.2681, over 29528.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1227, cr_loss=0.3666, attn_decoder_loss=0.2444, over 5775556.79 frames. ], batch size: 92, lr: 3.70e-03, grad_scale: 16.0 2024-09-18 21:53:51,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=529900.0, ans=0.2 2024-09-18 21:54:21,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=529980.0, ans=0.2 2024-09-18 21:54:21,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=529980.0, ans=0.1 2024-09-18 21:54:36,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=530020.0, ans=0.125 2024-09-18 21:54:41,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=530060.0, ans=0.0 2024-09-18 21:54:45,401 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.969e+01 8.456e+01 8.891e+01 9.459e+01 2.793e+02, threshold=1.778e+02, percent-clipped=2.0 2024-09-18 21:54:55,964 INFO [train.py:1198] (1/2) Epoch 30, batch 1300, loss[loss=0.2538, ctc_loss=0.1328, cr_loss=0.3903, attn_decoder_loss=0.2585, over 28425.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1222, cr_loss=0.3646, attn_decoder_loss=0.2438, over 5779795.74 frames. ], batch size: 112, lr: 3.70e-03, grad_scale: 16.0 2024-09-18 21:54:58,375 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.00 vs. limit=15.0 2024-09-18 21:55:05,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=530100.0, ans=0.025 2024-09-18 21:55:31,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=530180.0, ans=0.125 2024-09-18 21:56:14,323 INFO [train.py:1198] (1/2) Epoch 30, batch 1350, loss[loss=0.2355, ctc_loss=0.1209, cr_loss=0.3638, attn_decoder_loss=0.2401, over 29772.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1219, cr_loss=0.3646, attn_decoder_loss=0.2434, over 5795482.34 frames. 
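
The grad_scale field in the batch lines alternates between 8.0 and 16.0 over this stretch, the signature of a dynamic AMP loss scaler that doubles the scale after a run of stable steps and halves it when scaled gradients overflow. A generic PyTorch AMP step is sketched below; whether train.py structures its loop exactly this way is an assumption:

    import torch

    scaler = torch.cuda.amp.GradScaler()

    def train_step(model, batch, optimizer):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(batch)
        scaler.scale(loss).backward()   # backward on the scaled loss
        scaler.step(optimizer)          # skips the update on inf/NaN gradients
        scaler.update()                 # grows or halves the scale, e.g. 16.0 <-> 8.0
        return loss.detach()
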
], batch size: 81, lr: 3.70e-03, grad_scale: 16.0 2024-09-18 21:56:27,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=530340.0, ans=0.125 2024-09-18 21:57:04,664 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:57:04,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=530420.0, ans=0.2 2024-09-18 21:57:04,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=530420.0, ans=0.1 2024-09-18 21:57:07,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=530420.0, ans=0.125 2024-09-18 21:57:10,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=530420.0, ans=0.2 2024-09-18 21:57:20,872 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.324e+01 8.385e+01 8.847e+01 9.362e+01 1.529e+02, threshold=1.769e+02, percent-clipped=0.0 2024-09-18 21:57:21,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=530460.0, ans=0.125 2024-09-18 21:57:30,075 INFO [train.py:1198] (1/2) Epoch 30, batch 1400, loss[loss=0.2138, ctc_loss=0.1015, cr_loss=0.3271, attn_decoder_loss=0.219, over 29572.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1217, cr_loss=0.3641, attn_decoder_loss=0.243, over 5806774.11 frames. ], batch size: 69, lr: 3.70e-03, grad_scale: 8.0 2024-09-18 21:57:49,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=530540.0, ans=0.125 2024-09-18 21:57:58,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=530540.0, ans=0.125 2024-09-18 21:58:07,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=530580.0, ans=0.125 2024-09-18 21:58:21,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=530620.0, ans=0.125 2024-09-18 21:58:33,406 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2024-09-18 21:58:47,601 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.42 vs. limit=22.5 2024-09-18 21:58:48,079 INFO [train.py:1198] (1/2) Epoch 30, batch 1450, loss[loss=0.2545, ctc_loss=0.1317, cr_loss=0.3793, attn_decoder_loss=0.2598, over 29401.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1218, cr_loss=0.3639, attn_decoder_loss=0.2437, over 5803281.60 frames. 
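
Many of the scheduled names above end in balancer parameters: min_positive and max_positive bound the fraction of positive activations per channel, min_abs and max_abs bound their typical magnitude, and prob is the probability of applying the constraint on a given batch. As a hypothetical illustration of the statistics those names refer to (a training-time balancer would typically enforce them through gradient adjustments rather than merely measuring them like this):

    import torch

    def balancer_stats(x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # x: (num_frames, num_channels) activations.
        frac_positive = (x > 0).float().mean(dim=0)  # vs. min/max_positive
        mean_abs = x.abs().mean(dim=0)               # vs. min/max_abs
        return frac_positive, mean_abs
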
], batch size: 94, lr: 3.70e-03, grad_scale: 8.0 2024-09-18 21:58:52,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=530700.0, ans=0.1 2024-09-18 21:59:12,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=530740.0, ans=0.125 2024-09-18 21:59:14,914 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.93 vs. limit=15.0 2024-09-18 21:59:24,816 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:59:28,437 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.68 vs. limit=15.0 2024-09-18 21:59:50,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=530860.0, ans=0.025 2024-09-18 21:59:56,665 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.738e+01 8.746e+01 9.204e+01 9.882e+01 6.648e+02, threshold=1.841e+02, percent-clipped=1.0 2024-09-18 21:59:59,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0 2024-09-18 22:00:04,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=530900.0, ans=0.0 2024-09-18 22:00:05,887 INFO [train.py:1198] (1/2) Epoch 30, batch 1500, loss[loss=0.2498, ctc_loss=0.1294, cr_loss=0.3764, attn_decoder_loss=0.2548, over 29624.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1217, cr_loss=0.3635, attn_decoder_loss=0.2436, over 5805593.86 frames. ], batch size: 86, lr: 3.70e-03, grad_scale: 8.0 2024-09-18 22:00:25,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=530940.0, ans=0.125 2024-09-18 22:00:39,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=530980.0, ans=0.125 2024-09-18 22:00:47,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=530980.0, ans=0.125 2024-09-18 22:00:55,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=531020.0, ans=0.125 2024-09-18 22:01:01,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=531020.0, ans=0.0 2024-09-18 22:01:05,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=531060.0, ans=0.0 2024-09-18 22:01:22,006 INFO [train.py:1198] (1/2) Epoch 30, batch 1550, loss[loss=0.2485, ctc_loss=0.1286, cr_loss=0.3832, attn_decoder_loss=0.2533, over 29493.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.122, cr_loss=0.3639, attn_decoder_loss=0.2436, over 5782344.87 frames. 
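
The [scaling.py:1024] lines compare a whitening metric against a limit (e.g. metric=13.36 vs. limit=15.0 above), a measure of how far the channel covariance of some activation is from a multiple of the identity. One plausible definition, normalized so that a perfectly white covariance scores 1.0; this is an assumption for illustration, not necessarily the formula in scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels).
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]          # (C, C) channel covariance
        num_channels = cov.shape[0]
        # 1.0 when cov is a multiple of the identity; grows as channels
        # correlate or variances spread, which is what the limit guards.
        return (num_channels * (cov ** 2).mean()
                / (cov.diagonal().mean() ** 2 + 1e-20))
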
], batch size: 90, lr: 3.70e-03, grad_scale: 8.0 2024-09-18 22:01:32,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=531100.0, ans=0.1 2024-09-18 22:01:50,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=531140.0, ans=0.025 2024-09-18 22:01:55,716 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.84 vs. limit=15.0 2024-09-18 22:02:21,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=531220.0, ans=0.1 2024-09-18 22:02:23,253 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.27 vs. limit=6.0 2024-09-18 22:02:31,375 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.245e+01 8.587e+01 9.165e+01 1.006e+02 3.566e+02, threshold=1.833e+02, percent-clipped=2.0 2024-09-18 22:02:34,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=531260.0, ans=0.1 2024-09-18 22:02:40,591 INFO [train.py:1198] (1/2) Epoch 30, batch 1600, loss[loss=0.2591, ctc_loss=0.1353, cr_loss=0.4062, attn_decoder_loss=0.2638, over 29671.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1219, cr_loss=0.3639, attn_decoder_loss=0.2435, over 5766618.97 frames. ], batch size: 85, lr: 3.70e-03, grad_scale: 16.0 2024-09-18 22:02:45,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=531300.0, ans=0.2 2024-09-18 22:02:46,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=531300.0, ans=0.0 2024-09-18 22:02:57,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=531340.0, ans=0.2 2024-09-18 22:03:05,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=531340.0, ans=10.0 2024-09-18 22:03:08,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=531340.0, ans=0.2 2024-09-18 22:03:40,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=531420.0, ans=0.125 2024-09-18 22:03:52,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=531460.0, ans=0.1 2024-09-18 22:03:58,211 INFO [train.py:1198] (1/2) Epoch 30, batch 1650, loss[loss=0.2412, ctc_loss=0.1129, cr_loss=0.3526, attn_decoder_loss=0.2476, over 29691.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1217, cr_loss=0.3637, attn_decoder_loss=0.243, over 5761113.83 frames. 
], batch size: 89, lr: 3.70e-03, grad_scale: 16.0 2024-09-18 22:04:39,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=531580.0, ans=0.0 2024-09-18 22:04:52,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=531620.0, ans=0.1 2024-09-18 22:05:03,959 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.05 vs. limit=15.0 2024-09-18 22:05:04,522 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.547e+01 8.983e+01 9.697e+01 1.906e+02, threshold=1.797e+02, percent-clipped=1.0 2024-09-18 22:05:10,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=531660.0, ans=0.125 2024-09-18 22:05:13,434 INFO [train.py:1198] (1/2) Epoch 30, batch 1700, loss[loss=0.2111, ctc_loss=0.1091, cr_loss=0.3306, attn_decoder_loss=0.2151, over 29576.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1213, cr_loss=0.3631, attn_decoder_loss=0.2428, over 5782331.41 frames. ], batch size: 69, lr: 3.70e-03, grad_scale: 16.0 2024-09-18 22:05:18,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=531700.0, ans=0.0 2024-09-18 22:05:47,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=531780.0, ans=0.0 2024-09-18 22:06:04,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=531820.0, ans=0.125 2024-09-18 22:06:04,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=531820.0, ans=0.125 2024-09-18 22:06:11,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=531820.0, ans=10.0 2024-09-18 22:06:31,430 INFO [train.py:1198] (1/2) Epoch 30, batch 1750, loss[loss=0.2153, ctc_loss=0.1039, cr_loss=0.3368, attn_decoder_loss=0.2202, over 29317.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.121, cr_loss=0.3624, attn_decoder_loss=0.2425, over 5790470.31 frames. 
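
Names such as conv_skip_rate, attention_skip_rate and ff2_skip_rate read naturally as probabilities of bypassing a sub-module for an entire batch, a stochastic-depth style regularizer whose rate is itself scheduled (many of them have annealed to ans=0.0 by this point in training). A sketch under that reading; the wrapper and the residual form are assumptions, not the zipformer code:

    import torch
    import torch.nn as nn

    class SkippableResidual(nn.Module):
        def __init__(self, module: nn.Module, skip_rate: float = 0.0) -> None:
            super().__init__()
            self.module = module
            self.skip_rate = skip_rate   # scheduled over batch_count in the log

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            if self.training and torch.rand(()).item() < self.skip_rate:
                return x                  # bypass the sub-module this batch
            return x + self.module(x)
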
], batch size: 67, lr: 3.70e-03, grad_scale: 16.0 2024-09-18 22:07:06,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=531980.0, ans=0.125 2024-09-18 22:07:08,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=531980.0, ans=0.125 2024-09-18 22:07:10,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=531980.0, ans=0.0 2024-09-18 22:07:20,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=532020.0, ans=0.1 2024-09-18 22:07:40,548 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 8.324e+01 8.730e+01 9.634e+01 1.252e+02, threshold=1.746e+02, percent-clipped=0.0 2024-09-18 22:07:48,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=532100.0, ans=0.1 2024-09-18 22:07:49,663 INFO [train.py:1198] (1/2) Epoch 30, batch 1800, loss[loss=0.2387, ctc_loss=0.117, cr_loss=0.3583, attn_decoder_loss=0.2442, over 29688.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1213, cr_loss=0.363, attn_decoder_loss=0.2427, over 5792159.86 frames. ], batch size: 83, lr: 3.70e-03, grad_scale: 16.0 2024-09-18 22:08:34,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=532220.0, ans=0.07 2024-09-18 22:08:45,367 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.99 vs. limit=10.0 2024-09-18 22:09:05,417 INFO [train.py:1198] (1/2) Epoch 30, batch 1850, loss[loss=0.2478, ctc_loss=0.1275, cr_loss=0.3671, attn_decoder_loss=0.253, over 29649.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1214, cr_loss=0.3635, attn_decoder_loss=0.2428, over 5796806.41 frames. ], batch size: 86, lr: 3.69e-03, grad_scale: 8.0 2024-09-18 22:09:05,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=532300.0, ans=0.0 2024-09-18 22:10:15,375 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 8.313e+01 8.937e+01 9.417e+01 1.433e+02, threshold=1.787e+02, percent-clipped=0.0 2024-09-18 22:10:22,900 INFO [train.py:1198] (1/2) Epoch 30, batch 1900, loss[loss=0.2452, ctc_loss=0.1355, cr_loss=0.3755, attn_decoder_loss=0.2491, over 29708.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1218, cr_loss=0.3642, attn_decoder_loss=0.2435, over 5804445.42 frames. ], batch size: 89, lr: 3.69e-03, grad_scale: 8.0 2024-09-18 22:10:35,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=532500.0, ans=0.125 2024-09-18 22:11:18,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=532620.0, ans=0.0 2024-09-18 22:11:41,044 INFO [train.py:1198] (1/2) Epoch 30, batch 1950, loss[loss=0.2437, ctc_loss=0.1274, cr_loss=0.3859, attn_decoder_loss=0.248, over 29447.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1221, cr_loss=0.3657, attn_decoder_loss=0.2445, over 5819606.95 frames. 
], batch size: 78, lr: 3.69e-03, grad_scale: 8.0 2024-09-18 22:11:54,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=532740.0, ans=0.125 2024-09-18 22:12:10,708 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.61 vs. limit=15.0 2024-09-18 22:12:13,539 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. limit=6.0 2024-09-18 22:12:15,236 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.00 vs. limit=15.0 2024-09-18 22:12:17,979 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=12.0 2024-09-18 22:12:35,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=532820.0, ans=0.0 2024-09-18 22:12:43,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=532860.0, ans=0.125 2024-09-18 22:12:49,466 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.636e+01 8.712e+01 9.082e+01 9.590e+01 8.305e+02, threshold=1.816e+02, percent-clipped=1.0 2024-09-18 22:12:56,993 INFO [train.py:1198] (1/2) Epoch 30, batch 2000, loss[loss=0.2242, ctc_loss=0.1156, cr_loss=0.3686, attn_decoder_loss=0.2281, over 29386.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1229, cr_loss=0.367, attn_decoder_loss=0.2451, over 5796756.19 frames. ], batch size: 67, lr: 3.69e-03, grad_scale: 16.0 2024-09-18 22:13:06,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=532900.0, ans=0.125 2024-09-18 22:13:20,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=532940.0, ans=0.0 2024-09-18 22:13:28,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=532980.0, ans=0.05 2024-09-18 22:13:35,234 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.35 vs. limit=15.0 2024-09-18 22:14:01,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=533060.0, ans=0.2 2024-09-18 22:14:03,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=533060.0, ans=0.125 2024-09-18 22:14:08,449 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.30 vs. limit=15.0 2024-09-18 22:14:15,044 INFO [train.py:1198] (1/2) Epoch 30, batch 2050, loss[loss=0.2304, ctc_loss=0.1246, cr_loss=0.38, attn_decoder_loss=0.2337, over 29408.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1226, cr_loss=0.366, attn_decoder_loss=0.2443, over 5789237.65 frames. ], batch size: 70, lr: 3.69e-03, grad_scale: 8.0 2024-09-18 22:14:37,109 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.60 vs. 
limit=15.0 2024-09-18 22:14:47,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=533180.0, ans=0.0 2024-09-18 22:14:48,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=533180.0, ans=0.0 2024-09-18 22:14:48,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=533180.0, ans=0.125 2024-09-18 22:14:51,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=533180.0, ans=0.125 2024-09-18 22:14:53,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=533180.0, ans=0.2 2024-09-18 22:14:54,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=533180.0, ans=0.025 2024-09-18 22:14:56,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=533180.0, ans=0.2 2024-09-18 22:15:19,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=533260.0, ans=15.0 2024-09-18 22:15:21,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=533260.0, ans=0.5 2024-09-18 22:15:24,161 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:15:24,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=533260.0, ans=0.0 2024-09-18 22:15:26,566 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.57 vs. limit=15.0 2024-09-18 22:15:26,828 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.796e+01 8.326e+01 8.954e+01 9.784e+01 1.550e+02, threshold=1.791e+02, percent-clipped=0.0 2024-09-18 22:15:33,011 INFO [train.py:1198] (1/2) Epoch 30, batch 2100, loss[loss=0.2441, ctc_loss=0.1262, cr_loss=0.3802, attn_decoder_loss=0.2487, over 29767.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1221, cr_loss=0.3652, attn_decoder_loss=0.2438, over 5800496.97 frames. 
], batch size: 81, lr: 3.69e-03, grad_scale: 8.0 2024-09-18 22:15:42,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=533300.0, ans=0.1 2024-09-18 22:15:55,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=533340.0, ans=0.0 2024-09-18 22:15:57,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=533340.0, ans=0.0 2024-09-18 22:16:14,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys.whitening_limit, batch_count=533380.0, ans=6.0 2024-09-18 22:16:16,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=533420.0, ans=0.1 2024-09-18 22:16:41,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=533460.0, ans=0.07 2024-09-18 22:16:48,304 INFO [train.py:1198] (1/2) Epoch 30, batch 2150, loss[loss=0.2293, ctc_loss=0.1213, cr_loss=0.3603, attn_decoder_loss=0.2333, over 29461.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1214, cr_loss=0.3638, attn_decoder_loss=0.243, over 5815555.51 frames. ], batch size: 78, lr: 3.69e-03, grad_scale: 8.0 2024-09-18 22:17:04,478 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.42 vs. limit=12.0 2024-09-18 22:17:39,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=533620.0, ans=0.1 2024-09-18 22:18:00,806 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.435e+01 8.351e+01 8.919e+01 9.432e+01 1.434e+02, threshold=1.784e+02, percent-clipped=0.0 2024-09-18 22:18:02,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=533660.0, ans=0.125 2024-09-18 22:18:04,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=533660.0, ans=0.125 2024-09-18 22:18:06,124 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.51 vs. limit=6.0 2024-09-18 22:18:06,998 INFO [train.py:1198] (1/2) Epoch 30, batch 2200, loss[loss=0.2474, ctc_loss=0.1251, cr_loss=0.3723, attn_decoder_loss=0.2527, over 29653.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.122, cr_loss=0.3655, attn_decoder_loss=0.2434, over 5812978.18 frames. 
], batch size: 86, lr: 3.69e-03, grad_scale: 8.0 2024-09-18 22:18:17,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=533700.0, ans=0.2 2024-09-18 22:18:28,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=533740.0, ans=0.125 2024-09-18 22:18:40,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=533780.0, ans=0.2 2024-09-18 22:18:43,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=533780.0, ans=0.125 2024-09-18 22:18:48,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=533780.0, ans=0.125 2024-09-18 22:18:52,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=533820.0, ans=0.125 2024-09-18 22:19:03,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=533820.0, ans=0.1 2024-09-18 22:19:12,005 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.38 vs. limit=22.5 2024-09-18 22:19:22,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=533860.0, ans=0.2 2024-09-18 22:19:24,811 INFO [train.py:1198] (1/2) Epoch 30, batch 2250, loss[loss=0.235, ctc_loss=0.1216, cr_loss=0.3612, attn_decoder_loss=0.2395, over 29704.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1217, cr_loss=0.3644, attn_decoder_loss=0.2431, over 5812267.39 frames. ], batch size: 82, lr: 3.69e-03, grad_scale: 8.0 2024-09-18 22:19:33,853 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.50 vs. 
limit=15.0 2024-09-18 22:19:43,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=533940.0, ans=0.125 2024-09-18 22:19:44,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=533940.0, ans=0.1 2024-09-18 22:19:44,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=533940.0, ans=0.025 2024-09-18 22:19:46,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=533940.0, ans=0.125 2024-09-18 22:19:49,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=533940.0, ans=0.125 2024-09-18 22:19:52,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=533940.0, ans=0.0 2024-09-18 22:19:54,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=533980.0, ans=12.0 2024-09-18 22:20:34,699 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.594e+01 8.443e+01 9.095e+01 9.654e+01 4.299e+02, threshold=1.819e+02, percent-clipped=1.0 2024-09-18 22:20:40,904 INFO [train.py:1198] (1/2) Epoch 30, batch 2300, loss[loss=0.2125, ctc_loss=0.09919, cr_loss=0.3084, attn_decoder_loss=0.2183, over 29335.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1208, cr_loss=0.3624, attn_decoder_loss=0.242, over 5797828.61 frames. ], batch size: 71, lr: 3.69e-03, grad_scale: 8.0 2024-09-18 22:20:44,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=534100.0, ans=0.1 2024-09-18 22:21:14,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=534180.0, ans=0.2 2024-09-18 22:21:27,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=534220.0, ans=0.2 2024-09-18 22:21:39,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=534220.0, ans=0.2 2024-09-18 22:21:56,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.39 vs. limit=6.0 2024-09-18 22:21:58,859 INFO [train.py:1198] (1/2) Epoch 30, batch 2350, loss[loss=0.2466, ctc_loss=0.1237, cr_loss=0.3723, attn_decoder_loss=0.252, over 29699.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.121, cr_loss=0.363, attn_decoder_loss=0.2423, over 5804272.62 frames. 
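
Note on loss[...] vs. tot_loss[...]: the first tuple describes the current batch (here 29699 frames), the second an aggregate over roughly the last 5.8M frames, which is why it moves slowly. One plausible way to keep such a frame-weighted running aggregate (illustrative only; the tracker actually used by train.py may differ in decay and bookkeeping):

class RunningLoss:
    """Frame-weighted, exponentially decayed aggregate like tot_loss[...]."""
    def __init__(self, decay=0.999):  # decay constant is an assumption
        self.decay = decay
        self.frames = 0.0
        self.sums = {}

    def update(self, batch_frames, **losses):
        self.frames = self.decay * self.frames + batch_frames
        for name, value in losses.items():
            self.sums[name] = self.decay * self.sums.get(name, 0.0) + value * batch_frames

    def averages(self):
        return {name: s / self.frames for name, s in self.sums.items()}

Weighting by frames rather than by batch keeps the aggregate comparable across the widely varying batch sizes (67 to 209) seen in this stretch of training.
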
], batch size: 83, lr: 3.69e-03, grad_scale: 8.0 2024-09-18 22:22:26,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=534340.0, ans=0.1 2024-09-18 22:22:42,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=534380.0, ans=0.1 2024-09-18 22:22:45,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=534420.0, ans=0.125 2024-09-18 22:22:53,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=534420.0, ans=0.2 2024-09-18 22:22:57,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=534420.0, ans=0.125 2024-09-18 22:23:05,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=534460.0, ans=0.125 2024-09-18 22:23:05,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=534460.0, ans=0.125 2024-09-18 22:23:11,015 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.566e+01 9.166e+01 9.835e+01 1.994e+02, threshold=1.833e+02, percent-clipped=1.0 2024-09-18 22:23:17,293 INFO [train.py:1198] (1/2) Epoch 30, batch 2400, loss[loss=0.2231, ctc_loss=0.1052, cr_loss=0.3228, attn_decoder_loss=0.229, over 29536.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1217, cr_loss=0.3642, attn_decoder_loss=0.243, over 5808405.61 frames. ], batch size: 76, lr: 3.69e-03, grad_scale: 16.0 2024-09-18 22:23:23,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=534500.0, ans=0.0 2024-09-18 22:23:49,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=534580.0, ans=0.2 2024-09-18 22:23:55,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=534580.0, ans=0.1 2024-09-18 22:24:07,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=534620.0, ans=0.0 2024-09-18 22:24:07,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=534620.0, ans=0.1 2024-09-18 22:24:09,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=534620.0, ans=0.0 2024-09-18 22:24:20,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=534660.0, ans=0.0 2024-09-18 22:24:23,996 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.65 vs. limit=10.0 2024-09-18 22:24:33,285 INFO [train.py:1198] (1/2) Epoch 30, batch 2450, loss[loss=0.239, ctc_loss=0.1175, cr_loss=0.3587, attn_decoder_loss=0.2445, over 29711.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1224, cr_loss=0.3659, attn_decoder_loss=0.244, over 5786619.30 frames. 
], batch size: 82, lr: 3.69e-03, grad_scale: 16.0 2024-09-18 22:24:45,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=534700.0, ans=0.2 2024-09-18 22:24:57,791 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:25:19,760 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.47 vs. limit=15.0 2024-09-18 22:25:25,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=534820.0, ans=0.1 2024-09-18 22:25:25,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=534820.0, ans=0.125 2024-09-18 22:25:28,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=534820.0, ans=0.125 2024-09-18 22:25:28,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=534820.0, ans=0.05 2024-09-18 22:25:31,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=534820.0, ans=0.0 2024-09-18 22:25:44,674 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.314e+01 8.707e+01 9.216e+01 9.749e+01 1.884e+02, threshold=1.843e+02, percent-clipped=1.0 2024-09-18 22:25:50,698 INFO [train.py:1198] (1/2) Epoch 30, batch 2500, loss[loss=0.2406, ctc_loss=0.118, cr_loss=0.3584, attn_decoder_loss=0.2463, over 29612.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1224, cr_loss=0.3658, attn_decoder_loss=0.2439, over 5796240.02 frames. ], batch size: 86, lr: 3.69e-03, grad_scale: 16.0 2024-09-18 22:26:12,320 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:26:24,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=534980.0, ans=0.5 2024-09-18 22:26:27,880 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.20 vs. limit=15.0 2024-09-18 22:26:39,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=535020.0, ans=0.0 2024-09-18 22:26:41,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=535020.0, ans=0.125 2024-09-18 22:26:41,799 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.49 vs. 
limit=12.0 2024-09-18 22:26:42,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=535020.0, ans=0.125 2024-09-18 22:26:45,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=535020.0, ans=0.125 2024-09-18 22:26:50,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=535060.0, ans=0.125 2024-09-18 22:27:01,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=535060.0, ans=0.125 2024-09-18 22:27:03,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=535060.0, ans=0.125 2024-09-18 22:27:09,174 INFO [train.py:1198] (1/2) Epoch 30, batch 2550, loss[loss=0.204, ctc_loss=0.09782, cr_loss=0.314, attn_decoder_loss=0.2089, over 29315.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1224, cr_loss=0.3665, attn_decoder_loss=0.2438, over 5798400.05 frames. ], batch size: 67, lr: 3.69e-03, grad_scale: 8.0 2024-09-18 22:27:09,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=535100.0, ans=0.125 2024-09-18 22:27:16,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=535100.0, ans=0.2 2024-09-18 22:27:47,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=535180.0, ans=0.125 2024-09-18 22:27:49,779 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.97 vs. limit=15.0 2024-09-18 22:27:53,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=535220.0, ans=0.125 2024-09-18 22:28:04,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=535220.0, ans=0.125 2024-09-18 22:28:05,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=535220.0, ans=0.07 2024-09-18 22:28:20,583 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.542e+01 9.079e+01 9.680e+01 2.807e+02, threshold=1.816e+02, percent-clipped=1.0 2024-09-18 22:28:22,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=535260.0, ans=0.0 2024-09-18 22:28:25,107 INFO [train.py:1198] (1/2) Epoch 30, batch 2600, loss[loss=0.2274, ctc_loss=0.1125, cr_loss=0.3364, attn_decoder_loss=0.2327, over 29434.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1222, cr_loss=0.3662, attn_decoder_loss=0.2441, over 5794399.47 frames. 
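
Note on the optim.py warnings: each prints the quartiles of recently observed gradient norms together with a clipping threshold, and the threshold tracks the median, e.g. 2.0 x 9.079e+01 = 1.816e+02 in the entry above. A sketch of adaptive clipping with that behavior (window size and bookkeeping are assumptions):

import collections
import torch

class AdaptiveClipper:
    def __init__(self, clipping_scale=2.0, window=200):
        self.scale = clipping_scale
        self.norms = collections.deque(maxlen=window)

    def clip_(self, parameters):
        grads = [p.grad for p in parameters if p.grad is not None]
        norm = torch.linalg.vector_norm(
            torch.stack([torch.linalg.vector_norm(g) for g in grads]))
        self.norms.append(norm.item())
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.scale * median  # e.g. 2.0 * 9.079e+01 = 1.816e+02
        if norm > threshold:  # such batches feed the percent-clipped statistic
            for g in grads:
                g.mul_(threshold / norm)
        return norm
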
], batch size: 78, lr: 3.68e-03, grad_scale: 8.0 2024-09-18 22:28:26,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=535300.0, ans=0.1 2024-09-18 22:28:28,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=535300.0, ans=0.0 2024-09-18 22:28:35,112 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.34 vs. limit=15.0 2024-09-18 22:29:18,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=535420.0, ans=0.1 2024-09-18 22:29:42,490 INFO [train.py:1198] (1/2) Epoch 30, batch 2650, loss[loss=0.2438, ctc_loss=0.1241, cr_loss=0.3693, attn_decoder_loss=0.2489, over 29202.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1221, cr_loss=0.3664, attn_decoder_loss=0.2442, over 5800986.30 frames. ], batch size: 100, lr: 3.68e-03, grad_scale: 8.0 2024-09-18 22:29:49,249 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.26 vs. limit=22.5 2024-09-18 22:29:50,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=535500.0, ans=0.0 2024-09-18 22:30:20,854 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=15.0 2024-09-18 22:30:33,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=535620.0, ans=0.035 2024-09-18 22:30:33,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=535620.0, ans=0.0 2024-09-18 22:30:55,368 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.645e+01 8.677e+01 9.089e+01 9.646e+01 4.909e+02, threshold=1.818e+02, percent-clipped=1.0 2024-09-18 22:30:56,138 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.42 vs. limit=15.0 2024-09-18 22:31:00,026 INFO [train.py:1198] (1/2) Epoch 30, batch 2700, loss[loss=0.2457, ctc_loss=0.1274, cr_loss=0.3695, attn_decoder_loss=0.2507, over 29520.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.1227, cr_loss=0.367, attn_decoder_loss=0.2447, over 5796968.31 frames. ], batch size: 87, lr: 3.68e-03, grad_scale: 8.0 2024-09-18 22:31:22,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=535740.0, ans=0.05 2024-09-18 22:31:30,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=535780.0, ans=0.0 2024-09-18 22:31:56,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=535820.0, ans=0.035 2024-09-18 22:32:11,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=535860.0, ans=0.0 2024-09-18 22:32:15,740 INFO [train.py:1198] (1/2) Epoch 30, batch 2750, loss[loss=0.2367, ctc_loss=0.1223, cr_loss=0.3821, attn_decoder_loss=0.2409, over 29535.00 frames. 
], tot_loss[loss=0.2389, ctc_loss=0.1224, cr_loss=0.366, attn_decoder_loss=0.2437, over 5795525.22 frames. ], batch size: 75, lr: 3.68e-03, grad_scale: 8.0 2024-09-18 22:32:55,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=535980.0, ans=0.125 2024-09-18 22:33:13,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=536020.0, ans=0.2 2024-09-18 22:33:26,280 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.41 vs. limit=15.0 2024-09-18 22:33:26,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=536060.0, ans=0.025 2024-09-18 22:33:27,447 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.20 vs. limit=6.0 2024-09-18 22:33:28,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=536060.0, ans=0.09899494936611666 2024-09-18 22:33:29,618 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.131e+01 8.488e+01 8.995e+01 9.694e+01 2.537e+02, threshold=1.799e+02, percent-clipped=1.0 2024-09-18 22:33:34,345 INFO [train.py:1198] (1/2) Epoch 30, batch 2800, loss[loss=0.2551, ctc_loss=0.1466, cr_loss=0.3848, attn_decoder_loss=0.2586, over 20385.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1227, cr_loss=0.3658, attn_decoder_loss=0.2438, over 5776660.07 frames. ], batch size: 209, lr: 3.68e-03, grad_scale: 16.0 2024-09-18 22:34:12,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=536180.0, ans=0.2 2024-09-18 22:34:38,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=536260.0, ans=0.125 2024-09-18 22:34:39,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=536260.0, ans=0.125 2024-09-18 22:34:50,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=536300.0, ans=0.0 2024-09-18 22:34:51,692 INFO [train.py:1198] (1/2) Epoch 30, batch 2850, loss[loss=0.2389, ctc_loss=0.1295, cr_loss=0.3843, attn_decoder_loss=0.2425, over 29501.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1229, cr_loss=0.366, attn_decoder_loss=0.2443, over 5762226.68 frames. ], batch size: 77, lr: 3.68e-03, grad_scale: 8.0 2024-09-18 22:35:24,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=536380.0, ans=0.125 2024-09-18 22:35:34,019 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.22 vs. 
limit=15.0 2024-09-18 22:35:34,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=536380.0, ans=0.5 2024-09-18 22:36:04,434 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.865e+01 8.533e+01 9.000e+01 9.896e+01 2.723e+02, threshold=1.800e+02, percent-clipped=1.0 2024-09-18 22:36:05,402 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.06 vs. limit=22.5 2024-09-18 22:36:07,494 INFO [train.py:1198] (1/2) Epoch 30, batch 2900, loss[loss=0.2352, ctc_loss=0.1225, cr_loss=0.3758, attn_decoder_loss=0.2394, over 29435.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1233, cr_loss=0.3671, attn_decoder_loss=0.2451, over 5787133.55 frames. ], batch size: 79, lr: 3.68e-03, grad_scale: 8.0 2024-09-18 22:36:12,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=536500.0, ans=0.0 2024-09-18 22:36:18,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=536500.0, ans=15.0 2024-09-18 22:36:29,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=536540.0, ans=0.125 2024-09-18 22:36:42,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=536580.0, ans=0.0 2024-09-18 22:36:47,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=536580.0, ans=0.02 2024-09-18 22:36:50,812 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.18 vs. limit=15.0 2024-09-18 22:37:13,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=536660.0, ans=0.2 2024-09-18 22:37:25,636 INFO [train.py:1198] (1/2) Epoch 30, batch 2950, loss[loss=0.2211, ctc_loss=0.1102, cr_loss=0.3463, attn_decoder_loss=0.2257, over 29526.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1221, cr_loss=0.3644, attn_decoder_loss=0.2437, over 5780479.17 frames. ], batch size: 75, lr: 3.68e-03, grad_scale: 8.0 2024-09-18 22:37:52,413 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.82 vs. limit=22.5 2024-09-18 22:37:55,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=536780.0, ans=0.125 2024-09-18 22:38:01,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=536780.0, ans=0.0 2024-09-18 22:38:18,639 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.40 vs. limit=15.0 2024-09-18 22:38:41,266 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.475e+01 9.079e+01 9.790e+01 2.714e+02, threshold=1.816e+02, percent-clipped=3.0 2024-09-18 22:38:44,470 INFO [train.py:1198] (1/2) Epoch 30, batch 3000, loss[loss=0.2336, ctc_loss=0.1144, cr_loss=0.3409, attn_decoder_loss=0.2393, over 29756.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1217, cr_loss=0.3636, attn_decoder_loss=0.2434, over 5781098.17 frames. 
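
Note on grad_scale: the value toggles between 8.0 and 16.0 across the entries above, which is the signature of a float16 loss scaler that halves the scale on overflow and doubles it after a run of clean steps. A generic torch.cuda.amp loop showing the mechanism (the model and data are synthetic stand-ins, not the recipe's training loop):

import torch

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3.7e-3)
scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

for _ in range(100):
    x = torch.randn(32, 80, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(x).logsumexp(dim=-1).mean()
    scaler.scale(loss).backward()  # backward pass at the current loss scale
    scaler.step(optimizer)         # internally skipped if inf/nan gradients appear
    scaler.update()                # halve on overflow, double after growth_interval good steps
    # scaler.get_scale() is the quantity logged as grad_scale
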
], batch size: 81, lr: 3.68e-03, grad_scale: 8.0 2024-09-18 22:38:44,470 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 22:39:01,304 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.1718, 2.6885, 2.6134, 2.7114, 2.6339, 2.6832, 2.0530, 2.8223], device='cuda:1') 2024-09-18 22:39:02,887 INFO [train.py:1230] (1/2) Epoch 30, validation: loss=0.2118, ctc_loss=0.03796, cr_loss=5.626e-15, attn_decoder_loss=0.2311, over 944034.00 frames. 2024-09-18 22:39:02,888 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-18 22:39:12,886 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.71 vs. limit=15.0 2024-09-18 22:39:38,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=536980.0, ans=0.125 2024-09-18 22:39:52,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=537020.0, ans=0.125 2024-09-18 22:39:56,686 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:40:10,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=537060.0, ans=0.1 2024-09-18 22:40:11,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=537060.0, ans=0.125 2024-09-18 22:40:13,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=537060.0, ans=0.125 2024-09-18 22:40:19,083 INFO [train.py:1198] (1/2) Epoch 30, batch 3050, loss[loss=0.2333, ctc_loss=0.1212, cr_loss=0.3649, attn_decoder_loss=0.2377, over 29552.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1226, cr_loss=0.3657, attn_decoder_loss=0.2443, over 5774679.06 frames. ], batch size: 76, lr: 3.68e-03, grad_scale: 8.0 2024-09-18 22:41:13,539 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.92 vs. limit=12.0 2024-09-18 22:41:16,676 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.21 vs. limit=6.0 2024-09-18 22:41:20,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=537260.0, ans=0.0 2024-09-18 22:41:33,833 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.707e+01 8.461e+01 8.902e+01 9.446e+01 1.923e+02, threshold=1.780e+02, percent-clipped=1.0 2024-09-18 22:41:36,772 INFO [train.py:1198] (1/2) Epoch 30, batch 3100, loss[loss=0.2539, ctc_loss=0.1285, cr_loss=0.3719, attn_decoder_loss=0.2596, over 29243.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1226, cr_loss=0.366, attn_decoder_loss=0.2441, over 5775711.42 frames. 
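
Note on the validation entry above: the CR term collapses to numerical zero (cr_loss=5.626e-15) while the CTC and attention-decoder terms stay finite. That is what one expects if the consistency loss compares model outputs on two differently masked copies of each utterance: with augmentation disabled at validation time the two passes agree and the divergence vanishes. A hedged sketch of such a consistency term (the real implementation may differ in reduction and masking details):

import torch.nn.functional as F

def consistency_loss(log_probs_a, log_probs_b):
    """Symmetric KL between CTC posteriors of two augmented views.

    log_probs_*: (T, N, V) log-probabilities from two forward passes.
    With identical (unmasked) inputs the two views coincide and the
    loss is ~0, matching the validation line above.
    """
    kl_ab = F.kl_div(log_probs_a, log_probs_b, log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(log_probs_b, log_probs_a, log_target=True, reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)
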
], batch size: 100, lr: 3.68e-03, grad_scale: 8.0 2024-09-18 22:41:38,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=537300.0, ans=0.0 2024-09-18 22:41:50,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=537340.0, ans=0.0 2024-09-18 22:41:52,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=537340.0, ans=0.2 2024-09-18 22:41:52,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=537340.0, ans=0.05 2024-09-18 22:42:02,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=537340.0, ans=0.025 2024-09-18 22:42:17,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=537380.0, ans=0.125 2024-09-18 22:42:54,839 INFO [train.py:1198] (1/2) Epoch 30, batch 3150, loss[loss=0.2413, ctc_loss=0.1112, cr_loss=0.3476, attn_decoder_loss=0.248, over 28853.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1225, cr_loss=0.3655, attn_decoder_loss=0.2442, over 5781878.29 frames. ], batch size: 104, lr: 3.68e-03, grad_scale: 8.0 2024-09-18 22:43:27,671 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.73 vs. limit=15.0 2024-09-18 22:43:58,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=537660.0, ans=0.0 2024-09-18 22:44:06,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=537660.0, ans=0.125 2024-09-18 22:44:07,776 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.378e+01 8.875e+01 9.441e+01 1.254e+02, threshold=1.775e+02, percent-clipped=0.0 2024-09-18 22:44:10,855 INFO [train.py:1198] (1/2) Epoch 30, batch 3200, loss[loss=0.2349, ctc_loss=0.1175, cr_loss=0.3597, attn_decoder_loss=0.24, over 29407.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1213, cr_loss=0.3631, attn_decoder_loss=0.2432, over 5791882.94 frames. ], batch size: 79, lr: 3.68e-03, grad_scale: 16.0 2024-09-18 22:44:15,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=537700.0, ans=0.1 2024-09-18 22:44:27,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=537740.0, ans=0.0 2024-09-18 22:44:32,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=537740.0, ans=0.125 2024-09-18 22:44:34,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=537740.0, ans=0.0 2024-09-18 22:44:52,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.44 vs. 
limit=15.0 2024-09-18 22:44:56,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=537780.0, ans=0.125 2024-09-18 22:45:07,383 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.57 vs. limit=15.0 2024-09-18 22:45:26,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=537860.0, ans=0.125 2024-09-18 22:45:29,322 INFO [train.py:1198] (1/2) Epoch 30, batch 3250, loss[loss=0.2348, ctc_loss=0.1149, cr_loss=0.3409, attn_decoder_loss=0.2405, over 29686.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1214, cr_loss=0.3638, attn_decoder_loss=0.2435, over 5798736.97 frames. ], batch size: 84, lr: 3.68e-03, grad_scale: 8.0 2024-09-18 22:45:35,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=537900.0, ans=0.07 2024-09-18 22:45:53,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=537940.0, ans=0.1 2024-09-18 22:46:03,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=537980.0, ans=0.0 2024-09-18 22:46:07,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=537980.0, ans=0.125 2024-09-18 22:46:41,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=538060.0, ans=0.125 2024-09-18 22:46:45,608 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.587e+01 8.918e+01 9.595e+01 3.976e+02, threshold=1.784e+02, percent-clipped=1.0 2024-09-18 22:46:47,133 INFO [train.py:1198] (1/2) Epoch 30, batch 3300, loss[loss=0.2462, ctc_loss=0.1227, cr_loss=0.3638, attn_decoder_loss=0.2518, over 28217.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1206, cr_loss=0.3619, attn_decoder_loss=0.2423, over 5795567.30 frames. ], batch size: 111, lr: 3.68e-03, grad_scale: 8.0 2024-09-18 22:47:02,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=538140.0, ans=0.0 2024-09-18 22:47:13,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=538140.0, ans=0.0 2024-09-18 22:47:14,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=538140.0, ans=0.035 2024-09-18 22:47:22,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=538180.0, ans=0.125 2024-09-18 22:47:35,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=538220.0, ans=10.0 2024-09-18 22:47:46,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=538260.0, ans=0.2 2024-09-18 22:47:51,534 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.91 vs. 
limit=15.0 2024-09-18 22:47:52,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=538260.0, ans=0.125 2024-09-18 22:48:02,468 INFO [train.py:1198] (1/2) Epoch 30, batch 3350, loss[loss=0.2581, ctc_loss=0.1383, cr_loss=0.3948, attn_decoder_loss=0.2626, over 28901.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1214, cr_loss=0.3626, attn_decoder_loss=0.2431, over 5772684.60 frames. ], batch size: 104, lr: 3.67e-03, grad_scale: 8.0 2024-09-18 22:48:08,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=538300.0, ans=0.125 2024-09-18 22:48:19,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=538340.0, ans=0.0 2024-09-18 22:48:29,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=538340.0, ans=0.125 2024-09-18 22:48:34,599 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:48:56,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=538420.0, ans=0.125 2024-09-18 22:49:17,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=538460.0, ans=15.0 2024-09-18 22:49:19,303 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.502e+01 8.711e+01 9.247e+01 9.714e+01 4.351e+02, threshold=1.849e+02, percent-clipped=3.0 2024-09-18 22:49:19,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=538500.0, ans=0.125 2024-09-18 22:49:20,832 INFO [train.py:1198] (1/2) Epoch 30, batch 3400, loss[loss=0.2118, ctc_loss=0.1016, cr_loss=0.3325, attn_decoder_loss=0.2167, over 29402.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1215, cr_loss=0.3632, attn_decoder_loss=0.243, over 5766879.40 frames. ], batch size: 67, lr: 3.67e-03, grad_scale: 8.0 2024-09-18 22:49:49,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=538540.0, ans=0.125 2024-09-18 22:49:49,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=538540.0, ans=0.025 2024-09-18 22:49:51,354 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.03 vs. limit=15.0 2024-09-18 22:49:58,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=538580.0, ans=0.04949747468305833 2024-09-18 22:49:58,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=538580.0, ans=0.09899494936611666 2024-09-18 22:49:58,841 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.44 vs. limit=15.0 2024-09-18 22:49:59,069 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.80 vs. 
limit=15.0 2024-09-18 22:50:16,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=538620.0, ans=0.125 2024-09-18 22:50:25,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=538660.0, ans=0.04949747468305833 2024-09-18 22:50:38,778 INFO [train.py:1198] (1/2) Epoch 30, batch 3450, loss[loss=0.2388, ctc_loss=0.1128, cr_loss=0.3436, attn_decoder_loss=0.2451, over 28274.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1211, cr_loss=0.3625, attn_decoder_loss=0.2431, over 5774918.33 frames. ], batch size: 111, lr: 3.67e-03, grad_scale: 8.0 2024-09-18 22:50:39,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer_ff3.min_abs, batch_count=538700.0, ans=0.2 2024-09-18 22:50:40,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=538700.0, ans=0.2 2024-09-18 22:51:13,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=538780.0, ans=0.125 2024-09-18 22:51:15,758 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.34 vs. limit=6.0 2024-09-18 22:51:22,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=538820.0, ans=0.1 2024-09-18 22:51:52,847 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.290e+01 8.505e+01 9.176e+01 9.588e+01 2.343e+02, threshold=1.835e+02, percent-clipped=1.0 2024-09-18 22:51:54,375 INFO [train.py:1198] (1/2) Epoch 30, batch 3500, loss[loss=0.2203, ctc_loss=0.1127, cr_loss=0.353, attn_decoder_loss=0.2244, over 29313.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1211, cr_loss=0.3626, attn_decoder_loss=0.2427, over 5777214.86 frames. ], batch size: 71, lr: 3.67e-03, grad_scale: 8.0 2024-09-18 22:51:58,539 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.29 vs. limit=15.0 2024-09-18 22:52:16,039 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:52:50,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=539020.0, ans=0.0 2024-09-18 22:52:54,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=539020.0, ans=0.125 2024-09-18 22:53:04,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=539060.0, ans=0.125 2024-09-18 22:53:11,558 INFO [train.py:1198] (1/2) Epoch 30, batch 3550, loss[loss=0.2376, ctc_loss=0.1107, cr_loss=0.3513, attn_decoder_loss=0.2439, over 29719.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1211, cr_loss=0.3624, attn_decoder_loss=0.2426, over 5783830.64 frames. ], batch size: 89, lr: 3.67e-03, grad_scale: 8.0 2024-09-18 22:53:23,996 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.63 vs. limit=15.0 2024-09-18 22:53:54,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.82 vs. 
limit=15.0 2024-09-18 22:53:56,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=539220.0, ans=0.0 2024-09-18 22:54:03,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=539220.0, ans=0.125 2024-09-18 22:54:10,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=539260.0, ans=0.125 2024-09-18 22:54:17,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=539260.0, ans=0.1 2024-09-18 22:54:20,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=539260.0, ans=0.125 2024-09-18 22:54:23,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=539260.0, ans=0.0 2024-09-18 22:54:26,535 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.421e+01 8.886e+01 9.459e+01 1.383e+02, threshold=1.777e+02, percent-clipped=0.0 2024-09-18 22:54:28,094 INFO [train.py:1198] (1/2) Epoch 30, batch 3600, loss[loss=0.2381, ctc_loss=0.1251, cr_loss=0.3721, attn_decoder_loss=0.2424, over 29519.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1212, cr_loss=0.3627, attn_decoder_loss=0.243, over 5791824.04 frames. ], batch size: 77, lr: 3.67e-03, grad_scale: 16.0 2024-09-18 22:54:41,146 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.95 vs. limit=15.0 2024-09-18 22:54:43,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=539340.0, ans=0.5 2024-09-18 22:54:43,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=539340.0, ans=0.125 2024-09-18 22:54:46,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=539340.0, ans=0.025 2024-09-18 22:55:00,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=539380.0, ans=0.1 2024-09-18 22:55:24,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=539420.0, ans=0.0 2024-09-18 22:55:39,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=539460.0, ans=0.125 2024-09-18 22:55:42,274 INFO [train.py:1198] (1/2) Epoch 30, batch 3650, loss[loss=0.2599, ctc_loss=0.1396, cr_loss=0.3979, attn_decoder_loss=0.2645, over 29493.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1207, cr_loss=0.3618, attn_decoder_loss=0.2425, over 5794752.86 frames. 
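
Note on the ScheduledFloat lines: each names a hyper-parameter (a skip rate, dropout probability, balancer bound, or whitening limit) whose current value (ans=...) is a function of the global batch_count. A minimal piecewise-linear implementation of the idea; the breakpoints below are invented for illustration, not the recipe's actual schedules:

class ScheduledValue:
    def __init__(self, *points):
        # points: (batch_count, value) pairs defining a piecewise-linear schedule
        self.points = sorted(points)

    def __call__(self, batch_count):
        x0, y0 = self.points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in self.points[1:]:
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            x0, y0 = x1, y1
        return y0  # hold the final value once past the last breakpoint

# e.g. a skip rate annealed from 0.1 to 0.0 over the first 20k batches:
ff2_skip_rate = ScheduledValue((0.0, 0.1), (20000.0, 0.0))
assert ff2_skip_rate(532820.0) == 0.0  # consistent with the ans=0.0 entries above
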
], batch size: 90, lr: 3.67e-03, grad_scale: 8.0 2024-09-18 22:55:58,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=539540.0, ans=0.0 2024-09-18 22:56:06,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=539540.0, ans=0.0 2024-09-18 22:56:13,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=539580.0, ans=0.0 2024-09-18 22:56:18,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=539580.0, ans=0.0 2024-09-18 22:56:56,902 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 8.540e+01 9.082e+01 9.609e+01 1.779e+02, threshold=1.816e+02, percent-clipped=1.0 2024-09-18 22:56:56,924 INFO [train.py:1198] (1/2) Epoch 30, batch 3700, loss[loss=0.2459, ctc_loss=0.1247, cr_loss=0.3624, attn_decoder_loss=0.2513, over 29706.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1206, cr_loss=0.3621, attn_decoder_loss=0.2427, over 5804721.57 frames. ], batch size: 84, lr: 3.67e-03, grad_scale: 8.0 2024-09-18 22:56:57,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=539700.0, ans=0.0 2024-09-18 22:57:06,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=539700.0, ans=0.0 2024-09-18 22:57:06,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=539700.0, ans=0.125 2024-09-18 22:57:09,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=539700.0, ans=0.125 2024-09-18 22:57:18,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=539740.0, ans=0.125 2024-09-18 22:57:21,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=539740.0, ans=0.1 2024-09-18 22:57:24,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=539740.0, ans=0.125 2024-09-18 22:57:45,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=539820.0, ans=0.125 2024-09-18 22:58:11,649 INFO [train.py:1198] (1/2) Epoch 30, batch 3750, loss[loss=0.2071, ctc_loss=0.1048, cr_loss=0.3223, attn_decoder_loss=0.2113, over 29367.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1203, cr_loss=0.3611, attn_decoder_loss=0.2422, over 5808378.70 frames. 
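
Note on the balancer entries (balancer1.prob, min_positive, min_abs, max_abs, ...): these appear to come from modules that are the identity in the forward pass but occasionally adjust gradients so that activation statistics, such as the fraction of positive values per channel, stay inside a target range. A toy version of the positive-fraction constraint only; the production module is considerably more elaborate:

import torch

class ToyBalancer(torch.autograd.Function):
    """Identity forward; backward nudges channels whose positive fraction
    falls below min_positive. Assumes x of shape (N, C); the 0.01 penalty
    strength is arbitrary."""

    @staticmethod
    def forward(ctx, x, min_positive=0.05):
        ctx.save_for_backward(x)
        ctx.min_positive = min_positive
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        pos_frac = (x > 0).float().mean(dim=0)       # per-channel positive fraction
        low = (pos_frac < ctx.min_positive).float()  # channels to push upward
        # acts as if the loss contained an extra -0.01 * x on those channels
        return grad_out - 0.01 * low, None

The .prob fields in the log would then schedule how often such a correction is actually applied.
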
], batch size: 67, lr: 3.67e-03, grad_scale: 8.0 2024-09-18 22:58:29,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=539940.0, ans=0.0 2024-09-18 22:58:37,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=539940.0, ans=0.09899494936611666 2024-09-18 22:58:57,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=540020.0, ans=0.5 2024-09-18 22:59:08,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=540020.0, ans=0.2 2024-09-18 22:59:16,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=540060.0, ans=0.2 2024-09-18 22:59:23,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=540060.0, ans=0.125 2024-09-18 22:59:28,011 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.419e+01 8.475e+01 8.896e+01 9.603e+01 2.511e+02, threshold=1.779e+02, percent-clipped=2.0 2024-09-18 22:59:28,038 INFO [train.py:1198] (1/2) Epoch 30, batch 3800, loss[loss=0.2365, ctc_loss=0.1156, cr_loss=0.345, attn_decoder_loss=0.2422, over 29634.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1202, cr_loss=0.361, attn_decoder_loss=0.242, over 5799031.44 frames. ], batch size: 86, lr: 3.67e-03, grad_scale: 8.0 2024-09-18 22:59:34,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.14 vs. limit=22.5 2024-09-18 22:59:46,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=540140.0, ans=0.1 2024-09-18 22:59:47,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=540140.0, ans=0.1 2024-09-18 23:00:07,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=540180.0, ans=0.125 2024-09-18 23:00:11,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=540220.0, ans=0.0 2024-09-18 23:00:34,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=540260.0, ans=0.0 2024-09-18 23:00:35,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=540260.0, ans=0.125 2024-09-18 23:00:36,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=540260.0, ans=0.125 2024-09-18 23:00:44,095 INFO [train.py:1198] (1/2) Epoch 30, batch 3850, loss[loss=0.26, ctc_loss=0.1424, cr_loss=0.4166, attn_decoder_loss=0.2638, over 29246.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1205, cr_loss=0.3613, attn_decoder_loss=0.2421, over 5812286.43 frames. 
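
Note on the Whitening lines: each reports a statistic against a limit (e.g. metric=21.14 vs. limit=22.5 above), suggesting a penalty that activates only when the metric exceeds the limit. One natural metric with this behavior is the eigenvalue-spread ratio of the channel covariance: it equals 1.0 for perfectly "white" features and grows as variance concentrates in few directions. A sketch under that assumption (not copied from scaling.py):

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """Return mean(lambda_i^2) / mean(lambda_i)^2 of the per-group channel
    covariance of x (shape: num_frames x num_channels), averaged over groups.
    Equals 1.0 when the covariance is a multiple of the identity."""
    n, c = x.shape
    d = c // num_groups
    xg = x.reshape(n, num_groups, d).transpose(0, 1)      # (G, N, D)
    cov = xg.transpose(1, 2) @ xg / n                     # (G, D, D)
    mean_eig = cov.diagonal(dim1=1, dim2=2).mean(dim=1)   # trace/D = mean eigenvalue
    mean_eig_sq = (cov * cov).sum(dim=(1, 2)) / d         # trace(C^2)/D
    return (mean_eig_sq / mean_eig.clamp(min=1e-20) ** 2).mean()
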
], batch size: 100, lr: 3.67e-03, grad_scale: 8.0 2024-09-18 23:00:51,871 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:01:02,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=540340.0, ans=0.125 2024-09-18 23:01:03,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=540340.0, ans=0.1 2024-09-18 23:01:15,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=540380.0, ans=0.125 2024-09-18 23:01:55,032 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.01 vs. limit=15.0 2024-09-18 23:01:58,766 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.853e+01 8.547e+01 9.033e+01 9.737e+01 1.184e+02, threshold=1.807e+02, percent-clipped=0.0 2024-09-18 23:01:58,803 INFO [train.py:1198] (1/2) Epoch 30, batch 3900, loss[loss=0.2468, ctc_loss=0.1195, cr_loss=0.3558, attn_decoder_loss=0.253, over 29618.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1213, cr_loss=0.3631, attn_decoder_loss=0.2429, over 5816216.72 frames. ], batch size: 86, lr: 3.67e-03, grad_scale: 8.0 2024-09-18 23:02:01,471 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.76 vs. limit=12.0 2024-09-18 23:02:38,403 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.07 vs. limit=22.5 2024-09-18 23:02:49,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=540620.0, ans=0.125 2024-09-18 23:02:58,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=540660.0, ans=0.035 2024-09-18 23:03:00,404 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.45 vs. limit=15.0 2024-09-18 23:03:11,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=540700.0, ans=0.2 2024-09-18 23:03:13,094 INFO [train.py:1198] (1/2) Epoch 30, batch 3950, loss[loss=0.2572, ctc_loss=0.1392, cr_loss=0.3869, attn_decoder_loss=0.2618, over 29502.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1213, cr_loss=0.3636, attn_decoder_loss=0.2429, over 5835671.59 frames. ], batch size: 97, lr: 3.67e-03, grad_scale: 8.0 2024-09-18 23:03:16,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=540700.0, ans=0.0 2024-09-18 23:04:10,698 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.44 vs. 
limit=15.0 2024-09-18 23:04:21,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=540860.0, ans=0.0 2024-09-18 23:04:23,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=540860.0, ans=0.0 2024-09-18 23:04:27,383 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.361e+01 8.475e+01 8.885e+01 9.495e+01 1.627e+02, threshold=1.777e+02, percent-clipped=0.0 2024-09-18 23:04:27,405 INFO [train.py:1198] (1/2) Epoch 30, batch 4000, loss[loss=0.2248, ctc_loss=0.1077, cr_loss=0.3457, attn_decoder_loss=0.2301, over 29502.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1213, cr_loss=0.3635, attn_decoder_loss=0.2429, over 5812032.08 frames. ], batch size: 74, lr: 3.67e-03, grad_scale: 16.0 2024-09-18 23:04:43,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=540940.0, ans=0.0 2024-09-18 23:05:05,469 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.96 vs. limit=15.0 2024-09-18 23:05:29,437 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.59 vs. limit=15.0 2024-09-18 23:05:44,350 INFO [train.py:1198] (1/2) Epoch 30, batch 4050, loss[loss=0.2605, ctc_loss=0.1558, cr_loss=0.3904, attn_decoder_loss=0.2635, over 20547.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1212, cr_loss=0.3628, attn_decoder_loss=0.2427, over 5796473.75 frames. ], batch size: 209, lr: 3.66e-03, grad_scale: 16.0 2024-09-18 23:05:47,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=541100.0, ans=0.1 2024-09-18 23:05:52,872 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.59 vs. limit=22.5 2024-09-18 23:06:14,477 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=15.0 2024-09-18 23:06:38,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=541220.0, ans=0.125 2024-09-18 23:06:49,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=541260.0, ans=0.125 2024-09-18 23:06:57,913 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.018e+01 8.748e+01 9.261e+01 9.930e+01 1.570e+02, threshold=1.852e+02, percent-clipped=0.0 2024-09-18 23:06:57,938 INFO [train.py:1198] (1/2) Epoch 30, batch 4100, loss[loss=0.2621, ctc_loss=0.1483, cr_loss=0.4434, attn_decoder_loss=0.2648, over 29490.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1217, cr_loss=0.3639, attn_decoder_loss=0.2434, over 5791995.30 frames. 
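
Note on batch size vs. frames: batch sizes in this log range from 67 to 209 (the 209-cut batch above covers only 20547 frames), which is characteristic of duration-constrained batching: utterances are packed until a total-duration budget is reached, so batches of short cuts contain many more utterances. A toy packer illustrating the idea; the actual sampler is a Lhotse bucketing sampler, which additionally groups cuts of similar duration:

def duration_batches(cuts, max_duration=1200.0):
    """cuts: iterable of (cut_id, duration_seconds). Yields batches whose
    total duration stays within max_duration (the budget here is assumed)."""
    batch, total = [], 0.0
    for cut_id, dur in cuts:
        if batch and total + dur > max_duration:
            yield batch
            batch, total = [], 0.0
        batch.append(cut_id)
        total += dur
    if batch:
        yield batch
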
], batch size: 90, lr: 3.66e-03, grad_scale: 16.0 2024-09-18 23:07:04,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=541300.0, ans=0.125 2024-09-18 23:07:20,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=541340.0, ans=0.0 2024-09-18 23:07:25,567 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.44 vs. limit=15.0 2024-09-18 23:07:32,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=541380.0, ans=0.2 2024-09-18 23:07:38,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=541380.0, ans=0.125 2024-09-18 23:07:44,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=541420.0, ans=0.2 2024-09-18 23:08:01,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=541460.0, ans=0.125 2024-09-18 23:08:02,602 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.93 vs. limit=10.0 2024-09-18 23:08:11,984 INFO [train.py:1198] (1/2) Epoch 30, batch 4150, loss[loss=0.2305, ctc_loss=0.123, cr_loss=0.369, attn_decoder_loss=0.2342, over 29497.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1214, cr_loss=0.3636, attn_decoder_loss=0.243, over 5797565.11 frames. ], batch size: 77, lr: 3.66e-03, grad_scale: 8.0 2024-09-18 23:08:21,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=541500.0, ans=0.125 2024-09-18 23:08:50,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=541580.0, ans=0.125 2024-09-18 23:09:06,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=541620.0, ans=0.125 2024-09-18 23:09:11,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=541660.0, ans=0.125 2024-09-18 23:09:24,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=541660.0, ans=0.125 2024-09-18 23:09:27,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=541700.0, ans=0.0 2024-09-18 23:09:28,238 INFO [train.py:1198] (1/2) Epoch 30, batch 4200, loss[loss=0.2622, ctc_loss=0.1383, cr_loss=0.3928, attn_decoder_loss=0.2673, over 29529.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1215, cr_loss=0.3642, attn_decoder_loss=0.2435, over 5799501.27 frames. 
], batch size: 90, lr: 3.66e-03, grad_scale: 8.0 2024-09-18 23:09:29,683 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 8.390e+01 9.004e+01 9.409e+01 1.747e+02, threshold=1.801e+02, percent-clipped=0.0 2024-09-18 23:09:37,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys.whitening_limit, batch_count=541700.0, ans=6.0 2024-09-18 23:09:44,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=541740.0, ans=0.125 2024-09-18 23:09:52,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=541740.0, ans=0.125 2024-09-18 23:09:55,696 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.08 vs. limit=15.0 2024-09-18 23:10:20,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=541820.0, ans=0.125 2024-09-18 23:10:33,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=541860.0, ans=0.125 2024-09-18 23:10:34,757 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:10:41,862 INFO [train.py:1198] (1/2) Epoch 30, batch 4250, loss[loss=0.2157, ctc_loss=0.1016, cr_loss=0.3146, attn_decoder_loss=0.2213, over 29519.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1213, cr_loss=0.3639, attn_decoder_loss=0.2435, over 5806079.31 frames. ], batch size: 74, lr: 3.66e-03, grad_scale: 8.0 2024-09-18 23:10:45,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=541900.0, ans=0.125 2024-09-18 23:10:48,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=541900.0, ans=0.1 2024-09-18 23:10:49,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=541900.0, ans=0.125 2024-09-18 23:11:02,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=541940.0, ans=0.125 2024-09-18 23:11:14,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=541980.0, ans=0.0 2024-09-18 23:11:31,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=542020.0, ans=0.025 2024-09-18 23:11:55,753 INFO [train.py:1198] (1/2) Epoch 30, batch 4300, loss[loss=0.2443, ctc_loss=0.1192, cr_loss=0.3481, attn_decoder_loss=0.2504, over 29516.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1212, cr_loss=0.3631, attn_decoder_loss=0.2435, over 5796045.31 frames. ], batch size: 87, lr: 3.66e-03, grad_scale: 8.0 2024-09-18 23:11:57,300 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.934e+01 8.627e+01 9.132e+01 9.730e+01 6.693e+02, threshold=1.826e+02, percent-clipped=2.0 2024-09-18 23:12:59,651 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.37 vs. 
limit=15.0 2024-09-18 23:13:11,894 INFO [train.py:1198] (1/2) Epoch 30, batch 4350, loss[loss=0.2588, ctc_loss=0.1393, cr_loss=0.4031, attn_decoder_loss=0.2631, over 29499.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1237, cr_loss=0.3686, attn_decoder_loss=0.2466, over 5799138.04 frames. ], batch size: 97, lr: 3.66e-03, grad_scale: 8.0 2024-09-18 23:13:19,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=542300.0, ans=0.125 2024-09-18 23:13:21,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=542300.0, ans=10.0 2024-09-18 23:13:30,479 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.55 vs. limit=15.0 2024-09-18 23:13:32,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=542340.0, ans=0.1 2024-09-18 23:13:51,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=542380.0, ans=0.07 2024-09-18 23:13:55,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.31 vs. limit=15.0 2024-09-18 23:13:57,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=542420.0, ans=0.125 2024-09-18 23:14:25,117 INFO [train.py:1198] (1/2) Epoch 30, batch 4400, loss[loss=0.2605, ctc_loss=0.1444, cr_loss=0.4148, attn_decoder_loss=0.2642, over 27566.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1252, cr_loss=0.3715, attn_decoder_loss=0.2486, over 5768847.41 frames. ], batch size: 125, lr: 3.66e-03, grad_scale: 16.0 2024-09-18 23:14:26,524 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.984e+01 8.805e+01 9.147e+01 9.646e+01 3.836e+02, threshold=1.829e+02, percent-clipped=2.0 2024-09-18 23:14:26,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=542500.0, ans=10.0 2024-09-18 23:14:39,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.10 vs. 
limit=15.0 2024-09-18 23:14:40,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=542540.0, ans=0.0 2024-09-18 23:14:41,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=542540.0, ans=0.125 2024-09-18 23:14:47,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=542540.0, ans=0.125 2024-09-18 23:14:47,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=542540.0, ans=0.0 2024-09-18 23:15:29,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=542660.0, ans=0.125 2024-09-18 23:15:34,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=542660.0, ans=0.2 2024-09-18 23:15:37,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=542660.0, ans=0.125 2024-09-18 23:15:38,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=542700.0, ans=0.0 2024-09-18 23:15:39,909 INFO [train.py:1198] (1/2) Epoch 30, batch 4450, loss[loss=0.267, ctc_loss=0.1577, cr_loss=0.3925, attn_decoder_loss=0.2705, over 20783.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1288, cr_loss=0.3759, attn_decoder_loss=0.2507, over 5574678.60 frames. ], batch size: 210, lr: 3.66e-03, grad_scale: 8.0 2024-09-18 23:15:49,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=542700.0, ans=0.1 2024-09-18 23:16:13,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=542780.0, ans=0.125 2024-09-18 23:16:25,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=542820.0, ans=0.0 2024-09-18 23:16:25,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=542820.0, ans=0.125 2024-09-18 23:16:53,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=542860.0, ans=0.125 2024-09-18 23:16:55,893 INFO [train.py:1198] (1/2) Epoch 30, batch 4500, loss[loss=0.2647, ctc_loss=0.1558, cr_loss=0.3846, attn_decoder_loss=0.2682, over 19400.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1328, cr_loss=0.3787, attn_decoder_loss=0.253, over 5232821.14 frames. 
], batch size: 209, lr: 3.66e-03, grad_scale: 8.0 2024-09-18 23:16:58,808 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.613e+01 9.634e+01 1.118e+02 1.226e+02 1.647e+02, threshold=2.235e+02, percent-clipped=0.0 2024-09-18 23:16:59,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=542900.0, ans=0.0 2024-09-18 23:17:09,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=542940.0, ans=0.2 2024-09-18 23:17:24,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=542980.0, ans=0.0 2024-09-18 23:17:27,856 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.24 vs. limit=15.0 2024-09-18 23:18:19,541 INFO [train.py:1198] (1/2) Epoch 31, batch 0, loss[loss=0.2199, ctc_loss=0.1122, cr_loss=0.3369, attn_decoder_loss=0.2244, over 29596.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1122, cr_loss=0.3369, attn_decoder_loss=0.2244, over 29596.00 frames. ], batch size: 73, lr: 3.60e-03, grad_scale: 16.0 2024-09-18 23:18:19,541 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 23:18:37,942 INFO [train.py:1230] (1/2) Epoch 31, validation: loss=0.2119, ctc_loss=0.03668, cr_loss=5.946e-15, attn_decoder_loss=0.2314, over 944034.00 frames. 2024-09-18 23:18:37,942 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-18 23:18:39,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=543000.0, ans=0.0 2024-09-18 23:19:01,563 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.53 vs. limit=22.5 2024-09-18 23:19:08,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=543080.0, ans=0.0 2024-09-18 23:19:09,188 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.24 vs. limit=15.0 2024-09-18 23:19:25,218 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:19:30,277 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.29 vs. limit=15.0 2024-09-18 23:19:55,995 INFO [train.py:1198] (1/2) Epoch 31, batch 50, loss[loss=0.2077, ctc_loss=0.09466, cr_loss=0.3162, attn_decoder_loss=0.2132, over 29436.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1209, cr_loss=0.359, attn_decoder_loss=0.2425, over 1267607.03 frames. 
], batch size: 70, lr: 3.60e-03, grad_scale: 8.0 2024-09-18 23:19:56,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=543200.0, ans=0.0 2024-09-18 23:20:10,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=543240.0, ans=0.125 2024-09-18 23:20:38,778 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.848e+01 9.634e+01 1.110e+02 1.417e+02, threshold=1.927e+02, percent-clipped=0.0 2024-09-18 23:20:54,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=543320.0, ans=0.125 2024-09-18 23:21:02,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=543360.0, ans=0.125 2024-09-18 23:21:14,572 INFO [train.py:1198] (1/2) Epoch 31, batch 100, loss[loss=0.2406, ctc_loss=0.1217, cr_loss=0.3649, attn_decoder_loss=0.2457, over 29511.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1226, cr_loss=0.3641, attn_decoder_loss=0.2448, over 2251260.62 frames. ], batch size: 76, lr: 3.60e-03, grad_scale: 8.0 2024-09-18 23:21:34,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=543440.0, ans=0.125 2024-09-18 23:21:45,274 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.04 vs. limit=22.5 2024-09-18 23:21:47,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=543480.0, ans=0.1 2024-09-18 23:21:49,666 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.87 vs. limit=15.0 2024-09-18 23:21:52,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=543480.0, ans=0.1 2024-09-18 23:22:01,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=543520.0, ans=0.1 2024-09-18 23:22:17,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=543560.0, ans=0.125 2024-09-18 23:22:29,482 INFO [train.py:1198] (1/2) Epoch 31, batch 150, loss[loss=0.2035, ctc_loss=0.09444, cr_loss=0.2961, attn_decoder_loss=0.209, over 29435.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1203, cr_loss=0.3602, attn_decoder_loss=0.2427, over 3046153.14 frames. ], batch size: 70, lr: 3.60e-03, grad_scale: 8.0 2024-09-18 23:22:45,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=543640.0, ans=0.1 2024-09-18 23:23:11,521 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.390e+01 8.450e+01 8.920e+01 9.351e+01 1.507e+02, threshold=1.784e+02, percent-clipped=0.0 2024-09-18 23:23:47,251 INFO [train.py:1198] (1/2) Epoch 31, batch 200, loss[loss=0.2486, ctc_loss=0.1277, cr_loss=0.3911, attn_decoder_loss=0.2534, over 27307.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1206, cr_loss=0.3622, attn_decoder_loss=0.2426, over 3658832.86 frames. 
], batch size: 124, lr: 3.60e-03, grad_scale: 8.0 2024-09-18 23:23:58,196 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:23:58,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=543800.0, ans=15.0 2024-09-18 23:24:05,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=543840.0, ans=0.1 2024-09-18 23:24:07,591 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.19 vs. limit=12.0 2024-09-18 23:24:18,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=15.0 2024-09-18 23:24:23,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=543880.0, ans=0.125 2024-09-18 23:24:24,576 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.10 vs. limit=15.0 2024-09-18 23:24:29,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=543880.0, ans=0.025 2024-09-18 23:24:54,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=543960.0, ans=0.07 2024-09-18 23:25:02,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=543960.0, ans=0.125 2024-09-18 23:25:13,221 INFO [train.py:1198] (1/2) Epoch 31, batch 250, loss[loss=0.2537, ctc_loss=0.1268, cr_loss=0.3657, attn_decoder_loss=0.2596, over 29164.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1198, cr_loss=0.3608, attn_decoder_loss=0.2424, over 4141136.80 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:25:55,010 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.24 vs. limit=6.0 2024-09-18 23:25:55,611 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.439e+01 8.894e+01 9.430e+01 6.449e+02, threshold=1.779e+02, percent-clipped=1.0 2024-09-18 23:26:00,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=544120.0, ans=0.2 2024-09-18 23:26:07,268 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.38 vs. limit=22.5 2024-09-18 23:26:12,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=544160.0, ans=0.0 2024-09-18 23:26:25,130 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.10 vs. limit=10.0 2024-09-18 23:26:28,632 INFO [train.py:1198] (1/2) Epoch 31, batch 300, loss[loss=0.2504, ctc_loss=0.1354, cr_loss=0.3969, attn_decoder_loss=0.2543, over 29528.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1202, cr_loss=0.3615, attn_decoder_loss=0.2425, over 4510455.63 frames. 
], batch size: 92, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:26:43,078 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=15.0 2024-09-18 23:26:50,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=544240.0, ans=0.125 2024-09-18 23:27:02,499 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:27:34,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=544360.0, ans=0.07 2024-09-18 23:27:35,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=544360.0, ans=0.125 2024-09-18 23:27:40,335 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.06 vs. limit=10.0 2024-09-18 23:27:46,727 INFO [train.py:1198] (1/2) Epoch 31, batch 350, loss[loss=0.2212, ctc_loss=0.1115, cr_loss=0.3434, attn_decoder_loss=0.2258, over 29343.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1208, cr_loss=0.3627, attn_decoder_loss=0.2432, over 4796482.41 frames. ], batch size: 71, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:28:06,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=544440.0, ans=0.125 2024-09-18 23:28:12,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=544440.0, ans=0.05 2024-09-18 23:28:24,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=544480.0, ans=0.125 2024-09-18 23:28:28,465 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.589e+01 8.389e+01 8.860e+01 9.607e+01 2.348e+02, threshold=1.772e+02, percent-clipped=3.0 2024-09-18 23:28:31,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=544520.0, ans=0.125 2024-09-18 23:28:33,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=544520.0, ans=0.125 2024-09-18 23:28:37,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=544520.0, ans=0.125 2024-09-18 23:29:01,566 INFO [train.py:1198] (1/2) Epoch 31, batch 400, loss[loss=0.2391, ctc_loss=0.1168, cr_loss=0.3782, attn_decoder_loss=0.2443, over 29703.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1204, cr_loss=0.3622, attn_decoder_loss=0.2429, over 5024366.91 frames. 
], batch size: 82, lr: 3.59e-03, grad_scale: 16.0 2024-09-18 23:29:10,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=544600.0, ans=0.125 2024-09-18 23:29:17,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=544640.0, ans=0.125 2024-09-18 23:29:23,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=544640.0, ans=0.125 2024-09-18 23:29:27,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=544640.0, ans=0.025 2024-09-18 23:30:03,915 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2024-09-18 23:30:20,319 INFO [train.py:1198] (1/2) Epoch 31, batch 450, loss[loss=0.2502, ctc_loss=0.1356, cr_loss=0.4, attn_decoder_loss=0.254, over 29685.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1204, cr_loss=0.3621, attn_decoder_loss=0.2428, over 5187453.06 frames. ], batch size: 83, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:30:20,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=544800.0, ans=0.2 2024-09-18 23:30:20,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=544800.0, ans=0.125 2024-09-18 23:30:31,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=544800.0, ans=0.0 2024-09-18 23:30:49,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=544880.0, ans=0.025 2024-09-18 23:30:54,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=544880.0, ans=0.2 2024-09-18 23:31:03,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=544880.0, ans=0.1 2024-09-18 23:31:04,391 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.382e+01 8.525e+01 8.811e+01 9.438e+01 1.510e+02, threshold=1.762e+02, percent-clipped=0.0 2024-09-18 23:31:09,582 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:31:12,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=544920.0, ans=0.0 2024-09-18 23:31:13,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=544920.0, ans=0.125 2024-09-18 23:31:24,852 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.63 vs. limit=15.0 2024-09-18 23:31:29,557 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.44 vs. 
limit=22.5 2024-09-18 23:31:37,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=545000.0, ans=0.07 2024-09-18 23:31:38,404 INFO [train.py:1198] (1/2) Epoch 31, batch 500, loss[loss=0.2553, ctc_loss=0.1268, cr_loss=0.3777, attn_decoder_loss=0.2612, over 29440.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1204, cr_loss=0.3619, attn_decoder_loss=0.2422, over 5330420.76 frames. ], batch size: 94, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:31:40,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=545000.0, ans=0.0 2024-09-18 23:31:44,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=545000.0, ans=0.125 2024-09-18 23:31:48,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=545000.0, ans=0.07 2024-09-18 23:31:52,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=545040.0, ans=0.025 2024-09-18 23:32:13,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=545080.0, ans=0.0 2024-09-18 23:32:28,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=545120.0, ans=0.1 2024-09-18 23:32:36,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=545120.0, ans=0.0 2024-09-18 23:32:39,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=545160.0, ans=0.025 2024-09-18 23:32:54,163 INFO [train.py:1198] (1/2) Epoch 31, batch 550, loss[loss=0.2455, ctc_loss=0.1175, cr_loss=0.3564, attn_decoder_loss=0.2519, over 28793.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1203, cr_loss=0.3615, attn_decoder_loss=0.2421, over 5421577.81 frames. ], batch size: 104, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:32:59,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=545200.0, ans=0.025 2024-09-18 23:32:59,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=545200.0, ans=0.125 2024-09-18 23:33:20,154 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.33 vs. limit=15.0 2024-09-18 23:33:40,331 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 8.523e+01 8.948e+01 9.609e+01 1.463e+02, threshold=1.790e+02, percent-clipped=0.0 2024-09-18 23:34:12,434 INFO [train.py:1198] (1/2) Epoch 31, batch 600, loss[loss=0.2407, ctc_loss=0.122, cr_loss=0.3666, attn_decoder_loss=0.2458, over 29264.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1206, cr_loss=0.3621, attn_decoder_loss=0.2425, over 5509534.27 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:34:14,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=545400.0, ans=0.0 2024-09-18 23:34:23,860 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.59 vs. 
limit=15.0 2024-09-18 23:34:41,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=545480.0, ans=0.125 2024-09-18 23:34:41,783 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.09 vs. limit=22.5 2024-09-18 23:34:49,360 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.61 vs. limit=10.0 2024-09-18 23:35:03,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=545520.0, ans=0.125 2024-09-18 23:35:08,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=545520.0, ans=0.0 2024-09-18 23:35:08,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.36 vs. limit=22.5 2024-09-18 23:35:09,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=545520.0, ans=0.0 2024-09-18 23:35:30,102 INFO [train.py:1198] (1/2) Epoch 31, batch 650, loss[loss=0.2403, ctc_loss=0.1193, cr_loss=0.3699, attn_decoder_loss=0.2455, over 29766.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1197, cr_loss=0.36, attn_decoder_loss=0.2418, over 5586392.53 frames. ], batch size: 81, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:35:31,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=545600.0, ans=0.0 2024-09-18 23:35:39,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=545600.0, ans=0.125 2024-09-18 23:35:49,173 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.51 vs. limit=12.0 2024-09-18 23:35:56,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=545640.0, ans=0.0 2024-09-18 23:35:57,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=545640.0, ans=0.125 2024-09-18 23:35:59,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.79 vs. limit=22.5 2024-09-18 23:36:14,176 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.324e+01 8.831e+01 9.249e+01 1.386e+02, threshold=1.766e+02, percent-clipped=0.0 2024-09-18 23:36:17,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=545720.0, ans=0.125 2024-09-18 23:36:31,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=545760.0, ans=0.125 2024-09-18 23:36:38,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=545760.0, ans=0.2 2024-09-18 23:36:46,083 INFO [train.py:1198] (1/2) Epoch 31, batch 700, loss[loss=0.2191, ctc_loss=0.1132, cr_loss=0.3502, attn_decoder_loss=0.2231, over 29530.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1199, cr_loss=0.3607, attn_decoder_loss=0.2423, over 5637119.85 frames. 
], batch size: 76, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:36:50,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=545800.0, ans=0.125 2024-09-18 23:37:10,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=545840.0, ans=0.125 2024-09-18 23:37:21,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=545880.0, ans=0.125 2024-09-18 23:37:38,038 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=12.0 2024-09-18 23:37:40,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=545920.0, ans=0.1 2024-09-18 23:37:47,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=545960.0, ans=0.125 2024-09-18 23:37:52,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=545960.0, ans=0.1 2024-09-18 23:37:55,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=545960.0, ans=0.125 2024-09-18 23:38:00,079 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:38:04,175 INFO [train.py:1198] (1/2) Epoch 31, batch 750, loss[loss=0.2357, ctc_loss=0.1166, cr_loss=0.3697, attn_decoder_loss=0.2407, over 29726.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1197, cr_loss=0.3606, attn_decoder_loss=0.2418, over 5675903.51 frames. ], batch size: 82, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:38:12,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=546000.0, ans=0.0 2024-09-18 23:38:30,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=546040.0, ans=0.125 2024-09-18 23:38:47,925 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.065e+01 8.604e+01 9.058e+01 9.496e+01 1.707e+02, threshold=1.812e+02, percent-clipped=0.0 2024-09-18 23:38:51,529 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0 2024-09-18 23:39:19,499 INFO [train.py:1198] (1/2) Epoch 31, batch 800, loss[loss=0.2153, ctc_loss=0.1062, cr_loss=0.334, attn_decoder_loss=0.22, over 29638.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1201, cr_loss=0.3612, attn_decoder_loss=0.2419, over 5706404.70 frames. ], batch size: 73, lr: 3.59e-03, grad_scale: 16.0 2024-09-18 23:39:58,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=546280.0, ans=0.2 2024-09-18 23:39:58,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=546280.0, ans=0.125 2024-09-18 23:40:17,483 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.92 vs. 
limit=10.0 2024-09-18 23:40:33,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=546360.0, ans=0.125 2024-09-18 23:40:34,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=546360.0, ans=0.0 2024-09-18 23:40:37,683 INFO [train.py:1198] (1/2) Epoch 31, batch 850, loss[loss=0.2521, ctc_loss=0.127, cr_loss=0.398, attn_decoder_loss=0.2571, over 29703.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1203, cr_loss=0.3622, attn_decoder_loss=0.242, over 5734810.02 frames. ], batch size: 89, lr: 3.59e-03, grad_scale: 16.0 2024-09-18 23:40:51,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=546440.0, ans=0.125 2024-09-18 23:40:54,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=546440.0, ans=0.2 2024-09-18 23:41:19,510 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:41:20,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=546480.0, ans=0.125 2024-09-18 23:41:20,948 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:41:23,564 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.223e+01 8.645e+01 9.090e+01 9.691e+01 3.180e+02, threshold=1.818e+02, percent-clipped=2.0 2024-09-18 23:41:46,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=546560.0, ans=0.0 2024-09-18 23:41:51,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=546560.0, ans=0.125 2024-09-18 23:41:54,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=546600.0, ans=0.125 2024-09-18 23:41:55,485 INFO [train.py:1198] (1/2) Epoch 31, batch 900, loss[loss=0.219, ctc_loss=0.1065, cr_loss=0.3387, attn_decoder_loss=0.2239, over 29632.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1204, cr_loss=0.3623, attn_decoder_loss=0.2421, over 5740566.76 frames. ], batch size: 73, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:42:09,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=546640.0, ans=0.1 2024-09-18 23:42:16,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=546640.0, ans=0.0 2024-09-18 23:42:30,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=546680.0, ans=0.125 2024-09-18 23:42:46,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=546720.0, ans=0.025 2024-09-18 23:42:52,022 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.39 vs. 
limit=6.0 2024-09-18 23:43:03,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=546760.0, ans=0.125 2024-09-18 23:43:06,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=546760.0, ans=0.125 2024-09-18 23:43:10,543 INFO [train.py:1198] (1/2) Epoch 31, batch 950, loss[loss=0.2222, ctc_loss=0.1047, cr_loss=0.3477, attn_decoder_loss=0.2276, over 29519.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1203, cr_loss=0.3622, attn_decoder_loss=0.2423, over 5740904.85 frames. ], batch size: 74, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:43:23,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=546800.0, ans=0.125 2024-09-18 23:43:28,788 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.29 vs. limit=6.0 2024-09-18 23:43:31,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=546840.0, ans=0.0 2024-09-18 23:43:58,298 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.530e+01 9.181e+01 9.954e+01 1.509e+02, threshold=1.836e+02, percent-clipped=0.0 2024-09-18 23:44:06,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=546920.0, ans=0.125 2024-09-18 23:44:08,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.79 vs. limit=15.0 2024-09-18 23:44:28,410 INFO [train.py:1198] (1/2) Epoch 31, batch 1000, loss[loss=0.2159, ctc_loss=0.0976, cr_loss=0.3223, attn_decoder_loss=0.2219, over 29462.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1209, cr_loss=0.3634, attn_decoder_loss=0.243, over 5735285.74 frames. ], batch size: 77, lr: 3.58e-03, grad_scale: 8.0 2024-09-18 23:44:47,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=547040.0, ans=0.125 2024-09-18 23:44:54,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=547040.0, ans=0.125 2024-09-18 23:44:54,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=547040.0, ans=0.125 2024-09-18 23:45:06,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=547080.0, ans=0.95 2024-09-18 23:45:33,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=547160.0, ans=0.125 2024-09-18 23:45:38,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=547160.0, ans=0.0 2024-09-18 23:45:47,376 INFO [train.py:1198] (1/2) Epoch 31, batch 1050, loss[loss=0.2515, ctc_loss=0.1276, cr_loss=0.3841, attn_decoder_loss=0.2567, over 29677.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1205, cr_loss=0.3623, attn_decoder_loss=0.2422, over 5742621.31 frames. 
], batch size: 85, lr: 3.58e-03, grad_scale: 8.0 2024-09-18 23:46:01,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=547240.0, ans=0.125 2024-09-18 23:46:10,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=547240.0, ans=0.2 2024-09-18 23:46:21,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=547280.0, ans=10.0 2024-09-18 23:46:27,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=547280.0, ans=0.125 2024-09-18 23:46:33,063 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.494e+01 8.412e+01 9.049e+01 9.703e+01 1.961e+02, threshold=1.810e+02, percent-clipped=1.0 2024-09-18 23:46:58,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=547360.0, ans=0.125 2024-09-18 23:47:03,203 INFO [train.py:1198] (1/2) Epoch 31, batch 1100, loss[loss=0.2246, ctc_loss=0.1026, cr_loss=0.3225, attn_decoder_loss=0.231, over 29434.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1207, cr_loss=0.363, attn_decoder_loss=0.2424, over 5755402.92 frames. ], batch size: 78, lr: 3.58e-03, grad_scale: 8.0 2024-09-18 23:47:32,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=547440.0, ans=0.0 2024-09-18 23:47:33,505 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.11 vs. limit=15.0 2024-09-18 23:47:52,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=547520.0, ans=0.125 2024-09-18 23:48:01,191 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.95 vs. limit=22.5 2024-09-18 23:48:03,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=547520.0, ans=0.125 2024-09-18 23:48:13,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=547560.0, ans=0.0 2024-09-18 23:48:21,593 INFO [train.py:1198] (1/2) Epoch 31, batch 1150, loss[loss=0.2401, ctc_loss=0.128, cr_loss=0.3827, attn_decoder_loss=0.244, over 29459.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1209, cr_loss=0.3637, attn_decoder_loss=0.2427, over 5752489.98 frames. ], batch size: 78, lr: 3.58e-03, grad_scale: 8.0 2024-09-18 23:48:37,728 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.58 vs. 
limit=15.0 2024-09-18 23:48:38,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=547640.0, ans=0.025 2024-09-18 23:48:39,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=547640.0, ans=0.0 2024-09-18 23:48:46,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=547640.0, ans=15.0 2024-09-18 23:49:00,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=547680.0, ans=0.0 2024-09-18 23:49:00,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=547680.0, ans=0.125 2024-09-18 23:49:02,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=547680.0, ans=0.0 2024-09-18 23:49:09,382 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.218e+01 8.554e+01 9.013e+01 9.585e+01 3.112e+02, threshold=1.803e+02, percent-clipped=2.0 2024-09-18 23:49:39,561 INFO [train.py:1198] (1/2) Epoch 31, batch 1200, loss[loss=0.2459, ctc_loss=0.1267, cr_loss=0.3544, attn_decoder_loss=0.2513, over 29662.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1211, cr_loss=0.3632, attn_decoder_loss=0.243, over 5745801.09 frames. ], batch size: 85, lr: 3.58e-03, grad_scale: 16.0 2024-09-18 23:50:05,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=547840.0, ans=0.125 2024-09-18 23:50:58,373 INFO [train.py:1198] (1/2) Epoch 31, batch 1250, loss[loss=0.2573, ctc_loss=0.1399, cr_loss=0.4022, attn_decoder_loss=0.2614, over 29524.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1217, cr_loss=0.3649, attn_decoder_loss=0.2438, over 5773911.71 frames. ], batch size: 92, lr: 3.58e-03, grad_scale: 16.0 2024-09-18 23:51:07,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=548000.0, ans=0.125 2024-09-18 23:51:08,440 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.34 vs. limit=6.0 2024-09-18 23:51:16,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=548040.0, ans=0.1 2024-09-18 23:51:43,986 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.542e+01 8.406e+01 8.800e+01 9.095e+01 1.339e+02, threshold=1.760e+02, percent-clipped=0.0 2024-09-18 23:52:02,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=548160.0, ans=0.025 2024-09-18 23:52:03,307 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.61 vs. limit=15.0 2024-09-18 23:52:07,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=548160.0, ans=0.125 2024-09-18 23:52:14,291 INFO [train.py:1198] (1/2) Epoch 31, batch 1300, loss[loss=0.2493, ctc_loss=0.1268, cr_loss=0.3728, attn_decoder_loss=0.2546, over 28593.00 frames. 
], tot_loss[loss=0.2382, ctc_loss=0.1213, cr_loss=0.3643, attn_decoder_loss=0.2431, over 5779222.08 frames. ], batch size: 112, lr: 3.58e-03, grad_scale: 16.0 2024-09-18 23:52:37,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=548240.0, ans=0.125 2024-09-18 23:52:39,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=548240.0, ans=0.125 2024-09-18 23:53:03,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=548320.0, ans=0.125 2024-09-18 23:53:03,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=548320.0, ans=0.025 2024-09-18 23:53:13,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=548320.0, ans=0.04949747468305833 2024-09-18 23:53:26,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=548360.0, ans=0.125 2024-09-18 23:53:32,537 INFO [train.py:1198] (1/2) Epoch 31, batch 1350, loss[loss=0.2478, ctc_loss=0.1282, cr_loss=0.3697, attn_decoder_loss=0.2529, over 29757.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.121, cr_loss=0.3637, attn_decoder_loss=0.2428, over 5798074.50 frames. ], batch size: 81, lr: 3.58e-03, grad_scale: 16.0 2024-09-18 23:53:54,437 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2024-09-18 23:54:01,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=548480.0, ans=0.0 2024-09-18 23:54:08,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=548480.0, ans=0.0 2024-09-18 23:54:17,460 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 8.480e+01 8.970e+01 9.556e+01 1.739e+02, threshold=1.794e+02, percent-clipped=0.0 2024-09-18 23:54:26,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=548520.0, ans=0.125 2024-09-18 23:54:35,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=548560.0, ans=0.125 2024-09-18 23:54:41,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=548560.0, ans=0.2 2024-09-18 23:54:47,704 INFO [train.py:1198] (1/2) Epoch 31, batch 1400, loss[loss=0.2059, ctc_loss=0.09306, cr_loss=0.3047, attn_decoder_loss=0.2117, over 29531.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1204, cr_loss=0.3626, attn_decoder_loss=0.2425, over 5808726.72 frames. ], batch size: 69, lr: 3.58e-03, grad_scale: 16.0 2024-09-18 23:54:49,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=548600.0, ans=0.125 2024-09-18 23:55:00,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=548600.0, ans=0.07 2024-09-18 23:55:25,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.14 vs. 
limit=15.0 2024-09-18 23:55:33,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=548720.0, ans=0.1 2024-09-18 23:55:35,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=548720.0, ans=0.1 2024-09-18 23:55:42,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=548720.0, ans=0.0 2024-09-18 23:56:05,888 INFO [train.py:1198] (1/2) Epoch 31, batch 1450, loss[loss=0.2481, ctc_loss=0.1263, cr_loss=0.3771, attn_decoder_loss=0.2532, over 29425.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1207, cr_loss=0.3634, attn_decoder_loss=0.2432, over 5804639.35 frames. ], batch size: 94, lr: 3.58e-03, grad_scale: 8.0 2024-09-18 23:56:09,716 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.43 vs. limit=12.0 2024-09-18 23:56:22,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=548840.0, ans=0.1 2024-09-18 23:56:51,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=548920.0, ans=0.0 2024-09-18 23:56:52,503 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.514e+01 9.000e+01 9.465e+01 1.182e+02, threshold=1.800e+02, percent-clipped=0.0 2024-09-18 23:56:54,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=548920.0, ans=0.125 2024-09-18 23:56:59,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=548920.0, ans=0.0 2024-09-18 23:57:03,989 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:57:12,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=548960.0, ans=0.2 2024-09-18 23:57:12,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=548960.0, ans=0.125 2024-09-18 23:57:23,233 INFO [train.py:1198] (1/2) Epoch 31, batch 1500, loss[loss=0.2464, ctc_loss=0.1213, cr_loss=0.3772, attn_decoder_loss=0.2519, over 29654.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1209, cr_loss=0.3636, attn_decoder_loss=0.2434, over 5806351.89 frames. 
], batch size: 86, lr: 3.58e-03, grad_scale: 8.0 2024-09-18 23:57:32,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=549000.0, ans=0.125 2024-09-18 23:57:50,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=549040.0, ans=0.125 2024-09-18 23:57:53,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=549080.0, ans=0.125 2024-09-18 23:58:27,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=549160.0, ans=0.125 2024-09-18 23:58:33,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=549160.0, ans=0.1 2024-09-18 23:58:41,412 INFO [train.py:1198] (1/2) Epoch 31, batch 1550, loss[loss=0.2434, ctc_loss=0.1204, cr_loss=0.3786, attn_decoder_loss=0.2487, over 29510.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.121, cr_loss=0.3634, attn_decoder_loss=0.2432, over 5781988.83 frames. ], batch size: 90, lr: 3.58e-03, grad_scale: 8.0 2024-09-18 23:58:50,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=549200.0, ans=0.2 2024-09-18 23:58:55,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=549240.0, ans=0.025 2024-09-18 23:59:13,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=549280.0, ans=0.125 2024-09-18 23:59:23,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=549280.0, ans=0.025 2024-09-18 23:59:25,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=549320.0, ans=0.0 2024-09-18 23:59:27,894 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 8.398e+01 8.813e+01 9.564e+01 2.152e+02, threshold=1.763e+02, percent-clipped=1.0 2024-09-18 23:59:28,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=549320.0, ans=0.125 2024-09-18 23:59:36,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=549320.0, ans=0.125 2024-09-18 23:59:39,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=549320.0, ans=0.125 2024-09-18 23:59:54,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=549360.0, ans=0.125 2024-09-18 23:59:56,779 INFO [train.py:1198] (1/2) Epoch 31, batch 1600, loss[loss=0.2444, ctc_loss=0.1211, cr_loss=0.3529, attn_decoder_loss=0.2502, over 29685.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1212, cr_loss=0.363, attn_decoder_loss=0.243, over 5763204.96 frames. 
], batch size: 85, lr: 3.58e-03, grad_scale: 16.0 2024-09-19 00:00:04,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=549400.0, ans=0.0 2024-09-19 00:00:27,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=549480.0, ans=0.125 2024-09-19 00:00:33,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=549480.0, ans=0.125 2024-09-19 00:01:14,871 INFO [train.py:1198] (1/2) Epoch 31, batch 1650, loss[loss=0.2462, ctc_loss=0.1232, cr_loss=0.3522, attn_decoder_loss=0.2521, over 29683.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1208, cr_loss=0.3625, attn_decoder_loss=0.2429, over 5757998.76 frames. ], batch size: 89, lr: 3.58e-03, grad_scale: 8.0 2024-09-19 00:01:16,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=549600.0, ans=0.0 2024-09-19 00:01:19,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=549600.0, ans=0.0 2024-09-19 00:01:20,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.50 vs. limit=6.0 2024-09-19 00:01:25,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=549600.0, ans=0.1 2024-09-19 00:02:03,353 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.469e+01 8.401e+01 8.981e+01 9.648e+01 1.683e+02, threshold=1.796e+02, percent-clipped=0.0 2024-09-19 00:02:32,504 INFO [train.py:1198] (1/2) Epoch 31, batch 1700, loss[loss=0.2119, ctc_loss=0.1006, cr_loss=0.3267, attn_decoder_loss=0.217, over 29557.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1204, cr_loss=0.3618, attn_decoder_loss=0.2427, over 5778542.75 frames. ], batch size: 69, lr: 3.58e-03, grad_scale: 8.0 2024-09-19 00:02:32,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=549800.0, ans=0.0 2024-09-19 00:02:35,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=549800.0, ans=0.2 2024-09-19 00:02:43,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=549800.0, ans=0.025 2024-09-19 00:02:57,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=549840.0, ans=0.04949747468305833 2024-09-19 00:03:01,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=549880.0, ans=0.1 2024-09-19 00:03:05,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=549880.0, ans=0.1 2024-09-19 00:03:05,379 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.75 vs. 
limit=15.0 2024-09-19 00:03:18,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=549920.0, ans=0.125 2024-09-19 00:03:48,475 INFO [train.py:1198] (1/2) Epoch 31, batch 1750, loss[loss=0.2113, ctc_loss=0.105, cr_loss=0.3264, attn_decoder_loss=0.2159, over 29297.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1203, cr_loss=0.3617, attn_decoder_loss=0.2425, over 5786861.90 frames. ], batch size: 67, lr: 3.57e-03, grad_scale: 8.0 2024-09-19 00:04:36,768 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 8.450e+01 9.086e+01 9.663e+01 1.697e+02, threshold=1.817e+02, percent-clipped=0.0 2024-09-19 00:04:44,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=550120.0, ans=0.0 2024-09-19 00:04:45,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.91 vs. limit=15.0 2024-09-19 00:05:05,971 INFO [train.py:1198] (1/2) Epoch 31, batch 1800, loss[loss=0.2465, ctc_loss=0.1285, cr_loss=0.381, attn_decoder_loss=0.2511, over 29693.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1205, cr_loss=0.3625, attn_decoder_loss=0.2429, over 5790043.50 frames. ], batch size: 83, lr: 3.57e-03, grad_scale: 8.0 2024-09-19 00:05:12,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=550200.0, ans=0.1 2024-09-19 00:05:12,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=550200.0, ans=0.125 2024-09-19 00:05:27,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=550240.0, ans=0.125 2024-09-19 00:05:38,460 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.40 vs. limit=15.0 2024-09-19 00:06:23,827 INFO [train.py:1198] (1/2) Epoch 31, batch 1850, loss[loss=0.2523, ctc_loss=0.1295, cr_loss=0.3857, attn_decoder_loss=0.2573, over 29648.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1205, cr_loss=0.3625, attn_decoder_loss=0.2427, over 5796830.99 frames. ], batch size: 86, lr: 3.57e-03, grad_scale: 4.0 2024-09-19 00:06:27,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=550400.0, ans=0.125 2024-09-19 00:06:29,450 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.40 vs. 
limit=15.0 2024-09-19 00:06:45,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=550440.0, ans=0.1 2024-09-19 00:07:14,094 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.739e+01 8.638e+01 9.110e+01 9.627e+01 2.703e+02, threshold=1.822e+02, percent-clipped=1.0 2024-09-19 00:07:14,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=550520.0, ans=0.1 2024-09-19 00:07:23,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=550560.0, ans=0.0 2024-09-19 00:07:31,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=550560.0, ans=0.0 2024-09-19 00:07:37,628 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.58 vs. limit=22.5 2024-09-19 00:07:39,697 INFO [train.py:1198] (1/2) Epoch 31, batch 1900, loss[loss=0.2388, ctc_loss=0.1092, cr_loss=0.3406, attn_decoder_loss=0.2456, over 29709.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1206, cr_loss=0.3634, attn_decoder_loss=0.2433, over 5803972.65 frames. ], batch size: 89, lr: 3.57e-03, grad_scale: 8.0 2024-09-19 00:07:53,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=550640.0, ans=0.1 2024-09-19 00:08:13,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=550680.0, ans=0.125 2024-09-19 00:08:39,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=550760.0, ans=0.0 2024-09-19 00:08:51,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=550760.0, ans=0.125 2024-09-19 00:08:57,912 INFO [train.py:1198] (1/2) Epoch 31, batch 1950, loss[loss=0.2362, ctc_loss=0.1201, cr_loss=0.3676, attn_decoder_loss=0.2409, over 29472.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1211, cr_loss=0.365, attn_decoder_loss=0.2443, over 5818511.47 frames. ], batch size: 78, lr: 3.57e-03, grad_scale: 8.0 2024-09-19 00:08:59,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=550800.0, ans=0.125 2024-09-19 00:09:08,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=550800.0, ans=0.0 2024-09-19 00:09:47,673 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.604e+01 9.078e+01 9.873e+01 2.917e+02, threshold=1.816e+02, percent-clipped=2.0 2024-09-19 00:09:50,215 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.97 vs. limit=15.0 2024-09-19 00:10:15,517 INFO [train.py:1198] (1/2) Epoch 31, batch 2000, loss[loss=0.2199, ctc_loss=0.1112, cr_loss=0.3599, attn_decoder_loss=0.224, over 29342.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1217, cr_loss=0.3656, attn_decoder_loss=0.2446, over 5797133.96 frames. 
], batch size: 67, lr: 3.57e-03, grad_scale: 16.0 2024-09-19 00:10:28,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=551000.0, ans=0.07 2024-09-19 00:10:28,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=551000.0, ans=0.1 2024-09-19 00:10:32,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=551040.0, ans=0.125 2024-09-19 00:10:50,786 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 00:10:59,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=551120.0, ans=0.0 2024-09-19 00:11:08,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=551120.0, ans=0.0 2024-09-19 00:11:19,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=551160.0, ans=0.1 2024-09-19 00:11:20,821 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 00:11:31,484 INFO [train.py:1198] (1/2) Epoch 31, batch 2050, loss[loss=0.2126, ctc_loss=0.1044, cr_loss=0.3235, attn_decoder_loss=0.2174, over 29408.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.121, cr_loss=0.3635, attn_decoder_loss=0.2434, over 5788128.30 frames. ], batch size: 70, lr: 3.57e-03, grad_scale: 16.0 2024-09-19 00:11:43,202 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.61 vs. limit=6.0 2024-09-19 00:11:44,715 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=22.5 2024-09-19 00:11:50,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=551240.0, ans=0.125 2024-09-19 00:12:00,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=551280.0, ans=0.05 2024-09-19 00:12:04,199 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.93 vs. 
limit=12.0 2024-09-19 00:12:14,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=551280.0, ans=0.0 2024-09-19 00:12:18,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=551320.0, ans=10.0 2024-09-19 00:12:18,695 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 00:12:21,280 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.718e+01 8.532e+01 9.006e+01 9.562e+01 1.976e+02, threshold=1.801e+02, percent-clipped=1.0 2024-09-19 00:12:33,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=551360.0, ans=0.125 2024-09-19 00:12:33,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=551360.0, ans=10.0 2024-09-19 00:12:37,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.45 vs. limit=15.0 2024-09-19 00:12:49,617 INFO [train.py:1198] (1/2) Epoch 31, batch 2100, loss[loss=0.2306, ctc_loss=0.1084, cr_loss=0.3333, attn_decoder_loss=0.2367, over 29775.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1205, cr_loss=0.3626, attn_decoder_loss=0.2427, over 5800837.14 frames. ], batch size: 81, lr: 3.57e-03, grad_scale: 16.0 2024-09-19 00:12:49,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=551400.0, ans=0.025 2024-09-19 00:12:55,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=551400.0, ans=0.125 2024-09-19 00:13:01,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=551400.0, ans=0.2 2024-09-19 00:13:32,386 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.02 vs. limit=22.5 2024-09-19 00:13:33,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=551520.0, ans=0.2 2024-09-19 00:13:47,399 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.55 vs. limit=15.0 2024-09-19 00:13:52,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=551560.0, ans=6.0 2024-09-19 00:14:07,015 INFO [train.py:1198] (1/2) Epoch 31, batch 2150, loss[loss=0.2332, ctc_loss=0.1154, cr_loss=0.3604, attn_decoder_loss=0.2383, over 29455.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1196, cr_loss=0.3613, attn_decoder_loss=0.2421, over 5815915.80 frames. 
], batch size: 78, lr: 3.57e-03, grad_scale: 16.0 2024-09-19 00:14:16,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=551600.0, ans=0.125 2024-09-19 00:14:21,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=551640.0, ans=0.125 2024-09-19 00:14:40,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=551680.0, ans=0.125 2024-09-19 00:14:43,287 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.11 vs. limit=15.0 2024-09-19 00:14:49,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=551680.0, ans=0.2 2024-09-19 00:14:57,054 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.507e+01 8.560e+01 8.969e+01 9.441e+01 3.216e+02, threshold=1.794e+02, percent-clipped=1.0 2024-09-19 00:15:23,015 INFO [train.py:1198] (1/2) Epoch 31, batch 2200, loss[loss=0.2526, ctc_loss=0.1283, cr_loss=0.3794, attn_decoder_loss=0.2579, over 29655.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.12, cr_loss=0.362, attn_decoder_loss=0.2424, over 5812101.83 frames. ], batch size: 86, lr: 3.57e-03, grad_scale: 16.0 2024-09-19 00:15:35,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=551800.0, ans=0.125 2024-09-19 00:15:44,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=551840.0, ans=0.125 2024-09-19 00:15:49,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=551840.0, ans=0.1 2024-09-19 00:16:10,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=551920.0, ans=0.0 2024-09-19 00:16:29,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=551960.0, ans=0.1 2024-09-19 00:16:32,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=551960.0, ans=0.125 2024-09-19 00:16:34,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=551960.0, ans=0.125 2024-09-19 00:16:39,383 INFO [train.py:1198] (1/2) Epoch 31, batch 2250, loss[loss=0.2406, ctc_loss=0.1274, cr_loss=0.3787, attn_decoder_loss=0.2447, over 29684.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1202, cr_loss=0.3623, attn_decoder_loss=0.2424, over 5810615.67 frames. 
], batch size: 82, lr: 3.57e-03, grad_scale: 16.0 2024-09-19 00:16:58,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=552040.0, ans=0.125 2024-09-19 00:17:09,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=552040.0, ans=0.2 2024-09-19 00:17:12,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=552080.0, ans=0.0 2024-09-19 00:17:23,487 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0 2024-09-19 00:17:32,969 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.331e+01 8.496e+01 8.948e+01 9.461e+01 2.809e+02, threshold=1.790e+02, percent-clipped=1.0 2024-09-19 00:17:36,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=552120.0, ans=0.025 2024-09-19 00:17:57,278 INFO [train.py:1198] (1/2) Epoch 31, batch 2300, loss[loss=0.2186, ctc_loss=0.1052, cr_loss=0.3342, attn_decoder_loss=0.2238, over 29324.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1199, cr_loss=0.362, attn_decoder_loss=0.2417, over 5799237.52 frames. ], batch size: 71, lr: 3.57e-03, grad_scale: 8.0 2024-09-19 00:18:17,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=552240.0, ans=0.125 2024-09-19 00:18:23,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=552240.0, ans=0.125 2024-09-19 00:18:37,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=552280.0, ans=0.125 2024-09-19 00:18:57,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=552320.0, ans=0.2 2024-09-19 00:19:04,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=552360.0, ans=0.125 2024-09-19 00:19:15,033 INFO [train.py:1198] (1/2) Epoch 31, batch 2350, loss[loss=0.2483, ctc_loss=0.1264, cr_loss=0.377, attn_decoder_loss=0.2534, over 29706.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.12, cr_loss=0.3627, attn_decoder_loss=0.2419, over 5804697.48 frames. ], batch size: 83, lr: 3.57e-03, grad_scale: 8.0 2024-09-19 00:19:19,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=552400.0, ans=0.0 2024-09-19 00:19:35,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=552440.0, ans=0.0 2024-09-19 00:19:36,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=552440.0, ans=0.125 2024-09-19 00:19:38,680 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.36 vs. 
limit=6.0 2024-09-19 00:19:42,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=552440.0, ans=0.025 2024-09-19 00:19:50,777 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.43 vs. limit=15.0 2024-09-19 00:19:51,784 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 00:20:03,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=552520.0, ans=0.1 2024-09-19 00:20:06,504 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.333e+01 8.603e+01 9.093e+01 9.793e+01 1.880e+02, threshold=1.819e+02, percent-clipped=1.0 2024-09-19 00:20:31,122 INFO [train.py:1198] (1/2) Epoch 31, batch 2400, loss[loss=0.2307, ctc_loss=0.1149, cr_loss=0.3482, attn_decoder_loss=0.2358, over 29521.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1205, cr_loss=0.3635, attn_decoder_loss=0.2423, over 5808774.61 frames. ], batch size: 76, lr: 3.57e-03, grad_scale: 16.0 2024-09-19 00:20:41,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=552600.0, ans=0.125 2024-09-19 00:21:06,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=552680.0, ans=0.0 2024-09-19 00:21:11,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=552680.0, ans=0.1 2024-09-19 00:21:35,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=552760.0, ans=0.2 2024-09-19 00:21:36,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=552760.0, ans=0.1 2024-09-19 00:21:42,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=552760.0, ans=0.0 2024-09-19 00:21:43,676 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.02 vs. limit=15.0 2024-09-19 00:21:51,232 INFO [train.py:1198] (1/2) Epoch 31, batch 2450, loss[loss=0.2487, ctc_loss=0.1283, cr_loss=0.3902, attn_decoder_loss=0.2535, over 29696.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.121, cr_loss=0.3643, attn_decoder_loss=0.2429, over 5785748.62 frames. ], batch size: 82, lr: 3.57e-03, grad_scale: 8.0 2024-09-19 00:22:13,073 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.76 vs. 
limit=22.5 2024-09-19 00:22:16,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=552840.0, ans=0.035 2024-09-19 00:22:21,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=552880.0, ans=0.125 2024-09-19 00:22:26,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=552880.0, ans=0.025 2024-09-19 00:22:44,295 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.732e+01 9.075e+01 9.673e+01 2.868e+02, threshold=1.815e+02, percent-clipped=2.0 2024-09-19 00:22:46,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=552920.0, ans=0.125 2024-09-19 00:22:51,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=552960.0, ans=0.125 2024-09-19 00:22:56,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=552960.0, ans=0.1 2024-09-19 00:22:56,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=552960.0, ans=0.5 2024-09-19 00:22:59,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=552960.0, ans=0.0 2024-09-19 00:22:59,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=552960.0, ans=0.125 2024-09-19 00:23:05,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=553000.0, ans=0.0 2024-09-19 00:23:06,987 INFO [train.py:1198] (1/2) Epoch 31, batch 2500, loss[loss=0.2464, ctc_loss=0.1229, cr_loss=0.3549, attn_decoder_loss=0.2522, over 29634.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.121, cr_loss=0.3639, attn_decoder_loss=0.243, over 5795138.86 frames. ], batch size: 86, lr: 3.57e-03, grad_scale: 8.0 2024-09-19 00:23:34,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=553040.0, ans=0.125 2024-09-19 00:23:43,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=553080.0, ans=0.125 2024-09-19 00:23:57,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=553120.0, ans=0.125 2024-09-19 00:24:01,002 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.24 vs. 
limit=15.0 2024-09-19 00:24:11,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=553160.0, ans=0.125 2024-09-19 00:24:15,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=553160.0, ans=0.125 2024-09-19 00:24:20,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=553160.0, ans=0.025 2024-09-19 00:24:22,890 INFO [train.py:1198] (1/2) Epoch 31, batch 2550, loss[loss=0.2171, ctc_loss=0.1135, cr_loss=0.355, attn_decoder_loss=0.2207, over 29363.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1209, cr_loss=0.3641, attn_decoder_loss=0.243, over 5799705.75 frames. ], batch size: 67, lr: 3.56e-03, grad_scale: 8.0 2024-09-19 00:24:46,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=553240.0, ans=0.1 2024-09-19 00:24:47,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=553240.0, ans=0.0 2024-09-19 00:24:56,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=553280.0, ans=0.125 2024-09-19 00:25:18,171 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.379e+01 8.309e+01 8.709e+01 9.331e+01 1.370e+02, threshold=1.742e+02, percent-clipped=0.0 2024-09-19 00:25:27,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=553360.0, ans=0.125 2024-09-19 00:25:43,188 INFO [train.py:1198] (1/2) Epoch 31, batch 2600, loss[loss=0.2442, ctc_loss=0.1277, cr_loss=0.3882, attn_decoder_loss=0.2485, over 29452.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1211, cr_loss=0.3645, attn_decoder_loss=0.2435, over 5796404.26 frames. ], batch size: 78, lr: 3.56e-03, grad_scale: 8.0 2024-09-19 00:25:45,557 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.92 vs. 
limit=12.0 2024-09-19 00:25:58,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=553440.0, ans=0.0 2024-09-19 00:26:04,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=553440.0, ans=0.09899494936611666 2024-09-19 00:26:25,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=553480.0, ans=0.1 2024-09-19 00:26:25,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=553480.0, ans=0.125 2024-09-19 00:26:25,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=553480.0, ans=0.1 2024-09-19 00:26:33,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=553520.0, ans=0.125 2024-09-19 00:26:34,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=553520.0, ans=10.0 2024-09-19 00:26:46,955 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 00:26:49,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=553560.0, ans=0.125 2024-09-19 00:26:55,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=553560.0, ans=0.125 2024-09-19 00:26:58,909 INFO [train.py:1198] (1/2) Epoch 31, batch 2650, loss[loss=0.2535, ctc_loss=0.1294, cr_loss=0.383, attn_decoder_loss=0.2588, over 29229.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.121, cr_loss=0.3642, attn_decoder_loss=0.2438, over 5802079.82 frames. 
], batch size: 100, lr: 3.56e-03, grad_scale: 8.0 2024-09-19 00:27:05,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=553600.0, ans=0.5 2024-09-19 00:27:12,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=553640.0, ans=0.1 2024-09-19 00:27:14,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=553640.0, ans=10.0 2024-09-19 00:27:21,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=553640.0, ans=0.2 2024-09-19 00:27:26,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=553640.0, ans=0.125 2024-09-19 00:27:42,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=553720.0, ans=0.125 2024-09-19 00:27:45,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=553720.0, ans=0.0 2024-09-19 00:27:51,190 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 8.581e+01 9.000e+01 9.310e+01 1.740e+02, threshold=1.800e+02, percent-clipped=0.0 2024-09-19 00:27:56,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=553720.0, ans=0.1 2024-09-19 00:28:03,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=553760.0, ans=0.1 2024-09-19 00:28:14,274 INFO [train.py:1198] (1/2) Epoch 31, batch 2700, loss[loss=0.2372, ctc_loss=0.1156, cr_loss=0.3488, attn_decoder_loss=0.243, over 29554.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.121, cr_loss=0.3639, attn_decoder_loss=0.2439, over 5797192.43 frames. ], batch size: 87, lr: 3.56e-03, grad_scale: 8.0 2024-09-19 00:28:25,790 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.98 vs. limit=15.0 2024-09-19 00:28:29,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=553840.0, ans=0.125 2024-09-19 00:28:41,244 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.36 vs. 
limit=15.0 2024-09-19 00:28:54,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=553880.0, ans=0.025 2024-09-19 00:29:03,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.min_positive, batch_count=553920.0, ans=0.05 2024-09-19 00:29:08,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=553920.0, ans=0.2 2024-09-19 00:29:15,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=553960.0, ans=0.025 2024-09-19 00:29:15,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=553960.0, ans=0.5 2024-09-19 00:29:20,821 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2024-09-19 00:29:23,907 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.03 vs. limit=10.0 2024-09-19 00:29:24,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=553960.0, ans=0.125 2024-09-19 00:29:28,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=553960.0, ans=0.125 2024-09-19 00:29:34,469 INFO [train.py:1198] (1/2) Epoch 31, batch 2750, loss[loss=0.2328, ctc_loss=0.1226, cr_loss=0.3863, attn_decoder_loss=0.2365, over 29509.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1207, cr_loss=0.3628, attn_decoder_loss=0.2427, over 5795212.28 frames. ], batch size: 75, lr: 3.56e-03, grad_scale: 8.0 2024-09-19 00:29:39,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=554000.0, ans=0.0 2024-09-19 00:29:48,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=554040.0, ans=0.125 2024-09-19 00:29:50,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=554040.0, ans=0.125 2024-09-19 00:29:52,522 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.96 vs. limit=15.0 2024-09-19 00:29:57,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=554040.0, ans=0.125 2024-09-19 00:30:27,592 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 8.490e+01 8.925e+01 9.454e+01 1.870e+02, threshold=1.785e+02, percent-clipped=1.0 2024-09-19 00:30:32,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=554120.0, ans=0.125 2024-09-19 00:30:50,894 INFO [train.py:1198] (1/2) Epoch 31, batch 2800, loss[loss=0.2507, ctc_loss=0.1397, cr_loss=0.3644, attn_decoder_loss=0.255, over 20100.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.121, cr_loss=0.3632, attn_decoder_loss=0.2431, over 5776741.82 frames. 
], batch size: 210, lr: 3.56e-03, grad_scale: 16.0 2024-09-19 00:31:08,235 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.28 vs. limit=15.0 2024-09-19 00:31:11,166 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.20 vs. limit=15.0 2024-09-19 00:31:20,440 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=22.5 2024-09-19 00:31:21,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=554280.0, ans=0.125 2024-09-19 00:31:28,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=554280.0, ans=0.125 2024-09-19 00:31:34,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=554320.0, ans=0.125 2024-09-19 00:31:42,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=554320.0, ans=0.125 2024-09-19 00:32:00,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=554360.0, ans=0.125 2024-09-19 00:32:06,642 INFO [train.py:1198] (1/2) Epoch 31, batch 2850, loss[loss=0.2369, ctc_loss=0.1273, cr_loss=0.4052, attn_decoder_loss=0.2401, over 29524.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1218, cr_loss=0.3645, attn_decoder_loss=0.2434, over 5762930.82 frames. ], batch size: 77, lr: 3.56e-03, grad_scale: 16.0 2024-09-19 00:32:46,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=554480.0, ans=0.125 2024-09-19 00:32:56,888 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.78 vs. limit=22.5 2024-09-19 00:33:03,235 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.488e+01 8.652e+01 9.121e+01 9.681e+01 2.307e+02, threshold=1.824e+02, percent-clipped=1.0 2024-09-19 00:33:05,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=554520.0, ans=0.1 2024-09-19 00:33:06,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=554520.0, ans=0.2 2024-09-19 00:33:11,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=554560.0, ans=0.125 2024-09-19 00:33:26,764 INFO [train.py:1198] (1/2) Epoch 31, batch 2900, loss[loss=0.2241, ctc_loss=0.1108, cr_loss=0.3459, attn_decoder_loss=0.229, over 29457.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1222, cr_loss=0.3662, attn_decoder_loss=0.2445, over 5788574.44 frames. 
], batch size: 79, lr: 3.56e-03, grad_scale: 8.0 2024-09-19 00:33:27,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=554600.0, ans=0.125 2024-09-19 00:33:28,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=554600.0, ans=0.0 2024-09-19 00:34:06,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=554680.0, ans=10.0 2024-09-19 00:34:06,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=554680.0, ans=0.0 2024-09-19 00:34:18,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=554720.0, ans=0.0 2024-09-19 00:34:30,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=554760.0, ans=0.125 2024-09-19 00:34:42,840 INFO [train.py:1198] (1/2) Epoch 31, batch 2950, loss[loss=0.2255, ctc_loss=0.1098, cr_loss=0.358, attn_decoder_loss=0.2304, over 29530.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1212, cr_loss=0.3639, attn_decoder_loss=0.2432, over 5782601.73 frames. ], batch size: 75, lr: 3.56e-03, grad_scale: 8.0 2024-09-19 00:35:28,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=554920.0, ans=0.0 2024-09-19 00:35:33,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=554920.0, ans=0.0 2024-09-19 00:35:33,899 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.42 vs. limit=10.0 2024-09-19 00:35:37,422 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.875e+01 8.492e+01 9.114e+01 9.567e+01 2.273e+02, threshold=1.823e+02, percent-clipped=2.0 2024-09-19 00:35:42,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=554960.0, ans=0.0 2024-09-19 00:35:45,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=554960.0, ans=6.0 2024-09-19 00:35:56,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=554960.0, ans=0.125 2024-09-19 00:35:56,793 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.46 vs. limit=10.0 2024-09-19 00:35:59,046 INFO [train.py:1198] (1/2) Epoch 31, batch 3000, loss[loss=0.2373, ctc_loss=0.1256, cr_loss=0.378, attn_decoder_loss=0.2414, over 29755.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.121, cr_loss=0.3632, attn_decoder_loss=0.2431, over 5783491.37 frames. ], batch size: 81, lr: 3.56e-03, grad_scale: 8.0 2024-09-19 00:35:59,047 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 00:36:15,055 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.9543, 3.8489, 3.9176, 3.2570, 3.4097, 3.6364, 3.7075, 4.0390], device='cuda:1') 2024-09-19 00:36:19,672 INFO [train.py:1230] (1/2) Epoch 31, validation: loss=0.2117, ctc_loss=0.03748, cr_loss=5.925e-15, attn_decoder_loss=0.2311, over 944034.00 frames. 
2024-09-19 00:36:19,673 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-19 00:37:07,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=555120.0, ans=0.0 2024-09-19 00:37:15,622 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 00:37:24,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=555160.0, ans=0.1 2024-09-19 00:37:37,205 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.87 vs. limit=15.0 2024-09-19 00:37:38,275 INFO [train.py:1198] (1/2) Epoch 31, batch 3050, loss[loss=0.2285, ctc_loss=0.1147, cr_loss=0.3592, attn_decoder_loss=0.2332, over 29518.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1213, cr_loss=0.3637, attn_decoder_loss=0.2436, over 5777797.33 frames. ], batch size: 76, lr: 3.56e-03, grad_scale: 8.0 2024-09-19 00:37:49,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=555200.0, ans=0.125 2024-09-19 00:37:52,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=555240.0, ans=0.1 2024-09-19 00:38:05,163 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.80 vs. limit=15.0 2024-09-19 00:38:23,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=555320.0, ans=6.0 2024-09-19 00:38:30,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=555320.0, ans=0.0 2024-09-19 00:38:32,840 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.264e+01 8.564e+01 9.261e+01 9.873e+01 2.101e+02, threshold=1.852e+02, percent-clipped=1.0 2024-09-19 00:38:34,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=555320.0, ans=0.125 2024-09-19 00:38:34,838 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 00:38:52,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=555400.0, ans=0.125 2024-09-19 00:38:53,997 INFO [train.py:1198] (1/2) Epoch 31, batch 3100, loss[loss=0.252, ctc_loss=0.127, cr_loss=0.3687, attn_decoder_loss=0.2576, over 29231.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1212, cr_loss=0.3637, attn_decoder_loss=0.2432, over 5777004.07 frames. 
], batch size: 100, lr: 3.56e-03, grad_scale: 8.0 2024-09-19 00:39:15,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=555440.0, ans=0.125 2024-09-19 00:39:20,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=555440.0, ans=0.5 2024-09-19 00:39:24,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=555480.0, ans=0.125 2024-09-19 00:39:26,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=555480.0, ans=0.1 2024-09-19 00:39:30,113 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.47 vs. limit=15.0 2024-09-19 00:40:10,307 INFO [train.py:1198] (1/2) Epoch 31, batch 3150, loss[loss=0.25, ctc_loss=0.128, cr_loss=0.3713, attn_decoder_loss=0.2553, over 28858.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1212, cr_loss=0.3639, attn_decoder_loss=0.2433, over 5783324.51 frames. ], batch size: 104, lr: 3.56e-03, grad_scale: 8.0 2024-09-19 00:40:13,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.44 vs. limit=22.5 2024-09-19 00:40:26,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=555640.0, ans=0.0 2024-09-19 00:40:57,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=555680.0, ans=0.2 2024-09-19 00:41:09,174 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.063e+01 8.684e+01 9.147e+01 9.580e+01 2.256e+02, threshold=1.829e+02, percent-clipped=1.0 2024-09-19 00:41:21,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=555760.0, ans=0.125 2024-09-19 00:41:24,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=555760.0, ans=0.125 2024-09-19 00:41:30,413 INFO [train.py:1198] (1/2) Epoch 31, batch 3200, loss[loss=0.2293, ctc_loss=0.1064, cr_loss=0.3258, attn_decoder_loss=0.2358, over 29426.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1202, cr_loss=0.3618, attn_decoder_loss=0.2422, over 5793653.55 frames. ], batch size: 79, lr: 3.56e-03, grad_scale: 16.0 2024-09-19 00:41:33,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=555800.0, ans=0.0 2024-09-19 00:41:42,103 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.84 vs. limit=15.0 2024-09-19 00:41:43,564 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.15 vs. 
limit=22.5 2024-09-19 00:42:12,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=555880.0, ans=0.0 2024-09-19 00:42:28,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=555920.0, ans=0.025 2024-09-19 00:42:43,152 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.30 vs. limit=15.0 2024-09-19 00:42:46,903 INFO [train.py:1198] (1/2) Epoch 31, batch 3250, loss[loss=0.2393, ctc_loss=0.124, cr_loss=0.3959, attn_decoder_loss=0.2433, over 29702.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1203, cr_loss=0.3629, attn_decoder_loss=0.2427, over 5800740.32 frames. ], batch size: 84, lr: 3.56e-03, grad_scale: 8.0 2024-09-19 00:43:27,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=556080.0, ans=0.025 2024-09-19 00:43:28,291 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.09 vs. limit=22.5 2024-09-19 00:43:30,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=556120.0, ans=0.5 2024-09-19 00:43:35,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=556120.0, ans=0.125 2024-09-19 00:43:42,506 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.541e+01 8.932e+01 9.487e+01 3.275e+02, threshold=1.786e+02, percent-clipped=1.0 2024-09-19 00:43:53,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=556160.0, ans=0.1 2024-09-19 00:44:02,476 INFO [train.py:1198] (1/2) Epoch 31, batch 3300, loss[loss=0.247, ctc_loss=0.1253, cr_loss=0.3734, attn_decoder_loss=0.2522, over 28158.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.12, cr_loss=0.3615, attn_decoder_loss=0.2419, over 5798002.30 frames. ], batch size: 111, lr: 3.56e-03, grad_scale: 8.0 2024-09-19 00:44:06,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=556200.0, ans=0.0 2024-09-19 00:44:14,116 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 00:44:44,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=556280.0, ans=0.0 2024-09-19 00:45:02,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=556320.0, ans=0.125 2024-09-19 00:45:10,777 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.14 vs. limit=15.0 2024-09-19 00:45:11,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=556360.0, ans=0.125 2024-09-19 00:45:21,977 INFO [train.py:1198] (1/2) Epoch 31, batch 3350, loss[loss=0.2551, ctc_loss=0.128, cr_loss=0.3779, attn_decoder_loss=0.2608, over 28898.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1202, cr_loss=0.3618, attn_decoder_loss=0.2423, over 5773586.36 frames. 
], batch size: 104, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 00:45:34,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=556400.0, ans=0.0 2024-09-19 00:45:50,648 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.72 vs. limit=10.0 2024-09-19 00:46:09,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=556520.0, ans=0.2 2024-09-19 00:46:12,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=556520.0, ans=0.025 2024-09-19 00:46:18,240 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 8.469e+01 9.027e+01 9.591e+01 1.739e+02, threshold=1.805e+02, percent-clipped=0.0 2024-09-19 00:46:23,575 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.23 vs. limit=15.0 2024-09-19 00:46:26,524 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.67 vs. limit=12.0 2024-09-19 00:46:29,687 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.91 vs. limit=22.5 2024-09-19 00:46:30,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=556560.0, ans=0.125 2024-09-19 00:46:38,052 INFO [train.py:1198] (1/2) Epoch 31, batch 3400, loss[loss=0.211, ctc_loss=0.09763, cr_loss=0.3095, attn_decoder_loss=0.2167, over 29302.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1207, cr_loss=0.3625, attn_decoder_loss=0.2424, over 5766738.28 frames. ], batch size: 67, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 00:46:48,296 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.89 vs. limit=22.5 2024-09-19 00:46:52,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=556640.0, ans=0.125 2024-09-19 00:46:54,126 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=11.76 vs. limit=15.0 2024-09-19 00:46:55,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=556640.0, ans=0.125 2024-09-19 00:47:05,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=556640.0, ans=0.1 2024-09-19 00:47:06,110 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.59 vs. limit=15.0 2024-09-19 00:47:45,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=556760.0, ans=0.05 2024-09-19 00:47:48,763 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.42 vs. 
limit=6.0 2024-09-19 00:47:54,290 INFO [train.py:1198] (1/2) Epoch 31, batch 3450, loss[loss=0.255, ctc_loss=0.1324, cr_loss=0.3826, attn_decoder_loss=0.2601, over 28401.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1208, cr_loss=0.3629, attn_decoder_loss=0.2429, over 5775296.89 frames. ], batch size: 111, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 00:47:58,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=556800.0, ans=15.0 2024-09-19 00:48:05,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=556800.0, ans=0.125 2024-09-19 00:48:06,789 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.88 vs. limit=22.5 2024-09-19 00:48:09,630 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.89 vs. limit=15.0 2024-09-19 00:48:27,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=556880.0, ans=0.1 2024-09-19 00:48:45,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=556920.0, ans=0.025 2024-09-19 00:48:54,517 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.684e+01 9.223e+01 9.839e+01 1.576e+02, threshold=1.845e+02, percent-clipped=0.0 2024-09-19 00:49:13,825 INFO [train.py:1198] (1/2) Epoch 31, batch 3500, loss[loss=0.2153, ctc_loss=0.1081, cr_loss=0.3398, attn_decoder_loss=0.2197, over 29318.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1202, cr_loss=0.362, attn_decoder_loss=0.2423, over 5776099.53 frames. ], batch size: 71, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 00:49:24,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=557000.0, ans=0.125 2024-09-19 00:49:26,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=557000.0, ans=0.125 2024-09-19 00:49:30,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=557040.0, ans=0.2 2024-09-19 00:49:43,154 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.35 vs. limit=10.0 2024-09-19 00:49:45,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=557080.0, ans=0.025 2024-09-19 00:49:53,777 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.95 vs. 
limit=15.0 2024-09-19 00:50:06,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=557120.0, ans=0.1 2024-09-19 00:50:19,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=557160.0, ans=0.125 2024-09-19 00:50:27,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=557200.0, ans=0.125 2024-09-19 00:50:28,513 INFO [train.py:1198] (1/2) Epoch 31, batch 3550, loss[loss=0.2462, ctc_loss=0.1176, cr_loss=0.3599, attn_decoder_loss=0.2525, over 29733.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.12, cr_loss=0.3619, attn_decoder_loss=0.2423, over 5783385.81 frames. ], batch size: 89, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 00:50:37,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=557200.0, ans=0.0 2024-09-19 00:50:55,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=557240.0, ans=0.0 2024-09-19 00:51:00,801 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.56 vs. limit=10.0 2024-09-19 00:51:10,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=557280.0, ans=0.125 2024-09-19 00:51:15,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.84 vs. limit=22.5 2024-09-19 00:51:23,621 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.575e+01 9.105e+01 9.496e+01 5.708e+02, threshold=1.821e+02, percent-clipped=1.0 2024-09-19 00:51:43,113 INFO [train.py:1198] (1/2) Epoch 31, batch 3600, loss[loss=0.2346, ctc_loss=0.12, cr_loss=0.377, attn_decoder_loss=0.2389, over 29534.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1202, cr_loss=0.3623, attn_decoder_loss=0.2426, over 5790902.96 frames. ], batch size: 77, lr: 3.55e-03, grad_scale: 16.0 2024-09-19 00:51:55,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=557400.0, ans=0.1 2024-09-19 00:52:06,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=557440.0, ans=0.125 2024-09-19 00:52:09,719 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.46 vs. limit=15.0 2024-09-19 00:52:15,788 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.68 vs. limit=12.0 2024-09-19 00:52:47,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=557560.0, ans=0.125 2024-09-19 00:52:58,317 INFO [train.py:1198] (1/2) Epoch 31, batch 3650, loss[loss=0.2578, ctc_loss=0.1316, cr_loss=0.3982, attn_decoder_loss=0.263, over 29476.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1198, cr_loss=0.3614, attn_decoder_loss=0.2421, over 5793492.75 frames. 
], batch size: 90, lr: 3.55e-03, grad_scale: 16.0 2024-09-19 00:52:58,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=557600.0, ans=0.035 2024-09-19 00:53:08,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=557600.0, ans=0.025 2024-09-19 00:53:19,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=557640.0, ans=0.125 2024-09-19 00:53:36,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=557680.0, ans=0.1 2024-09-19 00:53:51,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=557720.0, ans=0.04949747468305833 2024-09-19 00:53:55,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.94 vs. limit=22.5 2024-09-19 00:53:55,291 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.495e+01 8.468e+01 9.016e+01 9.512e+01 1.613e+02, threshold=1.803e+02, percent-clipped=0.0 2024-09-19 00:54:16,695 INFO [train.py:1198] (1/2) Epoch 31, batch 3700, loss[loss=0.2396, ctc_loss=0.122, cr_loss=0.3661, attn_decoder_loss=0.2446, over 29716.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1199, cr_loss=0.3621, attn_decoder_loss=0.2424, over 5802756.73 frames. ], batch size: 84, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 00:54:37,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=557840.0, ans=0.125 2024-09-19 00:54:40,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=557840.0, ans=0.0 2024-09-19 00:54:48,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=557880.0, ans=0.2 2024-09-19 00:54:57,551 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=5.76 vs. limit=15.0 2024-09-19 00:55:16,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=557960.0, ans=0.0 2024-09-19 00:55:30,750 INFO [train.py:1198] (1/2) Epoch 31, batch 3750, loss[loss=0.2151, ctc_loss=0.1035, cr_loss=0.3047, attn_decoder_loss=0.2207, over 29366.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1202, cr_loss=0.3624, attn_decoder_loss=0.2422, over 5807317.47 frames. ], batch size: 67, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 00:56:10,433 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.33 vs. 
limit=15.0 2024-09-19 00:56:26,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=558120.0, ans=0.125 2024-09-19 00:56:27,501 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 8.551e+01 9.139e+01 9.962e+01 3.532e+02, threshold=1.828e+02, percent-clipped=2.0 2024-09-19 00:56:30,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=558160.0, ans=0.0 2024-09-19 00:56:41,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=558160.0, ans=0.1 2024-09-19 00:56:41,314 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 00:56:45,311 INFO [train.py:1198] (1/2) Epoch 31, batch 3800, loss[loss=0.241, ctc_loss=0.1145, cr_loss=0.344, attn_decoder_loss=0.2474, over 29641.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.12, cr_loss=0.3622, attn_decoder_loss=0.2419, over 5798016.22 frames. ], batch size: 86, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 00:56:52,305 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.71 vs. limit=22.5 2024-09-19 00:57:23,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=558280.0, ans=0.2 2024-09-19 00:57:30,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.max_abs, batch_count=558320.0, ans=10.0 2024-09-19 00:57:45,904 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.31 vs. limit=10.0 2024-09-19 00:57:46,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=558360.0, ans=0.0 2024-09-19 00:57:46,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=558360.0, ans=0.95 2024-09-19 00:57:54,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=558360.0, ans=0.0 2024-09-19 00:57:59,982 INFO [train.py:1198] (1/2) Epoch 31, batch 3850, loss[loss=0.2492, ctc_loss=0.1241, cr_loss=0.3775, attn_decoder_loss=0.2548, over 29321.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1203, cr_loss=0.3627, attn_decoder_loss=0.242, over 5811445.49 frames. 
], batch size: 100, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 00:58:44,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=558520.0, ans=0.0 2024-09-19 00:58:53,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=558520.0, ans=0.125 2024-09-19 00:58:55,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=558520.0, ans=0.025 2024-09-19 00:58:56,236 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.624e+01 8.602e+01 9.005e+01 9.471e+01 1.448e+02, threshold=1.801e+02, percent-clipped=0.0 2024-09-19 00:59:06,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=558560.0, ans=0.5 2024-09-19 00:59:08,904 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.01 vs. limit=22.5 2024-09-19 00:59:14,164 INFO [train.py:1198] (1/2) Epoch 31, batch 3900, loss[loss=0.2472, ctc_loss=0.1179, cr_loss=0.3598, attn_decoder_loss=0.2535, over 29623.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1209, cr_loss=0.364, attn_decoder_loss=0.2427, over 5815640.84 frames. ], batch size: 86, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 00:59:32,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=558640.0, ans=0.0 2024-09-19 01:00:14,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=558760.0, ans=0.0 2024-09-19 01:00:30,737 INFO [train.py:1198] (1/2) Epoch 31, batch 3950, loss[loss=0.2497, ctc_loss=0.1264, cr_loss=0.3742, attn_decoder_loss=0.2551, over 29468.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1203, cr_loss=0.3628, attn_decoder_loss=0.2428, over 5835209.89 frames. ], batch size: 97, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 01:00:34,773 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.97 vs. limit=22.5 2024-09-19 01:01:03,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=558880.0, ans=0.1 2024-09-19 01:01:15,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=558920.0, ans=0.125 2024-09-19 01:01:18,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=558920.0, ans=0.125 2024-09-19 01:01:28,225 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 8.418e+01 8.953e+01 9.483e+01 1.231e+02, threshold=1.791e+02, percent-clipped=0.0 2024-09-19 01:01:29,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=558960.0, ans=0.0 2024-09-19 01:01:37,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=558960.0, ans=0.025 2024-09-19 01:01:44,268 INFO [train.py:1198] (1/2) Epoch 31, batch 4000, loss[loss=0.2178, ctc_loss=0.1078, cr_loss=0.3483, attn_decoder_loss=0.2223, over 29525.00 frames. 
], tot_loss[loss=0.2374, ctc_loss=0.12, cr_loss=0.3619, attn_decoder_loss=0.2424, over 5812707.43 frames. ], batch size: 74, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 01:01:44,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=559000.0, ans=0.0 2024-09-19 01:01:47,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=559000.0, ans=0.0 2024-09-19 01:01:49,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=559000.0, ans=0.125 2024-09-19 01:01:58,483 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.89 vs. limit=22.5 2024-09-19 01:02:03,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=559040.0, ans=0.125 2024-09-19 01:02:06,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0 2024-09-19 01:02:12,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=559080.0, ans=0.0 2024-09-19 01:02:23,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=559080.0, ans=0.025 2024-09-19 01:02:38,662 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.52 vs. limit=15.0 2024-09-19 01:02:41,731 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.50 vs. limit=15.0 2024-09-19 01:02:43,301 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.88 vs. limit=15.0 2024-09-19 01:02:48,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=559160.0, ans=0.125 2024-09-19 01:02:56,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=559160.0, ans=0.125 2024-09-19 01:02:59,113 INFO [train.py:1198] (1/2) Epoch 31, batch 4050, loss[loss=0.2625, ctc_loss=0.1618, cr_loss=0.3864, attn_decoder_loss=0.2651, over 20564.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1199, cr_loss=0.3615, attn_decoder_loss=0.242, over 5797052.13 frames. ], batch size: 209, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 01:03:05,578 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.40 vs. 
limit=15.0 2024-09-19 01:03:18,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=559240.0, ans=0.1 2024-09-19 01:03:50,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=559320.0, ans=0.2 2024-09-19 01:03:56,622 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.819e+01 8.784e+01 9.351e+01 1.021e+02 4.182e+02, threshold=1.870e+02, percent-clipped=0.0 2024-09-19 01:03:57,042 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:03:59,031 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.64 vs. limit=15.0 2024-09-19 01:04:14,230 INFO [train.py:1198] (1/2) Epoch 31, batch 4100, loss[loss=0.2496, ctc_loss=0.1376, cr_loss=0.4048, attn_decoder_loss=0.2531, over 29499.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1205, cr_loss=0.3622, attn_decoder_loss=0.2424, over 5793051.86 frames. ], batch size: 90, lr: 3.54e-03, grad_scale: 8.0 2024-09-19 01:04:30,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=559440.0, ans=0.5 2024-09-19 01:04:37,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=559440.0, ans=0.1 2024-09-19 01:05:02,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=559520.0, ans=0.1 2024-09-19 01:05:03,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=559520.0, ans=0.125 2024-09-19 01:05:09,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=559520.0, ans=0.025 2024-09-19 01:05:24,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=559560.0, ans=0.0 2024-09-19 01:05:28,645 INFO [train.py:1198] (1/2) Epoch 31, batch 4150, loss[loss=0.2303, ctc_loss=0.1166, cr_loss=0.356, attn_decoder_loss=0.235, over 29490.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1206, cr_loss=0.3625, attn_decoder_loss=0.2424, over 5798932.95 frames. ], batch size: 77, lr: 3.54e-03, grad_scale: 8.0 2024-09-19 01:05:42,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=559640.0, ans=0.125 2024-09-19 01:05:45,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=559640.0, ans=0.025 2024-09-19 01:06:26,030 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.381e+01 8.331e+01 8.844e+01 9.480e+01 1.340e+02, threshold=1.769e+02, percent-clipped=1.0 2024-09-19 01:06:29,385 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:06:42,216 INFO [train.py:1198] (1/2) Epoch 31, batch 4200, loss[loss=0.2539, ctc_loss=0.1346, cr_loss=0.399, attn_decoder_loss=0.2583, over 29499.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1206, cr_loss=0.3629, attn_decoder_loss=0.2425, over 5801081.41 frames. 
], batch size: 90, lr: 3.54e-03, grad_scale: 8.0 2024-09-19 01:07:08,068 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=15.0 2024-09-19 01:07:13,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=559880.0, ans=0.0 2024-09-19 01:08:04,342 INFO [train.py:1198] (1/2) Epoch 31, batch 4250, loss[loss=0.2192, ctc_loss=0.1085, cr_loss=0.3337, attn_decoder_loss=0.2241, over 29506.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1205, cr_loss=0.3627, attn_decoder_loss=0.2428, over 5806568.61 frames. ], batch size: 74, lr: 3.54e-03, grad_scale: 8.0 2024-09-19 01:08:37,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=560080.0, ans=0.125 2024-09-19 01:08:49,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=560120.0, ans=0.025 2024-09-19 01:09:03,973 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.505e+01 8.697e+01 9.428e+01 9.992e+01 2.936e+02, threshold=1.886e+02, percent-clipped=1.0 2024-09-19 01:09:04,800 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.19 vs. limit=12.0 2024-09-19 01:09:08,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=560160.0, ans=0.125 2024-09-19 01:09:14,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=560160.0, ans=0.0 2024-09-19 01:09:20,361 INFO [train.py:1198] (1/2) Epoch 31, batch 4300, loss[loss=0.2435, ctc_loss=0.1222, cr_loss=0.3794, attn_decoder_loss=0.2486, over 29517.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1202, cr_loss=0.3624, attn_decoder_loss=0.243, over 5797183.43 frames. ], batch size: 87, lr: 3.54e-03, grad_scale: 8.0 2024-09-19 01:09:23,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=560200.0, ans=0.125 2024-09-19 01:09:26,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=560200.0, ans=0.0 2024-09-19 01:09:30,294 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.88 vs. 
limit=15.0 2024-09-19 01:09:53,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=560280.0, ans=0.125 2024-09-19 01:09:59,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=560280.0, ans=0.0 2024-09-19 01:10:06,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=560320.0, ans=0.1 2024-09-19 01:10:06,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=560320.0, ans=0.125 2024-09-19 01:10:21,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=560360.0, ans=0.125 2024-09-19 01:10:34,929 INFO [train.py:1198] (1/2) Epoch 31, batch 4350, loss[loss=0.2433, ctc_loss=0.1125, cr_loss=0.345, attn_decoder_loss=0.2502, over 29502.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1229, cr_loss=0.3679, attn_decoder_loss=0.2461, over 5798658.32 frames. ], batch size: 97, lr: 3.54e-03, grad_scale: 8.0 2024-09-19 01:10:38,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=560400.0, ans=0.0 2024-09-19 01:11:05,980 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.84 vs. limit=22.5 2024-09-19 01:11:14,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=560480.0, ans=0.0 2024-09-19 01:11:18,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=560520.0, ans=0.0 2024-09-19 01:11:33,997 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.066e+01 8.755e+01 9.230e+01 9.613e+01 3.743e+02, threshold=1.846e+02, percent-clipped=1.0 2024-09-19 01:11:50,367 INFO [train.py:1198] (1/2) Epoch 31, batch 4400, loss[loss=0.2555, ctc_loss=0.1313, cr_loss=0.3872, attn_decoder_loss=0.2607, over 27534.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1247, cr_loss=0.3714, attn_decoder_loss=0.2485, over 5769354.64 frames. ], batch size: 125, lr: 3.54e-03, grad_scale: 16.0 2024-09-19 01:11:50,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=560600.0, ans=0.1 2024-09-19 01:12:01,117 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.79 vs. limit=22.5 2024-09-19 01:12:18,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=560680.0, ans=0.0 2024-09-19 01:12:47,357 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.30 vs. limit=22.5 2024-09-19 01:12:51,691 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.18 vs. 
limit=22.5 2024-09-19 01:13:03,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=560800.0, ans=0.125 2024-09-19 01:13:05,140 INFO [train.py:1198] (1/2) Epoch 31, batch 4450, loss[loss=0.2638, ctc_loss=0.1615, cr_loss=0.4109, attn_decoder_loss=0.2661, over 19488.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1287, cr_loss=0.3768, attn_decoder_loss=0.2508, over 5571923.75 frames. ], batch size: 209, lr: 3.54e-03, grad_scale: 8.0 2024-09-19 01:13:24,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=560840.0, ans=15.0 2024-09-19 01:13:29,834 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.53 vs. limit=22.5 2024-09-19 01:13:35,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=560880.0, ans=0.0 2024-09-19 01:13:38,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=560880.0, ans=0.07 2024-09-19 01:13:48,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=560880.0, ans=0.125 2024-09-19 01:13:48,738 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.47 vs. limit=12.0 2024-09-19 01:13:53,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=560920.0, ans=0.1 2024-09-19 01:14:03,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=560920.0, ans=0.0 2024-09-19 01:14:06,069 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.287e+01 9.493e+01 1.082e+02 1.219e+02 3.408e+02, threshold=2.163e+02, percent-clipped=1.0 2024-09-19 01:14:09,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=560960.0, ans=0.125 2024-09-19 01:14:21,379 INFO [train.py:1198] (1/2) Epoch 31, batch 4500, loss[loss=0.2567, ctc_loss=0.1467, cr_loss=0.4029, attn_decoder_loss=0.2599, over 20243.00 frames. ], tot_loss[loss=0.2484, ctc_loss=0.1323, cr_loss=0.379, attn_decoder_loss=0.2528, over 5226104.06 frames. ], batch size: 209, lr: 3.54e-03, grad_scale: 8.0 2024-09-19 01:14:23,792 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.42 vs. limit=6.0 2024-09-19 01:14:42,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=561040.0, ans=0.2 2024-09-19 01:15:44,694 INFO [train.py:1198] (1/2) Epoch 32, batch 0, loss[loss=0.2177, ctc_loss=0.1069, cr_loss=0.3304, attn_decoder_loss=0.2226, over 29622.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1069, cr_loss=0.3304, attn_decoder_loss=0.2226, over 29622.00 frames. ], batch size: 73, lr: 3.48e-03, grad_scale: 16.0 2024-09-19 01:15:44,695 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 01:16:03,114 INFO [train.py:1230] (1/2) Epoch 32, validation: loss=0.2127, ctc_loss=0.03714, cr_loss=6.101e-15, attn_decoder_loss=0.2322, over 944034.00 frames. 
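The `loss[... over N frames]` / `tot_loss[... over M frames]` entries above report a per-batch metric next to a smoothed aggregate whose effective frame count drifts slowly. The snippet below is a minimal sketch of that bookkeeping, assuming the aggregate behaves like a frame-weighted moving average of the per-batch metrics; it is illustrative only, and `RunningLoss`, `decay`, and the method names are hypothetical, not icefall's actual tracker.

```python
# Illustrative sketch only -- not the actual icefall implementation. It assumes
# "tot_loss[...] over N frames" is a frame-weighted moving average of the
# per-batch metrics, which matches how the logged numbers evolve above.

from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RunningLoss:
    """Frame-weighted moving average of loss metrics (hypothetical helper)."""
    decay: float = 0.999          # assumed smoothing factor applied per batch
    frames: float = 0.0           # effective number of frames averaged over
    sums: Dict[str, float] = field(default_factory=dict)  # frame-weighted sums

    def update(self, batch_frames: float, metrics: Dict[str, float]) -> None:
        # Decay the history, then add this batch's frame-weighted contribution.
        self.frames = self.frames * self.decay + batch_frames
        for name, value in metrics.items():
            old = self.sums.get(name, 0.0) * self.decay
            self.sums[name] = old + value * batch_frames

    def averages(self) -> Dict[str, float]:
        # Normalizing by the effective frame count gives the "tot_loss" values.
        return {k: v / max(self.frames, 1.0) for k, v in self.sums.items()}


tracker = RunningLoss()
# One update per batch, mirroring a "batch 3250"-style log entry above.
tracker.update(29702.0, {"loss": 0.2393, "ctc_loss": 0.124,
                         "cr_loss": 0.3959, "attn_decoder_loss": 0.2433})
print(tracker.averages(), f"over {tracker.frames:.2f} frames")
```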
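The frequent `scaling.py:214` entries log a `ScheduledFloat` value (`ans=...`) at a given `batch_count`. The idea is a float that follows a piecewise-linear schedule over the global batch count; by batch 556k+ the schedules have settled at their final knots, which is why the logged `ans` values barely change. Below is a hedged sketch of that mechanism; the knot values are made up for illustration, and the class name is hypothetical (icefall's real `ScheduledFloat` in zipformer's `scaling.py` carries additional machinery).

```python
# Hedged sketch of the ScheduledFloat idea behind the scaling.py lines above:
# a float that is a piecewise-linear function of the global batch count.

import bisect


class PiecewiseLinearFloat:
    def __init__(self, *knots):
        # knots: (batch_count, value) pairs, assumed sorted by batch_count.
        self.xs = [x for x, _ in knots]
        self.ys = [y for _, y in knots]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]          # constant once past the last knot
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        t = (batch_count - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)


# e.g. a skip-rate annealed from 0.2 to 0.0 over the first 20k batches;
# at batch_count=556120 it has long since reached its final value.
skip_rate = PiecewiseLinearFloat((0.0, 0.2), (20000.0, 0.0))
print(skip_rate(556120.0))   # -> 0.0
```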
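The recurring `optim.py:487` WARNING entries report grad-norm quartiles over a recent window, a clipping threshold, and the fraction of batches clipped. A plausible reading, sketched below under stated assumptions, is that the optimizer tracks recent gradient norms and clips any batch whose norm exceeds `clipping_scale` times a robust statistic of that window (the median is assumed here); the function and names are illustrative, not icefall's actual `ScaledAdam` code.

```python
# Hedged sketch of the optim.py clipping pattern logged above. The use of the
# window median as the reference statistic is an assumption for illustration.

from collections import deque

import torch


def clip_by_recent_median(params, history: deque, clipping_scale: float = 2.0):
    """Clip grads against clipping_scale * median of recent grad norms."""
    grads = [p.grad for p in params if p.grad is not None]
    norm = torch.linalg.vector_norm(
        torch.stack([torch.linalg.vector_norm(g) for g in grads]))
    history.append(norm.item())
    median = sorted(history)[len(history) // 2]
    threshold = clipping_scale * median
    if norm.item() > threshold:
        # Scale the gradients down; such batches feed "percent-clipped".
        for g in grads:
            g.mul_(threshold / norm.item())
    return norm.item(), threshold


p = torch.nn.Parameter(torch.randn(10))
p.grad = torch.randn(10)
hist = deque(maxlen=100)   # rolling window of recent grad norms
print(clip_by_recent_median([p], hist))
```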
2024-09-19 01:16:03,114 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-19 01:16:12,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=561100.0, ans=0.1 2024-09-19 01:16:26,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=561140.0, ans=0.1 2024-09-19 01:16:27,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=561140.0, ans=0.125 2024-09-19 01:16:44,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=561180.0, ans=0.125 2024-09-19 01:16:53,894 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.57 vs. limit=10.0 2024-09-19 01:16:59,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=561220.0, ans=0.125 2024-09-19 01:17:06,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=561260.0, ans=0.125 2024-09-19 01:17:18,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_positive, batch_count=561260.0, ans=0.05 2024-09-19 01:17:19,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=561300.0, ans=0.0 2024-09-19 01:17:20,693 INFO [train.py:1198] (1/2) Epoch 32, batch 50, loss[loss=0.2211, ctc_loss=0.1071, cr_loss=0.3411, attn_decoder_loss=0.2262, over 29427.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1233, cr_loss=0.3663, attn_decoder_loss=0.2441, over 1268232.14 frames. ], batch size: 70, lr: 3.48e-03, grad_scale: 8.0 2024-09-19 01:17:45,065 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.792e+01 8.848e+01 9.833e+01 1.147e+02 1.812e+02, threshold=1.967e+02, percent-clipped=0.0 2024-09-19 01:18:02,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=561380.0, ans=0.125 2024-09-19 01:18:12,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=561420.0, ans=0.1 2024-09-19 01:18:17,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=561420.0, ans=0.1 2024-09-19 01:18:23,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=561460.0, ans=0.1 2024-09-19 01:18:36,351 INFO [train.py:1198] (1/2) Epoch 32, batch 100, loss[loss=0.2275, ctc_loss=0.1199, cr_loss=0.364, attn_decoder_loss=0.2313, over 29525.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1236, cr_loss=0.3691, attn_decoder_loss=0.2456, over 2251558.89 frames. ], batch size: 76, lr: 3.48e-03, grad_scale: 8.0 2024-09-19 01:18:48,390 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.55 vs. 
limit=15.0 2024-09-19 01:19:09,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.37 vs. limit=15.0 2024-09-19 01:19:14,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=561580.0, ans=0.0 2024-09-19 01:19:29,118 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.83 vs. limit=15.0 2024-09-19 01:19:32,296 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.80 vs. limit=15.0 2024-09-19 01:19:36,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=561620.0, ans=0.09899494936611666 2024-09-19 01:19:36,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=561620.0, ans=0.1 2024-09-19 01:19:51,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=561660.0, ans=15.0 2024-09-19 01:19:53,957 INFO [train.py:1198] (1/2) Epoch 32, batch 150, loss[loss=0.2086, ctc_loss=0.09687, cr_loss=0.3205, attn_decoder_loss=0.2139, over 29439.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1205, cr_loss=0.3641, attn_decoder_loss=0.2434, over 3047836.54 frames. ], batch size: 70, lr: 3.48e-03, grad_scale: 8.0 2024-09-19 01:19:57,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=561700.0, ans=0.0 2024-09-19 01:19:57,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=561700.0, ans=0.125 2024-09-19 01:19:57,620 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.01 vs. limit=15.0 2024-09-19 01:19:58,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=561700.0, ans=0.035 2024-09-19 01:20:18,127 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.224e+01 8.368e+01 8.757e+01 9.262e+01 1.493e+02, threshold=1.751e+02, percent-clipped=0.0 2024-09-19 01:20:38,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=561820.0, ans=0.0 2024-09-19 01:21:04,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=561860.0, ans=0.125 2024-09-19 01:21:11,621 INFO [train.py:1198] (1/2) Epoch 32, batch 200, loss[loss=0.2471, ctc_loss=0.1276, cr_loss=0.3825, attn_decoder_loss=0.2519, over 27409.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.12, cr_loss=0.3633, attn_decoder_loss=0.2429, over 3659966.13 frames. ], batch size: 124, lr: 3.48e-03, grad_scale: 8.0 2024-09-19 01:21:26,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.86 vs. limit=5.0 2024-09-19 01:21:36,513 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.20 vs. 
limit=22.5 2024-09-19 01:21:48,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=561980.0, ans=0.1 2024-09-19 01:22:24,938 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.43 vs. limit=12.0 2024-09-19 01:22:27,308 INFO [train.py:1198] (1/2) Epoch 32, batch 250, loss[loss=0.2479, ctc_loss=0.1272, cr_loss=0.3866, attn_decoder_loss=0.2528, over 29279.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1196, cr_loss=0.3625, attn_decoder_loss=0.2425, over 4142435.04 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 8.0 2024-09-19 01:22:49,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=562140.0, ans=10.0 2024-09-19 01:22:53,928 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.223e+01 8.466e+01 9.044e+01 9.662e+01 1.743e+02, threshold=1.809e+02, percent-clipped=0.0 2024-09-19 01:23:03,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=562180.0, ans=0.125 2024-09-19 01:23:35,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.51 vs. limit=15.0 2024-09-19 01:23:45,499 INFO [train.py:1198] (1/2) Epoch 32, batch 300, loss[loss=0.2535, ctc_loss=0.1341, cr_loss=0.386, attn_decoder_loss=0.2582, over 29553.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1195, cr_loss=0.3624, attn_decoder_loss=0.2423, over 4511285.76 frames. ], batch size: 92, lr: 3.48e-03, grad_scale: 8.0 2024-09-19 01:24:27,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=562380.0, ans=0.0 2024-09-19 01:24:27,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=562380.0, ans=0.0 2024-09-19 01:24:27,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.46 vs. limit=15.0 2024-09-19 01:24:42,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=562420.0, ans=22.5 2024-09-19 01:25:04,396 INFO [train.py:1198] (1/2) Epoch 32, batch 350, loss[loss=0.2183, ctc_loss=0.105, cr_loss=0.3237, attn_decoder_loss=0.2237, over 29299.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.12, cr_loss=0.3632, attn_decoder_loss=0.2426, over 4796318.03 frames. 
], batch size: 71, lr: 3.48e-03, grad_scale: 8.0 2024-09-19 01:25:28,324 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.269e+01 8.336e+01 8.922e+01 9.619e+01 6.149e+02, threshold=1.784e+02, percent-clipped=1.0 2024-09-19 01:25:42,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=562580.0, ans=0.125 2024-09-19 01:26:03,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=562660.0, ans=0.125 2024-09-19 01:26:05,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=562660.0, ans=0.0 2024-09-19 01:26:11,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=562660.0, ans=0.125 2024-09-19 01:26:11,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=562660.0, ans=0.125 2024-09-19 01:26:15,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=562660.0, ans=0.125 2024-09-19 01:26:19,651 INFO [train.py:1198] (1/2) Epoch 32, batch 400, loss[loss=0.2467, ctc_loss=0.1267, cr_loss=0.3814, attn_decoder_loss=0.2516, over 29696.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1197, cr_loss=0.363, attn_decoder_loss=0.2424, over 5024609.59 frames. ], batch size: 82, lr: 3.48e-03, grad_scale: 16.0 2024-09-19 01:26:31,043 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.91 vs. limit=15.0 2024-09-19 01:26:45,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=562740.0, ans=0.0 2024-09-19 01:26:51,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=562780.0, ans=0.0 2024-09-19 01:26:55,081 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.78 vs. limit=15.0 2024-09-19 01:26:57,982 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=9.48 vs. limit=15.0 2024-09-19 01:27:38,243 INFO [train.py:1198] (1/2) Epoch 32, batch 450, loss[loss=0.2435, ctc_loss=0.1228, cr_loss=0.3571, attn_decoder_loss=0.249, over 29700.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1199, cr_loss=0.363, attn_decoder_loss=0.2425, over 5186077.29 frames. ], batch size: 83, lr: 3.48e-03, grad_scale: 16.0 2024-09-19 01:27:49,809 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.69 vs. limit=22.5 2024-09-19 01:27:57,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.71 vs. 
limit=15.0 2024-09-19 01:28:02,737 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.307e+01 8.493e+01 8.894e+01 9.370e+01 1.465e+02, threshold=1.779e+02, percent-clipped=0.0 2024-09-19 01:28:06,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=562940.0, ans=0.125 2024-09-19 01:28:07,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=562980.0, ans=0.125 2024-09-19 01:28:28,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=563020.0, ans=0.125 2024-09-19 01:28:36,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=563020.0, ans=0.035 2024-09-19 01:28:39,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=563060.0, ans=0.05 2024-09-19 01:28:40,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=563060.0, ans=0.0 2024-09-19 01:28:56,474 INFO [train.py:1198] (1/2) Epoch 32, batch 500, loss[loss=0.2514, ctc_loss=0.1275, cr_loss=0.3715, attn_decoder_loss=0.2569, over 29444.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1195, cr_loss=0.3621, attn_decoder_loss=0.2419, over 5329552.85 frames. ], batch size: 94, lr: 3.48e-03, grad_scale: 16.0 2024-09-19 01:29:16,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=563140.0, ans=0.125 2024-09-19 01:29:47,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=563220.0, ans=0.125 2024-09-19 01:30:07,496 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.93 vs. limit=15.0 2024-09-19 01:30:08,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=563260.0, ans=0.04949747468305833 2024-09-19 01:30:09,231 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.22 vs. limit=15.0 2024-09-19 01:30:10,444 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.39 vs. limit=15.0 2024-09-19 01:30:12,542 INFO [train.py:1198] (1/2) Epoch 32, batch 550, loss[loss=0.2555, ctc_loss=0.1307, cr_loss=0.3769, attn_decoder_loss=0.261, over 28925.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1199, cr_loss=0.3626, attn_decoder_loss=0.2421, over 5422748.35 frames. 
], batch size: 104, lr: 3.48e-03, grad_scale: 8.0 2024-09-19 01:30:29,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=563340.0, ans=0.2 2024-09-19 01:30:36,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=563340.0, ans=0.1 2024-09-19 01:30:40,429 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.146e+01 8.423e+01 9.076e+01 9.566e+01 2.311e+02, threshold=1.815e+02, percent-clipped=1.0 2024-09-19 01:30:48,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=563380.0, ans=0.0 2024-09-19 01:31:18,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=563460.0, ans=0.035 2024-09-19 01:31:30,748 INFO [train.py:1198] (1/2) Epoch 32, batch 600, loss[loss=0.2503, ctc_loss=0.1303, cr_loss=0.3827, attn_decoder_loss=0.2551, over 29250.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1196, cr_loss=0.3622, attn_decoder_loss=0.2422, over 5507978.67 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 8.0 2024-09-19 01:31:35,010 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.71 vs. limit=10.0 2024-09-19 01:31:44,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=563540.0, ans=0.0 2024-09-19 01:31:47,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=563540.0, ans=0.0 2024-09-19 01:31:58,941 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=15.0 2024-09-19 01:32:04,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=563580.0, ans=0.0 2024-09-19 01:32:04,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=563580.0, ans=0.5 2024-09-19 01:32:05,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=563580.0, ans=0.0 2024-09-19 01:32:23,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=563620.0, ans=0.125 2024-09-19 01:32:25,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=563620.0, ans=0.125 2024-09-19 01:32:28,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=563620.0, ans=0.125 2024-09-19 01:32:29,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=563660.0, ans=0.125 2024-09-19 01:32:33,488 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.49 vs. limit=15.0 2024-09-19 01:32:35,230 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.99 vs. 
limit=22.5 2024-09-19 01:32:40,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=563660.0, ans=0.2 2024-09-19 01:32:45,913 INFO [train.py:1198] (1/2) Epoch 32, batch 650, loss[loss=0.2411, ctc_loss=0.1182, cr_loss=0.3545, attn_decoder_loss=0.2468, over 29759.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1189, cr_loss=0.3604, attn_decoder_loss=0.2415, over 5584808.02 frames. ], batch size: 81, lr: 3.47e-03, grad_scale: 8.0 2024-09-19 01:33:02,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=563740.0, ans=0.0 2024-09-19 01:33:02,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=563740.0, ans=0.2 2024-09-19 01:33:05,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=563740.0, ans=0.125 2024-09-19 01:33:14,107 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.233e+01 8.421e+01 8.815e+01 9.543e+01 5.182e+02, threshold=1.763e+02, percent-clipped=1.0 2024-09-19 01:33:14,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=563740.0, ans=0.125 2024-09-19 01:33:14,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=563740.0, ans=0.125 2024-09-19 01:33:40,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=563820.0, ans=0.125 2024-09-19 01:33:49,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=563860.0, ans=0.0 2024-09-19 01:33:53,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=563860.0, ans=0.2 2024-09-19 01:34:04,049 INFO [train.py:1198] (1/2) Epoch 32, batch 700, loss[loss=0.2295, ctc_loss=0.1169, cr_loss=0.3487, attn_decoder_loss=0.2343, over 29512.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1196, cr_loss=0.3617, attn_decoder_loss=0.2421, over 5635022.05 frames. ], batch size: 76, lr: 3.47e-03, grad_scale: 8.0 2024-09-19 01:34:19,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=563940.0, ans=0.025 2024-09-19 01:34:19,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=563940.0, ans=0.0 2024-09-19 01:34:19,862 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2024-09-19 01:34:25,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=563940.0, ans=0.0 2024-09-19 01:34:30,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=563940.0, ans=0.1 2024-09-19 01:34:40,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=563980.0, ans=0.0 2024-09-19 01:34:57,878 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.06 vs. 
limit=15.0 2024-09-19 01:35:01,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=564020.0, ans=0.1 2024-09-19 01:35:22,669 INFO [train.py:1198] (1/2) Epoch 32, batch 750, loss[loss=0.2374, ctc_loss=0.1128, cr_loss=0.3246, attn_decoder_loss=0.2441, over 29696.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1191, cr_loss=0.3602, attn_decoder_loss=0.2416, over 5673396.83 frames. ], batch size: 82, lr: 3.47e-03, grad_scale: 8.0 2024-09-19 01:35:24,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=564100.0, ans=0.0 2024-09-19 01:35:42,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=564140.0, ans=0.07 2024-09-19 01:35:43,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=564140.0, ans=0.0 2024-09-19 01:35:48,088 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.435e+01 8.447e+01 8.933e+01 9.518e+01 3.479e+02, threshold=1.787e+02, percent-clipped=1.0 2024-09-19 01:35:57,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=564180.0, ans=0.1 2024-09-19 01:36:06,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=564220.0, ans=0.09899494936611666 2024-09-19 01:36:07,396 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.90 vs. limit=15.0 2024-09-19 01:36:38,475 INFO [train.py:1198] (1/2) Epoch 32, batch 800, loss[loss=0.2222, ctc_loss=0.1056, cr_loss=0.3371, attn_decoder_loss=0.2277, over 29606.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1189, cr_loss=0.3598, attn_decoder_loss=0.2415, over 5704904.62 frames. ], batch size: 73, lr: 3.47e-03, grad_scale: 8.0 2024-09-19 01:36:55,573 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.17 vs. limit=10.0 2024-09-19 01:37:29,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=564420.0, ans=0.125 2024-09-19 01:37:42,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=564460.0, ans=0.0 2024-09-19 01:37:56,055 INFO [train.py:1198] (1/2) Epoch 32, batch 850, loss[loss=0.2436, ctc_loss=0.1123, cr_loss=0.3292, attn_decoder_loss=0.2509, over 29693.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1186, cr_loss=0.3589, attn_decoder_loss=0.2411, over 5733824.02 frames. 
], batch size: 89, lr: 3.47e-03, grad_scale: 8.0 2024-09-19 01:37:59,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=564500.0, ans=0.1 2024-09-19 01:38:12,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=564540.0, ans=0.125 2024-09-19 01:38:15,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=564540.0, ans=0.1 2024-09-19 01:38:21,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=564540.0, ans=0.125 2024-09-19 01:38:23,001 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.469e+01 8.587e+01 9.050e+01 9.701e+01 1.930e+02, threshold=1.810e+02, percent-clipped=1.0 2024-09-19 01:38:30,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=564580.0, ans=0.125 2024-09-19 01:38:33,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=564580.0, ans=0.2 2024-09-19 01:38:34,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=564580.0, ans=0.0 2024-09-19 01:39:13,904 INFO [train.py:1198] (1/2) Epoch 32, batch 900, loss[loss=0.2225, ctc_loss=0.1074, cr_loss=0.3309, attn_decoder_loss=0.228, over 29570.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1192, cr_loss=0.3601, attn_decoder_loss=0.2417, over 5739613.87 frames. ], batch size: 73, lr: 3.47e-03, grad_scale: 8.0 2024-09-19 01:39:18,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=564700.0, ans=0.125 2024-09-19 01:39:21,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=564700.0, ans=0.07 2024-09-19 01:39:23,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=564700.0, ans=0.125 2024-09-19 01:39:26,619 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.96 vs. limit=15.0 2024-09-19 01:39:26,682 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.34 vs. limit=22.5 2024-09-19 01:39:30,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=564740.0, ans=0.05 2024-09-19 01:39:32,939 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.00 vs. limit=15.0 2024-09-19 01:39:34,570 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.22 vs. 
2024-09-19 01:39:58,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=564820.0, ans=0.125
2024-09-19 01:40:27,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=564860.0, ans=0.0
2024-09-19 01:40:28,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=564900.0, ans=0.025
2024-09-19 01:40:29,655 INFO [train.py:1198] (1/2) Epoch 32, batch 950, loss[loss=0.2154, ctc_loss=0.1033, cr_loss=0.3099, attn_decoder_loss=0.221, over 29494.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1197, cr_loss=0.3608, attn_decoder_loss=0.2419, over 5742741.67 frames. ], batch size: 74, lr: 3.47e-03, grad_scale: 8.0
2024-09-19 01:40:40,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=564900.0, ans=0.0
2024-09-19 01:40:57,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=564940.0, ans=0.2
2024-09-19 01:40:59,193 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.678e+01 9.053e+01 9.826e+01 2.124e+02, threshold=1.811e+02, percent-clipped=3.0
2024-09-19 01:41:07,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=564980.0, ans=0.025
2024-09-19 01:41:07,498 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.06 vs. limit=15.0
2024-09-19 01:41:20,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=565020.0, ans=0.125
2024-09-19 01:41:22,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=565020.0, ans=0.0
2024-09-19 01:41:32,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=565060.0, ans=0.0
2024-09-19 01:41:40,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=565060.0, ans=0.0
2024-09-19 01:41:47,539 INFO [train.py:1198] (1/2) Epoch 32, batch 1000, loss[loss=0.2248, ctc_loss=0.1127, cr_loss=0.3493, attn_decoder_loss=0.2295, over 29524.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1201, cr_loss=0.3612, attn_decoder_loss=0.2423, over 5737400.70 frames. ], batch size: 77, lr: 3.47e-03, grad_scale: 8.0
2024-09-19 01:41:49,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=565100.0, ans=0.0
2024-09-19 01:41:50,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=565100.0, ans=0.125
2024-09-19 01:42:19,725 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.90 vs. limit=15.0
2024-09-19 01:42:23,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=565180.0, ans=0.125
2024-09-19 01:42:43,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=565220.0, ans=0.0
2024-09-19 01:42:45,281 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.26 vs. limit=15.0
2024-09-19 01:42:46,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=565220.0, ans=10.0
2024-09-19 01:42:55,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=565260.0, ans=0.1
2024-09-19 01:43:02,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=565260.0, ans=0.0
2024-09-19 01:43:02,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=565260.0, ans=0.125
2024-09-19 01:43:05,367 INFO [train.py:1198] (1/2) Epoch 32, batch 1050, loss[loss=0.2395, ctc_loss=0.1177, cr_loss=0.3417, attn_decoder_loss=0.2455, over 29674.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.12, cr_loss=0.3612, attn_decoder_loss=0.2418, over 5745808.82 frames. ], batch size: 85, lr: 3.47e-03, grad_scale: 8.0
2024-09-19 01:43:10,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=565300.0, ans=0.07
2024-09-19 01:43:23,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=565340.0, ans=0.125
2024-09-19 01:43:32,745 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.148e+01 8.597e+01 9.014e+01 9.453e+01 2.467e+02, threshold=1.803e+02, percent-clipped=1.0
2024-09-19 01:43:44,377 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.93 vs. limit=12.0
2024-09-19 01:44:21,449 INFO [train.py:1198] (1/2) Epoch 32, batch 1100, loss[loss=0.2424, ctc_loss=0.1246, cr_loss=0.384, attn_decoder_loss=0.2469, over 29461.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.12, cr_loss=0.3614, attn_decoder_loss=0.2419, over 5758217.20 frames. ], batch size: 78, lr: 3.47e-03, grad_scale: 8.0
2024-09-19 01:44:33,351 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.35 vs. limit=15.0
2024-09-19 01:44:45,241 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.61 vs. limit=12.0
2024-09-19 01:44:47,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=565540.0, ans=0.125
2024-09-19 01:45:24,763 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.89 vs. limit=10.0
2024-09-19 01:45:40,260 INFO [train.py:1198] (1/2) Epoch 32, batch 1150, loss[loss=0.2406, ctc_loss=0.1257, cr_loss=0.3965, attn_decoder_loss=0.2446, over 29474.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1201, cr_loss=0.3614, attn_decoder_loss=0.2421, over 5755811.72 frames. ], batch size: 78, lr: 3.47e-03, grad_scale: 8.0
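Each scaling.py:214 line reports the current value (ans) of a ScheduledFloat: a scalar hyperparameter (a dropout probability, a skip rate, and so on) that is scheduled against batch_count. Below is a hedged sketch of one plausible implementation, piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints shown are invented for illustration and are not the recipe's actual schedules.

# Hedged sketch of a ScheduledFloat-style value: piecewise-linear in
# batch_count between (batch_count, value) breakpoints. Breakpoints here
# are made up for illustration.
class PiecewiseLinearFloat:
    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs

    def value(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

skip_rate = PiecewiseLinearFloat((0.0, 0.5), (4000.0, 0.25), (20000.0, 0.025))
print(skip_rate.value(565500.0))  # far past the last breakpoint -> 0.025

This is why most ans values in this stretch of the log sit at their final plateau: by batch_count > 560000 every schedule has long since reached its endpoint.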
2024-09-19 01:45:58,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=565740.0, ans=0.125
2024-09-19 01:46:08,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=565740.0, ans=0.2
2024-09-19 01:46:10,194 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.460e+01 8.830e+01 9.335e+01 1.572e+02, threshold=1.766e+02, percent-clipped=0.0
2024-09-19 01:46:36,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=565820.0, ans=0.2
2024-09-19 01:46:39,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=565820.0, ans=0.125
2024-09-19 01:46:48,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=565860.0, ans=0.1
2024-09-19 01:46:57,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=565900.0, ans=0.1
2024-09-19 01:46:58,700 INFO [train.py:1198] (1/2) Epoch 32, batch 1200, loss[loss=0.2387, ctc_loss=0.1182, cr_loss=0.3575, attn_decoder_loss=0.2442, over 29689.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1204, cr_loss=0.3622, attn_decoder_loss=0.2426, over 5748055.40 frames. ], batch size: 85, lr: 3.47e-03, grad_scale: 16.0
2024-09-19 01:47:17,934 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0
2024-09-19 01:47:46,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=566020.0, ans=0.125
2024-09-19 01:48:14,673 INFO [train.py:1198] (1/2) Epoch 32, batch 1250, loss[loss=0.2524, ctc_loss=0.1357, cr_loss=0.3849, attn_decoder_loss=0.2568, over 29538.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1207, cr_loss=0.3634, attn_decoder_loss=0.2431, over 5775316.87 frames. ], batch size: 92, lr: 3.47e-03, grad_scale: 8.0
2024-09-19 01:48:24,568 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.56 vs. limit=15.0
2024-09-19 01:48:28,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=566140.0, ans=0.125
2024-09-19 01:48:39,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=566140.0, ans=0.035
2024-09-19 01:48:43,430 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.280e+01 8.626e+01 9.127e+01 9.598e+01 1.741e+02, threshold=1.825e+02, percent-clipped=0.0
2024-09-19 01:49:22,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=566260.0, ans=0.125
2024-09-19 01:49:22,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=566260.0, ans=0.0
2024-09-19 01:49:32,739 INFO [train.py:1198] (1/2) Epoch 32, batch 1300, loss[loss=0.2434, ctc_loss=0.1178, cr_loss=0.3582, attn_decoder_loss=0.2494, over 28607.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.12, cr_loss=0.3613, attn_decoder_loss=0.2422, over 5778606.81 frames. ], batch size: 112, lr: 3.47e-03, grad_scale: 8.0
2024-09-19 01:49:48,423 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 01:50:36,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=566460.0, ans=0.125
2024-09-19 01:50:44,742 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.12 vs. limit=15.0
2024-09-19 01:50:51,304 INFO [train.py:1198] (1/2) Epoch 32, batch 1350, loss[loss=0.2367, ctc_loss=0.1201, cr_loss=0.3608, attn_decoder_loss=0.2417, over 29768.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1195, cr_loss=0.3612, attn_decoder_loss=0.2418, over 5796842.38 frames. ], batch size: 81, lr: 3.47e-03, grad_scale: 8.0
2024-09-19 01:50:53,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=566500.0, ans=0.125
2024-09-19 01:51:06,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=566540.0, ans=0.125
2024-09-19 01:51:19,498 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.271e+01 8.117e+01 8.702e+01 9.337e+01 1.229e+02, threshold=1.740e+02, percent-clipped=0.0
2024-09-19 01:51:40,256 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.12 vs. limit=12.0
2024-09-19 01:51:58,426 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.69 vs. limit=10.0
2024-09-19 01:52:02,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=566660.0, ans=0.125
2024-09-19 01:52:04,668 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.00 vs. limit=15.0
2024-09-19 01:52:06,533 INFO [train.py:1198] (1/2) Epoch 32, batch 1400, loss[loss=0.2058, ctc_loss=0.0915, cr_loss=0.2935, attn_decoder_loss=0.212, over 29560.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1193, cr_loss=0.3607, attn_decoder_loss=0.2417, over 5807902.58 frames. ], batch size: 69, lr: 3.47e-03, grad_scale: 8.0
2024-09-19 01:52:26,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.86 vs. limit=12.0
2024-09-19 01:52:46,744 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.83 vs. limit=15.0
2024-09-19 01:52:47,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=566780.0, ans=0.1
2024-09-19 01:52:55,304 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0
2024-09-19 01:53:08,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=566860.0, ans=0.1
2024-09-19 01:53:12,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=566860.0, ans=0.1
2024-09-19 01:53:14,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=566860.0, ans=0.0
2024-09-19 01:53:24,541 INFO [train.py:1198] (1/2) Epoch 32, batch 1450, loss[loss=0.2552, ctc_loss=0.1349, cr_loss=0.4089, attn_decoder_loss=0.2595, over 29432.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1193, cr_loss=0.3609, attn_decoder_loss=0.242, over 5804066.30 frames. ], batch size: 94, lr: 3.46e-03, grad_scale: 8.0
2024-09-19 01:53:38,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=566940.0, ans=0.0
2024-09-19 01:53:41,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=566940.0, ans=0.1
2024-09-19 01:53:47,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=566940.0, ans=0.1
2024-09-19 01:53:53,180 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.580e+01 8.959e+01 9.480e+01 1.633e+02, threshold=1.792e+02, percent-clipped=0.0
2024-09-19 01:54:10,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=567020.0, ans=0.125
2024-09-19 01:54:27,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=567060.0, ans=0.0
2024-09-19 01:54:28,918 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 01:54:31,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=567060.0, ans=0.125
2024-09-19 01:54:42,377 INFO [train.py:1198] (1/2) Epoch 32, batch 1500, loss[loss=0.2416, ctc_loss=0.1199, cr_loss=0.3541, attn_decoder_loss=0.2472, over 29642.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1194, cr_loss=0.3612, attn_decoder_loss=0.2423, over 5804806.62 frames. ], batch size: 86, lr: 3.46e-03, grad_scale: 8.0
2024-09-19 01:54:46,262 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=9.62 vs. limit=15.0
2024-09-19 01:54:48,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=567100.0, ans=0.0
2024-09-19 01:55:02,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=567140.0, ans=0.2
2024-09-19 01:55:38,673 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.94 vs. limit=15.0
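The optim.py:487 warnings summarize a window of recent gradient norms as five quantiles (min, 25%, median, 75%, max), plus a clipping threshold and the percentage of recently clipped batches. In the entries above the threshold is consistently Clipping_scale times the median, e.g. 2.0 * 8.933e+01 = 1.787e+02. Here is a small sketch of that bookkeeping; the window contents are invented for illustration, and the real optimizer's internals may differ.

# Hedged sketch of the grad-norm statistics behind the optim.py warnings:
# quantiles over a window of recent gradient norms, a threshold equal to
# Clipping_scale times the median, and the fraction of batches clipped.
import torch

def clipping_stats(recent_norms, clipping_scale=2.0):
    # recent_norms: 1-D float tensor of per-batch gradient norms (a toy
    # stand-in for whatever window the optimizer actually keeps).
    q = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]          # scale times the median
    percent = 100.0 * (recent_norms > threshold).float().mean()
    return q, threshold, percent

norms = torch.tensor([74.35, 84.47, 89.33, 95.18, 347.9])  # toy window
q, thr, pct = clipping_stats(norms)
print(q.tolist(), float(thr), float(pct))      # thr = 2.0 * median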
2024-09-19 01:55:49,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=567260.0, ans=0.125
2024-09-19 01:55:52,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=567260.0, ans=0.125
2024-09-19 01:55:52,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=567260.0, ans=0.125
2024-09-19 01:55:57,754 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.05 vs. limit=22.5
2024-09-19 01:55:58,510 INFO [train.py:1198] (1/2) Epoch 32, batch 1550, loss[loss=0.2547, ctc_loss=0.1322, cr_loss=0.3922, attn_decoder_loss=0.2597, over 29528.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1196, cr_loss=0.361, attn_decoder_loss=0.242, over 5779497.81 frames. ], batch size: 90, lr: 3.46e-03, grad_scale: 8.0
2024-09-19 01:55:58,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=567300.0, ans=0.2
2024-09-19 01:56:11,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=567300.0, ans=0.125
2024-09-19 01:56:13,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=567340.0, ans=0.0
2024-09-19 01:56:17,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=567340.0, ans=0.125
2024-09-19 01:56:18,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=567340.0, ans=0.0
2024-09-19 01:56:27,113 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 8.583e+01 9.090e+01 9.539e+01 2.299e+02, threshold=1.818e+02, percent-clipped=1.0
2024-09-19 01:56:39,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=567380.0, ans=0.125
2024-09-19 01:57:00,518 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.31 vs. limit=15.0
2024-09-19 01:57:16,197 INFO [train.py:1198] (1/2) Epoch 32, batch 1600, loss[loss=0.2394, ctc_loss=0.1142, cr_loss=0.3444, attn_decoder_loss=0.2457, over 29714.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1194, cr_loss=0.3606, attn_decoder_loss=0.2419, over 5760809.96 frames. ], batch size: 85, lr: 3.46e-03, grad_scale: 16.0
2024-09-19 01:57:25,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=567500.0, ans=0.025
2024-09-19 01:57:34,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=567540.0, ans=0.125
2024-09-19 01:57:49,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=567580.0, ans=0.05
2024-09-19 01:57:55,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=567580.0, ans=0.0
2024-09-19 01:58:25,939 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.69 vs. limit=15.0
2024-09-19 01:58:28,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=567660.0, ans=0.0
2024-09-19 01:58:28,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=567660.0, ans=0.0
2024-09-19 01:58:34,175 INFO [train.py:1198] (1/2) Epoch 32, batch 1650, loss[loss=0.2503, ctc_loss=0.1224, cr_loss=0.3683, attn_decoder_loss=0.2564, over 29726.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1193, cr_loss=0.3606, attn_decoder_loss=0.2419, over 5756351.97 frames. ], batch size: 89, lr: 3.46e-03, grad_scale: 8.0
2024-09-19 01:58:42,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=567700.0, ans=0.125
2024-09-19 01:58:47,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=567740.0, ans=0.125
2024-09-19 01:59:04,229 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.351e+01 8.988e+01 9.892e+01 1.504e+02, threshold=1.798e+02, percent-clipped=0.0
2024-09-19 01:59:19,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=567820.0, ans=0.0
2024-09-19 01:59:30,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=567820.0, ans=0.1
2024-09-19 01:59:33,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=567860.0, ans=0.0
2024-09-19 01:59:36,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=567860.0, ans=0.125
2024-09-19 01:59:49,521 INFO [train.py:1198] (1/2) Epoch 32, batch 1700, loss[loss=0.2067, ctc_loss=0.09634, cr_loss=0.3241, attn_decoder_loss=0.2118, over 29566.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1191, cr_loss=0.3607, attn_decoder_loss=0.2418, over 5778192.71 frames. ], batch size: 69, lr: 3.46e-03, grad_scale: 8.0
2024-09-19 02:00:00,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=567900.0, ans=0.125
2024-09-19 02:00:10,201 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.38 vs. limit=15.0
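The grad_scale value in the batch lines toggles between 8.0 and 16.0, which is the behavior of a mixed-precision loss scaler being grown periodically and halved when gradients overflow. Below is a minimal, generic torch.cuda.amp sketch of that loop; the model, data, and optimizer are placeholders, not the recipe's own training step.

# Hedged sketch of the AMP loss-scaling loop implied by grad_scale above.
# Everything here is an illustrative stand-in for the real training step.
import torch

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3.46e-3)
scaler = torch.cuda.amp.GradScaler(init_scale=8.0)

features = torch.randn(4, 80, device="cuda")
targets = torch.randint(0, 500, (4,), device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast(dtype=torch.float16):
    loss = torch.nn.functional.cross_entropy(model(features), targets)
scaler.scale(loss).backward()
scaler.step(optimizer)     # skipped internally if the grads overflowed
scaler.update()            # grows the scale periodically, halves it on overflow
print(scaler.get_scale())  # the number that appears as grad_scale in the log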
2024-09-19 02:00:15,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=567940.0, ans=0.0
2024-09-19 02:00:19,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=567980.0, ans=0.125
2024-09-19 02:01:05,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=568060.0, ans=0.125
2024-09-19 02:01:07,013 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 02:01:08,136 INFO [train.py:1198] (1/2) Epoch 32, batch 1750, loss[loss=0.2128, ctc_loss=0.1049, cr_loss=0.3156, attn_decoder_loss=0.2178, over 29342.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1187, cr_loss=0.3597, attn_decoder_loss=0.2414, over 5786781.04 frames. ], batch size: 67, lr: 3.46e-03, grad_scale: 8.0
2024-09-19 02:01:08,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=568100.0, ans=0.0
2024-09-19 02:01:31,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=568140.0, ans=0.125
2024-09-19 02:01:38,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=568140.0, ans=0.5
2024-09-19 02:01:38,530 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.59 vs. limit=22.5
2024-09-19 02:01:40,653 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.109e+01 8.621e+01 8.991e+01 9.586e+01 2.043e+02, threshold=1.798e+02, percent-clipped=1.0
2024-09-19 02:01:42,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=568180.0, ans=0.0
2024-09-19 02:01:48,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=568180.0, ans=0.025
2024-09-19 02:01:54,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=568220.0, ans=0.125
2024-09-19 02:02:10,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=568260.0, ans=0.025
2024-09-19 02:02:25,097 INFO [train.py:1198] (1/2) Epoch 32, batch 1800, loss[loss=0.2428, ctc_loss=0.1209, cr_loss=0.3542, attn_decoder_loss=0.2485, over 29703.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.119, cr_loss=0.3601, attn_decoder_loss=0.2417, over 5789430.45 frames. ], batch size: 83, lr: 3.46e-03, grad_scale: 8.0
2024-09-19 02:03:41,086 INFO [train.py:1198] (1/2) Epoch 32, batch 1850, loss[loss=0.2474, ctc_loss=0.1265, cr_loss=0.3531, attn_decoder_loss=0.253, over 29607.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.119, cr_loss=0.3597, attn_decoder_loss=0.2413, over 5796580.05 frames. ], batch size: 86, lr: 3.46e-03, grad_scale: 8.0
2024-09-19 02:04:00,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=568540.0, ans=0.2
2024-09-19 02:04:11,110 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.765e+01 8.550e+01 9.044e+01 9.477e+01 1.404e+02, threshold=1.809e+02, percent-clipped=0.0
2024-09-19 02:04:16,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=568580.0, ans=0.125
2024-09-19 02:04:19,481 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.21 vs. limit=15.0
2024-09-19 02:04:32,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=568620.0, ans=0.0
2024-09-19 02:04:35,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=568620.0, ans=0.0
2024-09-19 02:04:58,431 INFO [train.py:1198] (1/2) Epoch 32, batch 1900, loss[loss=0.2465, ctc_loss=0.1175, cr_loss=0.3559, attn_decoder_loss=0.253, over 29690.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1197, cr_loss=0.3613, attn_decoder_loss=0.2423, over 5804582.92 frames. ], batch size: 89, lr: 3.46e-03, grad_scale: 8.0
2024-09-19 02:05:17,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.93 vs. limit=15.0
2024-09-19 02:05:18,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=568740.0, ans=0.0
2024-09-19 02:05:39,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=568780.0, ans=0.2
2024-09-19 02:05:46,298 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.36 vs. limit=15.0
2024-09-19 02:05:53,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=568820.0, ans=0.0
2024-09-19 02:06:04,208 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.30 vs. limit=12.0
2024-09-19 02:06:08,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=568860.0, ans=0.0
2024-09-19 02:06:09,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=568860.0, ans=0.2
2024-09-19 02:06:16,861 INFO [train.py:1198] (1/2) Epoch 32, batch 1950, loss[loss=0.231, ctc_loss=0.113, cr_loss=0.3555, attn_decoder_loss=0.2362, over 29435.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1204, cr_loss=0.3635, attn_decoder_loss=0.2436, over 5819255.24 frames. ], batch size: 78, lr: 3.46e-03, grad_scale: 8.0
2024-09-19 02:06:23,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=568900.0, ans=0.0
2024-09-19 02:06:47,104 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.678e+01 9.081e+01 9.709e+01 1.589e+02, threshold=1.816e+02, percent-clipped=0.0
2024-09-19 02:06:52,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=568980.0, ans=0.0
2024-09-19 02:06:59,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=568980.0, ans=0.1
2024-09-19 02:07:11,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=569020.0, ans=0.0
2024-09-19 02:07:14,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=569020.0, ans=0.1
2024-09-19 02:07:23,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=569060.0, ans=0.125
2024-09-19 02:07:32,375 INFO [train.py:1198] (1/2) Epoch 32, batch 2000, loss[loss=0.2143, ctc_loss=0.1028, cr_loss=0.3357, attn_decoder_loss=0.2192, over 29352.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1208, cr_loss=0.3647, attn_decoder_loss=0.244, over 5795424.58 frames. ], batch size: 67, lr: 3.46e-03, grad_scale: 16.0
2024-09-19 02:07:32,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=569100.0, ans=0.125
2024-09-19 02:08:13,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=569180.0, ans=0.025
2024-09-19 02:08:24,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=569220.0, ans=0.1
2024-09-19 02:08:50,444 INFO [train.py:1198] (1/2) Epoch 32, batch 2050, loss[loss=0.2145, ctc_loss=0.1033, cr_loss=0.3354, attn_decoder_loss=0.2194, over 29418.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1203, cr_loss=0.3638, attn_decoder_loss=0.243, over 5788467.14 frames. ], batch size: 70, lr: 3.46e-03, grad_scale: 8.0
2024-09-19 02:08:50,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=569300.0, ans=0.0
2024-09-19 02:08:58,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=569300.0, ans=0.1
2024-09-19 02:09:04,973 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.27 vs. limit=15.0
2024-09-19 02:09:24,445 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 8.440e+01 8.893e+01 9.652e+01 5.207e+02, threshold=1.779e+02, percent-clipped=1.0
2024-09-19 02:09:45,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=569420.0, ans=0.1
2024-09-19 02:09:50,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=569420.0, ans=0.0
2024-09-19 02:09:59,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=569460.0, ans=0.125
2024-09-19 02:10:00,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=569460.0, ans=0.0
2024-09-19 02:10:08,278 INFO [train.py:1198] (1/2) Epoch 32, batch 2100, loss[loss=0.2304, ctc_loss=0.1111, cr_loss=0.3432, attn_decoder_loss=0.2361, over 29767.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1197, cr_loss=0.3626, attn_decoder_loss=0.2424, over 5799187.24 frames. ], batch size: 81, lr: 3.46e-03, grad_scale: 8.0
2024-09-19 02:10:42,035 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.53 vs. limit=15.0
2024-09-19 02:10:47,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=569580.0, ans=0.1
2024-09-19 02:10:52,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=569620.0, ans=0.125
2024-09-19 02:10:55,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=569620.0, ans=0.125
2024-09-19 02:10:56,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=569620.0, ans=0.125
2024-09-19 02:11:02,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=569620.0, ans=0.1
2024-09-19 02:11:08,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=569660.0, ans=0.125
2024-09-19 02:11:11,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=569660.0, ans=0.0
2024-09-19 02:11:17,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=569660.0, ans=0.0
2024-09-19 02:11:19,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=569660.0, ans=0.125
2024-09-19 02:11:23,547 INFO [train.py:1198] (1/2) Epoch 32, batch 2150, loss[loss=0.2366, ctc_loss=0.1221, cr_loss=0.383, attn_decoder_loss=0.2408, over 29413.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1193, cr_loss=0.3619, attn_decoder_loss=0.2419, over 5813551.73 frames. ], batch size: 78, lr: 3.46e-03, grad_scale: 8.0
2024-09-19 02:11:25,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=569700.0, ans=0.125
2024-09-19 02:11:25,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=569700.0, ans=0.0
2024-09-19 02:11:36,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=569700.0, ans=0.125
2024-09-19 02:11:42,622 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.72 vs. limit=10.0
2024-09-19 02:11:48,218 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 02:11:55,487 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.521e+01 8.476e+01 8.874e+01 9.335e+01 1.569e+02, threshold=1.775e+02, percent-clipped=0.0
2024-09-19 02:11:58,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=569780.0, ans=0.125
2024-09-19 02:12:12,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=569820.0, ans=0.0
2024-09-19 02:12:41,307 INFO [train.py:1198] (1/2) Epoch 32, batch 2200, loss[loss=0.2444, ctc_loss=0.1175, cr_loss=0.3385, attn_decoder_loss=0.251, over 29631.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1196, cr_loss=0.363, attn_decoder_loss=0.2423, over 5810036.18 frames. ], batch size: 86, lr: 3.46e-03, grad_scale: 8.0
2024-09-19 02:12:42,185 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.93 vs. limit=15.0
2024-09-19 02:12:45,213 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.45 vs. limit=22.5
2024-09-19 02:13:05,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=569940.0, ans=0.125
2024-09-19 02:13:08,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=569940.0, ans=0.0
2024-09-19 02:13:45,089 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=12.14 vs. limit=12.0
2024-09-19 02:13:46,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=570060.0, ans=0.025
2024-09-19 02:13:50,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=570060.0, ans=0.5
2024-09-19 02:13:51,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=570060.0, ans=0.125
2024-09-19 02:13:55,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=570060.0, ans=15.0
2024-09-19 02:13:59,411 INFO [train.py:1198] (1/2) Epoch 32, batch 2250, loss[loss=0.2492, ctc_loss=0.1238, cr_loss=0.3727, attn_decoder_loss=0.2548, over 29703.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1194, cr_loss=0.3624, attn_decoder_loss=0.2421, over 5808327.24 frames. ], batch size: 82, lr: 3.46e-03, grad_scale: 8.0
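The scaling.py:1024 lines track a whitening metric against a limit: a measure of how far the covariance of a module's activations (per channel group) is from a multiple of the identity, where 1.0 means fully white and larger values mean more anisotropy; the module only intervenes when the metric exceeds the limit. The sketch below computes one such metric, the mean squared covariance eigenvalue over the squared mean eigenvalue, via traces. It mirrors the idea behind these log lines, not necessarily the exact formula in scaling.py.

# Hedged sketch of a whitening metric like the ones logged above:
# 1.0 iff the per-group covariance is a multiple of the identity.
import torch

def whitening_metric(x, num_groups):
    # x: (num_frames, num_channels); channels split into num_groups groups
    n, c = x.shape
    cpg = c // num_groups
    x = x.reshape(n, num_groups, cpg).transpose(0, 1)  # (groups, frames, cpg)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / n       # per-group covariance
    mean_eig = cov.diagonal(dim1=1, dim2=2).sum(-1) / cpg   # trace(C)/c
    mean_sq_eig = (cov * cov).sum(dim=(1, 2)) / cpg         # trace(C^2)/c
    return (mean_sq_eig / mean_eig ** 2).mean()

x = torch.randn(10000, 64)            # white noise
print(float(whitening_metric(x, 1)))  # close to 1.0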
2024-09-19 02:13:59,716 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 02:14:31,248 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.789e+01 8.483e+01 9.181e+01 9.844e+01 2.065e+02, threshold=1.836e+02, percent-clipped=2.0
2024-09-19 02:14:31,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=570180.0, ans=0.0
2024-09-19 02:14:44,273 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.60 vs. limit=15.0
2024-09-19 02:14:49,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=570220.0, ans=0.0
2024-09-19 02:14:59,139 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.37 vs. limit=22.5
2024-09-19 02:15:08,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=570260.0, ans=0.125
2024-09-19 02:15:11,702 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=22.5
2024-09-19 02:15:12,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=570260.0, ans=0.0
2024-09-19 02:15:15,151 INFO [train.py:1198] (1/2) Epoch 32, batch 2300, loss[loss=0.2126, ctc_loss=0.1079, cr_loss=0.3394, attn_decoder_loss=0.2167, over 29316.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1186, cr_loss=0.3604, attn_decoder_loss=0.2409, over 5798343.45 frames. ], batch size: 71, lr: 3.45e-03, grad_scale: 8.0
2024-09-19 02:15:17,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=570300.0, ans=0.025
2024-09-19 02:15:17,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=570300.0, ans=0.1
2024-09-19 02:15:30,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=570340.0, ans=0.1
2024-09-19 02:15:41,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=570340.0, ans=0.09899494936611666
2024-09-19 02:15:59,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=570420.0, ans=0.125
2024-09-19 02:16:05,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=570420.0, ans=0.0
2024-09-19 02:16:10,726 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0
2024-09-19 02:16:31,331 INFO [train.py:1198] (1/2) Epoch 32, batch 2350, loss[loss=0.2414, ctc_loss=0.1201, cr_loss=0.3595, attn_decoder_loss=0.2469, over 29705.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1188, cr_loss=0.3608, attn_decoder_loss=0.2413, over 5805316.17 frames. ], batch size: 83, lr: 3.45e-03, grad_scale: 8.0
2024-09-19 02:16:47,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=570540.0, ans=0.125
2024-09-19 02:17:07,393 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.359e+01 8.588e+01 9.289e+01 9.851e+01 1.770e+02, threshold=1.858e+02, percent-clipped=0.0
2024-09-19 02:17:21,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=570620.0, ans=0.1
2024-09-19 02:17:22,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=570620.0, ans=0.2
2024-09-19 02:17:24,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=570620.0, ans=0.07
2024-09-19 02:17:27,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=570620.0, ans=0.125
2024-09-19 02:17:34,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=570660.0, ans=0.05
2024-09-19 02:17:50,393 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.86 vs. limit=15.0
2024-09-19 02:17:51,211 INFO [train.py:1198] (1/2) Epoch 32, batch 2400, loss[loss=0.2365, ctc_loss=0.1248, cr_loss=0.3762, attn_decoder_loss=0.2405, over 29550.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1195, cr_loss=0.3627, attn_decoder_loss=0.2421, over 5808896.73 frames. ], batch size: 76, lr: 3.45e-03, grad_scale: 16.0
2024-09-19 02:19:07,315 INFO [train.py:1198] (1/2) Epoch 32, batch 2450, loss[loss=0.2428, ctc_loss=0.1244, cr_loss=0.3658, attn_decoder_loss=0.2478, over 29740.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1203, cr_loss=0.3637, attn_decoder_loss=0.2431, over 5784935.90 frames. ], batch size: 82, lr: 3.45e-03, grad_scale: 16.0
2024-09-19 02:19:17,351 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.96 vs. limit=15.0
2024-09-19 02:19:24,903 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.42 vs. limit=15.0
2024-09-19 02:19:25,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=570940.0, ans=0.0
2024-09-19 02:19:26,400 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=9.26 vs. limit=15.0
2024-09-19 02:19:30,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=570940.0, ans=0.125
2024-09-19 02:19:33,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=570940.0, ans=0.125
2024-09-19 02:19:39,211 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.233e+01 8.392e+01 8.947e+01 9.639e+01 2.320e+02, threshold=1.789e+02, percent-clipped=2.0
2024-09-19 02:19:39,960 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.02 vs. limit=15.0
2024-09-19 02:19:44,597 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.43 vs. limit=15.0
2024-09-19 02:19:48,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=570980.0, ans=0.1
2024-09-19 02:19:56,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=571020.0, ans=0.2
2024-09-19 02:20:06,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=571060.0, ans=0.0
2024-09-19 02:20:20,781 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 02:20:23,375 INFO [train.py:1198] (1/2) Epoch 32, batch 2500, loss[loss=0.2498, ctc_loss=0.1282, cr_loss=0.3881, attn_decoder_loss=0.2547, over 29629.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1204, cr_loss=0.3641, attn_decoder_loss=0.2429, over 5796255.92 frames. ], batch size: 86, lr: 3.45e-03, grad_scale: 8.0
2024-09-19 02:20:37,275 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.76 vs. limit=15.0
2024-09-19 02:20:42,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=571140.0, ans=0.1
2024-09-19 02:20:44,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=571140.0, ans=0.2
2024-09-19 02:20:46,188 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.52 vs. limit=22.5
2024-09-19 02:20:58,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=571180.0, ans=0.125
2024-09-19 02:21:11,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=571180.0, ans=0.1
2024-09-19 02:21:17,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=571220.0, ans=0.125
2024-09-19 02:21:27,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=571260.0, ans=0.5
2024-09-19 02:21:32,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=571260.0, ans=0.2
2024-09-19 02:21:38,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=571260.0, ans=0.09899494936611666
2024-09-19 02:21:43,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=571300.0, ans=0.0
2024-09-19 02:21:44,430 INFO [train.py:1198] (1/2) Epoch 32, batch 2550, loss[loss=0.2122, ctc_loss=0.1056, cr_loss=0.3379, attn_decoder_loss=0.2165, over 29373.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1199, cr_loss=0.3631, attn_decoder_loss=0.2427, over 5798459.11 frames. ], batch size: 67, lr: 3.45e-03, grad_scale: 8.0
2024-09-19 02:21:44,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=571300.0, ans=0.125
2024-09-19 02:21:49,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=571300.0, ans=0.1
2024-09-19 02:21:55,646 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.59 vs. limit=12.0
2024-09-19 02:22:17,610 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.182e+01 8.677e+01 9.042e+01 9.632e+01 1.838e+02, threshold=1.808e+02, percent-clipped=1.0
2024-09-19 02:22:19,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=571380.0, ans=0.125
2024-09-19 02:22:40,157 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.47 vs. limit=15.0
2024-09-19 02:23:00,326 INFO [train.py:1198] (1/2) Epoch 32, batch 2600, loss[loss=0.2246, ctc_loss=0.1035, cr_loss=0.3373, attn_decoder_loss=0.2305, over 29448.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1199, cr_loss=0.363, attn_decoder_loss=0.2429, over 5795152.38 frames. ], batch size: 78, lr: 3.45e-03, grad_scale: 8.0
2024-09-19 02:23:01,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.24 vs. limit=15.0
2024-09-19 02:23:04,012 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.87 vs. limit=15.0
2024-09-19 02:23:18,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=571540.0, ans=0.125
2024-09-19 02:23:19,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=571540.0, ans=0.2
2024-09-19 02:23:21,856 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.13 vs. limit=15.0
2024-09-19 02:23:39,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=571580.0, ans=0.1
2024-09-19 02:23:45,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=571620.0, ans=0.0
2024-09-19 02:24:15,302 INFO [train.py:1198] (1/2) Epoch 32, batch 2650, loss[loss=0.2461, ctc_loss=0.1267, cr_loss=0.3712, attn_decoder_loss=0.2511, over 29238.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1197, cr_loss=0.3623, attn_decoder_loss=0.243, over 5800936.48 frames. ], batch size: 100, lr: 3.45e-03, grad_scale: 8.0
2024-09-19 02:24:23,616 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.57 vs. limit=22.5
2024-09-19 02:24:32,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=571740.0, ans=0.1
2024-09-19 02:24:34,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=571740.0, ans=0.5
2024-09-19 02:24:52,691 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 8.436e+01 8.918e+01 9.348e+01 1.627e+02, threshold=1.784e+02, percent-clipped=0.0
2024-09-19 02:24:57,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=571780.0, ans=0.125
2024-09-19 02:25:07,487 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.49 vs. limit=8.0
2024-09-19 02:25:10,383 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.23 vs. limit=15.0
2024-09-19 02:25:34,874 INFO [train.py:1198] (1/2) Epoch 32, batch 2700, loss[loss=0.2463, ctc_loss=0.1181, cr_loss=0.3562, attn_decoder_loss=0.2526, over 29539.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1199, cr_loss=0.3629, attn_decoder_loss=0.2432, over 5796303.19 frames. ], batch size: 87, lr: 3.45e-03, grad_scale: 8.0
2024-09-19 02:25:38,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=571900.0, ans=0.2
2024-09-19 02:25:44,245 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 02:25:53,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=571940.0, ans=0.125
2024-09-19 02:25:59,734 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.43 vs. limit=10.0
2024-09-19 02:26:18,425 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=15.0
2024-09-19 02:26:30,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=572020.0, ans=0.0
2024-09-19 02:26:51,238 INFO [train.py:1198] (1/2) Epoch 32, batch 2750, loss[loss=0.2288, ctc_loss=0.1088, cr_loss=0.333, attn_decoder_loss=0.2348, over 29499.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1198, cr_loss=0.3623, attn_decoder_loss=0.2425, over 5795174.42 frames. ], batch size: 75, lr: 3.45e-03, grad_scale: 8.0
2024-09-19 02:27:00,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=572100.0, ans=0.025
2024-09-19 02:27:04,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=572140.0, ans=0.125
2024-09-19 02:27:20,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=572180.0, ans=0.04949747468305833
2024-09-19 02:27:24,113 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.561e+01 8.589e+01 9.060e+01 9.796e+01 2.270e+02, threshold=1.812e+02, percent-clipped=2.0
2024-09-19 02:27:30,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=572180.0, ans=0.025
2024-09-19 02:27:41,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=572220.0, ans=0.125
2024-09-19 02:27:48,752 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 02:27:50,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=572260.0, ans=15.0
2024-09-19 02:28:07,093 INFO [train.py:1198] (1/2) Epoch 32, batch 2800, loss[loss=0.2688, ctc_loss=0.159, cr_loss=0.4005, attn_decoder_loss=0.2721, over 20354.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1204, cr_loss=0.3628, attn_decoder_loss=0.2428, over 5776632.83 frames. ], batch size: 210, lr: 3.45e-03, grad_scale: 16.0
2024-09-19 02:28:24,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=572340.0, ans=0.1
2024-09-19 02:28:24,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=572340.0, ans=0.125
2024-09-19 02:28:45,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=572380.0, ans=0.1
2024-09-19 02:28:46,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=572380.0, ans=0.1
2024-09-19 02:29:01,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=572420.0, ans=15.0
2024-09-19 02:29:16,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=572460.0, ans=0.125
2024-09-19 02:29:18,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=572460.0, ans=0.125
2024-09-19 02:29:18,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=572460.0, ans=0.0
2024-09-19 02:29:26,879 INFO [train.py:1198] (1/2) Epoch 32, batch 2850, loss[loss=0.2346, ctc_loss=0.1169, cr_loss=0.3705, attn_decoder_loss=0.2394, over 29513.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1208, cr_loss=0.3636, attn_decoder_loss=0.2432, over 5762638.59 frames. ], batch size: 77, lr: 3.45e-03, grad_scale: 8.0
], batch size: 77, lr: 3.45e-03, grad_scale: 8.0 2024-09-19 02:29:37,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=572500.0, ans=0.07 2024-09-19 02:29:42,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=572540.0, ans=0.2 2024-09-19 02:30:01,860 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.715e+01 9.222e+01 9.934e+01 2.539e+02, threshold=1.844e+02, percent-clipped=1.0 2024-09-19 02:30:33,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=572660.0, ans=0.1 2024-09-19 02:30:36,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=572660.0, ans=0.125 2024-09-19 02:30:42,439 INFO [train.py:1198] (1/2) Epoch 32, batch 2900, loss[loss=0.2279, ctc_loss=0.1103, cr_loss=0.3538, attn_decoder_loss=0.2331, over 29423.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.121, cr_loss=0.3647, attn_decoder_loss=0.2439, over 5788095.47 frames. ], batch size: 79, lr: 3.45e-03, grad_scale: 8.0 2024-09-19 02:30:47,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=572700.0, ans=0.1 2024-09-19 02:30:58,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=572740.0, ans=0.125 2024-09-19 02:31:01,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=572740.0, ans=0.1 2024-09-19 02:31:01,599 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.31 vs. limit=10.0 2024-09-19 02:31:21,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=572780.0, ans=0.125 2024-09-19 02:31:32,271 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.09 vs. limit=22.5 2024-09-19 02:31:41,412 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.79 vs. limit=15.0 2024-09-19 02:31:43,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=572860.0, ans=0.125 2024-09-19 02:31:48,895 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.38 vs. limit=10.0 2024-09-19 02:31:51,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=572860.0, ans=0.125 2024-09-19 02:31:54,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.78 vs. limit=22.5 2024-09-19 02:31:58,401 INFO [train.py:1198] (1/2) Epoch 32, batch 2950, loss[loss=0.2282, ctc_loss=0.1055, cr_loss=0.3398, attn_decoder_loss=0.2343, over 29496.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1201, cr_loss=0.3626, attn_decoder_loss=0.2428, over 5783252.10 frames. 
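[Editor's note] The Whitening lines compare a per-module "metric" against a limit; a value of 1.0 corresponds to perfectly white activations (decorrelated channels of equal variance), and the module only needs to intervene once the metric exceeds its limit. A simplified reimplementation of such a whiteness metric follows; it shares the same fixed point but is not claimed to be numerically identical to icefall's scaling.py.

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """Whiteness of the channel covariance: exactly 1.0 when cov = c*I.

    x: (num_frames, num_channels); larger values mean more correlated
    or more unequal-variance channels, i.e. less white.
    """
    x = x - x.mean(dim=0)
    cov = x.t() @ x / x.shape[0]
    num_channels = cov.shape[0]
    # mean squared covariance entry, normalised by the squared mean
    # variance; the num_channels factor makes white inputs score 1.0
    return (num_channels * (cov ** 2).mean()
            / (cov.diagonal().mean() ** 2)).item()

white = torch.randn(10000, 192)
print(whitening_metric(white))                           # ~1.0
print(whitening_metric(white @ torch.randn(192, 192)))   # ~2, less white
```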
], batch size: 75, lr: 3.45e-03, grad_scale: 8.0 2024-09-19 02:32:09,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=572900.0, ans=0.2 2024-09-19 02:32:10,035 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.99 vs. limit=22.5 2024-09-19 02:32:37,197 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.68 vs. limit=15.0 2024-09-19 02:32:37,784 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.616e+01 8.544e+01 8.997e+01 9.588e+01 2.155e+02, threshold=1.799e+02, percent-clipped=1.0 2024-09-19 02:32:50,926 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.17 vs. limit=15.0 2024-09-19 02:33:18,465 INFO [train.py:1198] (1/2) Epoch 32, batch 3000, loss[loss=0.236, ctc_loss=0.1229, cr_loss=0.386, attn_decoder_loss=0.24, over 29751.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1199, cr_loss=0.3625, attn_decoder_loss=0.2427, over 5784397.95 frames. ], batch size: 81, lr: 3.45e-03, grad_scale: 8.0 2024-09-19 02:33:18,466 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 02:33:36,935 INFO [train.py:1230] (1/2) Epoch 32, validation: loss=0.2117, ctc_loss=0.0367, cr_loss=5.626e-15, attn_decoder_loss=0.2311, over 944034.00 frames. 2024-09-19 02:33:36,936 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-19 02:33:40,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=573100.0, ans=0.125 2024-09-19 02:33:42,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=573100.0, ans=0.0 2024-09-19 02:33:46,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=573100.0, ans=0.025 2024-09-19 02:34:04,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=573140.0, ans=0.0 2024-09-19 02:34:10,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=573180.0, ans=0.0 2024-09-19 02:34:12,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=573180.0, ans=0.125 2024-09-19 02:34:24,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=573220.0, ans=0.0 2024-09-19 02:34:46,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=573260.0, ans=0.125 2024-09-19 02:34:52,854 INFO [train.py:1198] (1/2) Epoch 32, batch 3050, loss[loss=0.2339, ctc_loss=0.1211, cr_loss=0.3697, attn_decoder_loss=0.2382, over 29506.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1206, cr_loss=0.3642, attn_decoder_loss=0.2435, over 5777657.56 frames. 
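[Editor's note] At batch 3000 the trainer pauses to run a validation pass and then reports peak GPU memory. Two details are worth noting: validation is frame-weighted over a fixed set (944034.00 frames each time), and the validation cr_loss is numerically zero (5.626e-15), presumably because the consistency-regularisation term compares predictions on two differently augmented views of each utterance and validation runs without that augmentation. A sketch of the interleaving; the model interface and the 3000-batch cadence are assumptions based on the log.

```python
import logging
import torch

def maybe_validate(model, valid_loader, batch_idx: int,
                   valid_interval: int = 3000) -> None:
    """Periodic validation mirroring the log's cadence (assumed 3000)."""
    if batch_idx == 0 or batch_idx % valid_interval != 0:
        return
    logging.info("Computing validation loss")
    model.eval()
    loss_sum, frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = model(batch)   # assumed interface
            loss_sum += loss.item() * num_frames
            frames += num_frames
    logging.info("validation: loss=%.4f, over %.2f frames",
                 loss_sum / frames, frames)
    logging.info("Maximum memory allocated so far is %dMB",
                 torch.cuda.max_memory_allocated() // 2**20)
    model.train()
```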
], batch size: 76, lr: 3.45e-03, grad_scale: 8.0 2024-09-19 02:35:14,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=573340.0, ans=0.07 2024-09-19 02:35:19,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=573340.0, ans=22.5 2024-09-19 02:35:27,632 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.361e+01 8.755e+01 9.253e+01 9.957e+01 1.667e+02, threshold=1.851e+02, percent-clipped=0.0 2024-09-19 02:35:57,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=573460.0, ans=0.125 2024-09-19 02:36:12,281 INFO [train.py:1198] (1/2) Epoch 32, batch 3100, loss[loss=0.2593, ctc_loss=0.1339, cr_loss=0.3858, attn_decoder_loss=0.2647, over 29238.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1203, cr_loss=0.3635, attn_decoder_loss=0.2431, over 5777900.20 frames. ], batch size: 100, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:36:42,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=573580.0, ans=0.0 2024-09-19 02:36:48,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=573580.0, ans=0.0 2024-09-19 02:36:55,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=573580.0, ans=10.0 2024-09-19 02:37:03,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.60 vs. limit=22.5 2024-09-19 02:37:12,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=573660.0, ans=0.1 2024-09-19 02:37:18,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=573660.0, ans=0.2 2024-09-19 02:37:25,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=573660.0, ans=0.125 2024-09-19 02:37:27,671 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=15.0 2024-09-19 02:37:28,501 INFO [train.py:1198] (1/2) Epoch 32, batch 3150, loss[loss=0.2576, ctc_loss=0.1327, cr_loss=0.4066, attn_decoder_loss=0.2624, over 28766.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1202, cr_loss=0.3632, attn_decoder_loss=0.2432, over 5784085.46 frames. 
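[Editor's note] The headline loss is a weighted sum of the three logged components, and the weights can be read off the numbers themselves: at batch 3100 above, 0.1 * 0.1203 (ctc) + 0.02 * 0.3635 (cr) + 0.9 * 0.2431 (attn_decoder) = 0.2381, matching the printed tot_loss to all four digits, and the same scales fit the other rows. A one-liner capturing this; the scales are inferred from the log, not quoted from the run's config.

```python
def combined_loss(ctc_loss: float, cr_loss: float, attn_decoder_loss: float,
                  ctc_scale: float = 0.1, cr_scale: float = 0.02,
                  attn_scale: float = 0.9) -> float:
    """Weighted loss combination; the scales are inferred from the log."""
    return (ctc_scale * ctc_loss + cr_scale * cr_loss
            + attn_scale * attn_decoder_loss)

# Batch 3100 running values from the log:
print(round(combined_loss(0.1203, 0.3635, 0.2431), 4))  # 0.2381
```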
], batch size: 104, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:37:31,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=573700.0, ans=0.0 2024-09-19 02:37:33,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=573700.0, ans=0.1 2024-09-19 02:37:53,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=573740.0, ans=0.125 2024-09-19 02:37:59,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=573780.0, ans=0.0 2024-09-19 02:37:59,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=573780.0, ans=0.125 2024-09-19 02:38:03,501 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.454e+01 8.918e+01 9.492e+01 5.119e+02, threshold=1.784e+02, percent-clipped=1.0 2024-09-19 02:38:43,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=573900.0, ans=0.0 2024-09-19 02:38:44,610 INFO [train.py:1198] (1/2) Epoch 32, batch 3200, loss[loss=0.2323, ctc_loss=0.1112, cr_loss=0.3302, attn_decoder_loss=0.2384, over 29794.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.12, cr_loss=0.3627, attn_decoder_loss=0.2426, over 5794932.88 frames. ], batch size: 80, lr: 3.44e-03, grad_scale: 16.0 2024-09-19 02:39:07,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=573940.0, ans=0.0 2024-09-19 02:39:40,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=574020.0, ans=0.125 2024-09-19 02:40:00,527 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. limit=6.0 2024-09-19 02:40:01,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=574100.0, ans=0.2 2024-09-19 02:40:04,581 INFO [train.py:1198] (1/2) Epoch 32, batch 3250, loss[loss=0.2391, ctc_loss=0.1234, cr_loss=0.3573, attn_decoder_loss=0.244, over 29686.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1202, cr_loss=0.3631, attn_decoder_loss=0.2429, over 5801767.25 frames. 
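[Editor's note] grad_scale flips between 8.0 and 16.0 (16.0 at batches 3200 and 3250, back to 8.0 by batch 3300). That is the dynamic loss scale of fp16 mixed-precision training: the scaler doubles the scale after a run of overflow-free steps and halves it as soon as a non-finite gradient appears. A minimal step using torch.cuda.amp.GradScaler; init_scale is chosen to match the logged value, while the growth interval is illustrative.

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=8.0,        # matches the grad_scale seen in the log
    growth_factor=2.0,     # 8.0 -> 16.0 after enough clean steps
    backoff_factor=0.5,    # halved again on overflow
    growth_interval=2000,  # illustrative, not this run's setting
)

def train_step(model, optimizer, batch) -> float:
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(batch)            # assumed to return a scalar loss
    scaler.scale(loss).backward()      # backward on the scaled loss
    scaler.step(optimizer)             # unscales; skips the step on overflow
    scaler.update()                    # grow or back off the scale
    return scaler.get_scale()          # the value logged as grad_scale
```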
], batch size: 84, lr: 3.44e-03, grad_scale: 16.0 2024-09-19 02:40:16,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=574100.0, ans=0.0 2024-09-19 02:40:19,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=574140.0, ans=0.125 2024-09-19 02:40:40,323 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.274e+01 8.537e+01 9.027e+01 9.508e+01 1.850e+02, threshold=1.805e+02, percent-clipped=1.0 2024-09-19 02:40:52,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=574220.0, ans=0.2 2024-09-19 02:41:00,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=574220.0, ans=0.125 2024-09-19 02:41:19,691 INFO [train.py:1198] (1/2) Epoch 32, batch 3300, loss[loss=0.2409, ctc_loss=0.1147, cr_loss=0.3546, attn_decoder_loss=0.2471, over 28497.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1196, cr_loss=0.3614, attn_decoder_loss=0.2417, over 5798945.00 frames. ], batch size: 112, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:41:23,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=574300.0, ans=0.125 2024-09-19 02:41:42,727 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 02:41:54,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=574380.0, ans=0.125 2024-09-19 02:42:04,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=574420.0, ans=0.125 2024-09-19 02:42:07,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=574420.0, ans=0.125 2024-09-19 02:42:15,027 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.91 vs. limit=15.0 2024-09-19 02:42:23,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=574460.0, ans=0.125 2024-09-19 02:42:35,355 INFO [train.py:1198] (1/2) Epoch 32, batch 3350, loss[loss=0.2505, ctc_loss=0.1264, cr_loss=0.372, attn_decoder_loss=0.256, over 28896.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1201, cr_loss=0.3619, attn_decoder_loss=0.2423, over 5775821.77 frames. ], batch size: 104, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:42:47,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=574500.0, ans=0.125 2024-09-19 02:42:53,247 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. 
limit=6.0 2024-09-19 02:43:11,945 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.605e+01 8.988e+01 9.712e+01 2.177e+02, threshold=1.798e+02, percent-clipped=2.0 2024-09-19 02:43:37,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=574660.0, ans=0.0 2024-09-19 02:43:37,715 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=15.0 2024-09-19 02:43:55,671 INFO [train.py:1198] (1/2) Epoch 32, batch 3400, loss[loss=0.2079, ctc_loss=0.1045, cr_loss=0.3348, attn_decoder_loss=0.2119, over 29335.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1203, cr_loss=0.3624, attn_decoder_loss=0.2423, over 5768110.21 frames. ], batch size: 67, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:44:32,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=574780.0, ans=0.125 2024-09-19 02:44:42,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=574820.0, ans=0.0 2024-09-19 02:44:56,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=574860.0, ans=0.125 2024-09-19 02:44:56,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=574860.0, ans=0.0 2024-09-19 02:45:10,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=574900.0, ans=0.0 2024-09-19 02:45:11,592 INFO [train.py:1198] (1/2) Epoch 32, batch 3450, loss[loss=0.2575, ctc_loss=0.1416, cr_loss=0.3936, attn_decoder_loss=0.2617, over 28198.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1201, cr_loss=0.3624, attn_decoder_loss=0.2428, over 5775461.25 frames. ], batch size: 111, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:45:36,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=574940.0, ans=0.125 2024-09-19 02:45:40,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=574980.0, ans=0.125 2024-09-19 02:45:43,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=574980.0, ans=0.025 2024-09-19 02:45:43,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=574980.0, ans=0.2 2024-09-19 02:45:47,951 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.523e+01 9.077e+01 9.652e+01 1.976e+02, threshold=1.815e+02, percent-clipped=1.0 2024-09-19 02:45:48,874 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.96 vs. 
limit=15.0 2024-09-19 02:45:51,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=574980.0, ans=0.0 2024-09-19 02:45:55,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=575020.0, ans=0.2 2024-09-19 02:45:55,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=575020.0, ans=0.1 2024-09-19 02:45:58,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=575020.0, ans=0.125 2024-09-19 02:46:00,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=575020.0, ans=0.2 2024-09-19 02:46:04,955 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 02:46:07,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=575020.0, ans=0.125 2024-09-19 02:46:27,033 INFO [train.py:1198] (1/2) Epoch 32, batch 3500, loss[loss=0.2174, ctc_loss=0.1113, cr_loss=0.3448, attn_decoder_loss=0.2215, over 29345.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1198, cr_loss=0.3616, attn_decoder_loss=0.2421, over 5776240.91 frames. ], batch size: 71, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:46:33,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=575100.0, ans=0.2 2024-09-19 02:46:49,275 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.06 vs. limit=15.0 2024-09-19 02:46:58,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=575180.0, ans=10.0 2024-09-19 02:47:03,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=575180.0, ans=0.125 2024-09-19 02:47:15,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=575220.0, ans=0.125 2024-09-19 02:47:34,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=575260.0, ans=0.2 2024-09-19 02:47:42,061 INFO [train.py:1198] (1/2) Epoch 32, batch 3550, loss[loss=0.247, ctc_loss=0.1204, cr_loss=0.37, attn_decoder_loss=0.2528, over 29713.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1194, cr_loss=0.3612, attn_decoder_loss=0.242, over 5782130.66 frames. 
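[Editor's note] Many of the scheduled names above belong to balancer modules (balancer1.prob, balancer1.min_positive, balancer1.max_abs, ...), which constrain per-channel activation statistics: the fraction of positive values and the typical magnitude, each applied stochastically with probability prob. icefall's Balancer enforces this by editing gradients in the backward pass; the sketch below instead expresses the same constraints as a soft auxiliary penalty with the same fixed points, which is simpler but explicitly not the actual mechanism.

```python
import torch

def balancer_penalty(x: torch.Tensor,
                     min_positive: float = 0.05, max_positive: float = 0.95,
                     min_abs: float = 0.2, max_abs: float = 10.0) -> torch.Tensor:
    """Soft penalty keeping per-channel stats inside the balancer bounds.

    x: (num_frames, num_channels). Penalises channels whose (smoothed)
    fraction of positive activations leaves [min_positive, max_positive]
    or whose mean |x| leaves [min_abs, max_abs]. Bounds are illustrative.
    """
    scale = x.abs().mean() + 1.0e-4
    frac_pos = torch.sigmoid(x / scale).mean(dim=0)  # smooth sign proxy
    mean_abs = x.abs().mean(dim=0)
    penalty = ((min_positive - frac_pos).clamp(min=0.0)
               + (frac_pos - max_positive).clamp(min=0.0)
               + (min_abs - mean_abs).clamp(min=0.0)
               + (mean_abs - max_abs).clamp(min=0.0))
    return penalty.sum()
```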
], batch size: 89, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:47:48,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten.whitening_limit, batch_count=575300.0, ans=22.5 2024-09-19 02:48:12,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=575380.0, ans=0.125 2024-09-19 02:48:19,430 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.300e+01 8.288e+01 8.961e+01 9.598e+01 1.614e+02, threshold=1.792e+02, percent-clipped=0.0 2024-09-19 02:48:19,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=575380.0, ans=0.0 2024-09-19 02:48:30,248 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 02:48:40,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=575420.0, ans=10.0 2024-09-19 02:48:42,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=575420.0, ans=0.125 2024-09-19 02:48:48,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=575460.0, ans=0.125 2024-09-19 02:48:58,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=575500.0, ans=0.0 2024-09-19 02:49:00,121 INFO [train.py:1198] (1/2) Epoch 32, batch 3600, loss[loss=0.2405, ctc_loss=0.1323, cr_loss=0.3838, attn_decoder_loss=0.244, over 29518.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1193, cr_loss=0.3612, attn_decoder_loss=0.2421, over 5791136.87 frames. ], batch size: 77, lr: 3.44e-03, grad_scale: 16.0 2024-09-19 02:49:22,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=575540.0, ans=0.0 2024-09-19 02:49:24,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.96 vs. limit=22.5 2024-09-19 02:49:31,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=575580.0, ans=0.1 2024-09-19 02:49:48,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=575620.0, ans=0.0 2024-09-19 02:49:55,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=575620.0, ans=0.125 2024-09-19 02:50:11,917 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 02:50:14,517 INFO [train.py:1198] (1/2) Epoch 32, batch 3650, loss[loss=0.2472, ctc_loss=0.1269, cr_loss=0.3707, attn_decoder_loss=0.2524, over 29508.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1185, cr_loss=0.3597, attn_decoder_loss=0.2413, over 5792489.84 frames. 
], batch size: 90, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:50:17,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=575700.0, ans=0.125 2024-09-19 02:50:17,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=575700.0, ans=0.2 2024-09-19 02:50:37,432 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2024-09-19 02:50:40,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.56 vs. limit=15.0 2024-09-19 02:50:51,847 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.010e+01 8.427e+01 8.894e+01 9.403e+01 1.898e+02, threshold=1.779e+02, percent-clipped=1.0 2024-09-19 02:51:01,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=575820.0, ans=0.125 2024-09-19 02:51:06,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=575820.0, ans=0.0 2024-09-19 02:51:18,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=575860.0, ans=0.1 2024-09-19 02:51:27,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=575900.0, ans=0.125 2024-09-19 02:51:29,100 INFO [train.py:1198] (1/2) Epoch 32, batch 3700, loss[loss=0.241, ctc_loss=0.1155, cr_loss=0.3478, attn_decoder_loss=0.2472, over 29726.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1184, cr_loss=0.3595, attn_decoder_loss=0.2416, over 5802902.32 frames. ], batch size: 84, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:51:37,348 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.24 vs. limit=15.0 2024-09-19 02:51:50,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=575940.0, ans=0.025 2024-09-19 02:52:31,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=576020.0, ans=0.04949747468305833 2024-09-19 02:52:51,297 INFO [train.py:1198] (1/2) Epoch 32, batch 3750, loss[loss=0.2079, ctc_loss=0.09606, cr_loss=0.2971, attn_decoder_loss=0.2137, over 29328.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1185, cr_loss=0.3598, attn_decoder_loss=0.2414, over 5805987.73 frames. ], batch size: 67, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:52:58,017 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.78 vs. 
limit=22.5 2024-09-19 02:53:09,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=576140.0, ans=0.1 2024-09-19 02:53:15,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=576140.0, ans=0.2 2024-09-19 02:53:22,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=576180.0, ans=0.1 2024-09-19 02:53:28,123 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.053e+01 8.581e+01 9.002e+01 9.610e+01 1.544e+02, threshold=1.800e+02, percent-clipped=0.0 2024-09-19 02:53:42,038 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 02:53:53,386 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.20 vs. limit=15.0 2024-09-19 02:54:01,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=576260.0, ans=0.0 2024-09-19 02:54:05,647 INFO [train.py:1198] (1/2) Epoch 32, batch 3800, loss[loss=0.2458, ctc_loss=0.1212, cr_loss=0.38, attn_decoder_loss=0.2511, over 29618.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1186, cr_loss=0.3596, attn_decoder_loss=0.2413, over 5798146.61 frames. ], batch size: 86, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:54:20,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=576340.0, ans=0.125 2024-09-19 02:54:25,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=576340.0, ans=0.1 2024-09-19 02:54:38,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=576380.0, ans=0.125 2024-09-19 02:54:58,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=576420.0, ans=0.125 2024-09-19 02:55:08,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=576460.0, ans=0.2 2024-09-19 02:55:13,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=576460.0, ans=0.0 2024-09-19 02:55:19,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=576460.0, ans=0.05 2024-09-19 02:55:19,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=576460.0, ans=0.125 2024-09-19 02:55:23,188 INFO [train.py:1198] (1/2) Epoch 32, batch 3850, loss[loss=0.2501, ctc_loss=0.1311, cr_loss=0.3795, attn_decoder_loss=0.2549, over 29322.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1186, cr_loss=0.3597, attn_decoder_loss=0.2413, over 5812429.52 frames. 
], batch size: 100, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:55:41,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=576540.0, ans=0.0 2024-09-19 02:55:53,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=576580.0, ans=0.125 2024-09-19 02:56:00,149 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.855e+01 8.481e+01 8.994e+01 9.437e+01 1.418e+02, threshold=1.799e+02, percent-clipped=0.0 2024-09-19 02:56:01,122 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.12 vs. limit=15.0 2024-09-19 02:56:30,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=576660.0, ans=0.125 2024-09-19 02:56:34,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=576660.0, ans=0.025 2024-09-19 02:56:37,440 INFO [train.py:1198] (1/2) Epoch 32, batch 3900, loss[loss=0.2483, ctc_loss=0.126, cr_loss=0.3776, attn_decoder_loss=0.2535, over 29628.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1189, cr_loss=0.3604, attn_decoder_loss=0.2418, over 5816952.15 frames. ], batch size: 86, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:57:05,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=576780.0, ans=0.125 2024-09-19 02:57:15,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=576780.0, ans=0.0 2024-09-19 02:57:30,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=576820.0, ans=10.0 2024-09-19 02:57:42,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=576860.0, ans=0.125 2024-09-19 02:57:52,055 INFO [train.py:1198] (1/2) Epoch 32, batch 3950, loss[loss=0.2512, ctc_loss=0.1293, cr_loss=0.3935, attn_decoder_loss=0.256, over 29462.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1187, cr_loss=0.3596, attn_decoder_loss=0.242, over 5836193.83 frames. ], batch size: 97, lr: 3.43e-03, grad_scale: 8.0 2024-09-19 02:58:19,274 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.16 vs. limit=15.0 2024-09-19 02:58:26,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=576980.0, ans=0.125 2024-09-19 02:58:28,774 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.521e+01 9.029e+01 9.542e+01 2.820e+02, threshold=1.806e+02, percent-clipped=2.0 2024-09-19 02:58:39,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=577020.0, ans=0.125 2024-09-19 02:58:39,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=577020.0, ans=0.2 2024-09-19 02:59:05,617 INFO [train.py:1198] (1/2) Epoch 32, batch 4000, loss[loss=0.2268, ctc_loss=0.1116, cr_loss=0.3399, attn_decoder_loss=0.232, over 29513.00 frames. 
], tot_loss[loss=0.2368, ctc_loss=0.1189, cr_loss=0.3594, attn_decoder_loss=0.2419, over 5812916.01 frames. ], batch size: 74, lr: 3.43e-03, grad_scale: 16.0 2024-09-19 02:59:08,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=577100.0, ans=0.125 2024-09-19 02:59:19,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.90 vs. limit=22.5 2024-09-19 02:59:26,748 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.15 vs. limit=15.0 2024-09-19 02:59:29,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=577140.0, ans=0.0 2024-09-19 03:00:05,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=577220.0, ans=0.125 2024-09-19 03:00:09,649 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 03:00:13,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=577260.0, ans=0.2 2024-09-19 03:00:22,474 INFO [train.py:1198] (1/2) Epoch 32, batch 4050, loss[loss=0.2561, ctc_loss=0.1495, cr_loss=0.3869, attn_decoder_loss=0.2594, over 20518.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.119, cr_loss=0.3595, attn_decoder_loss=0.2422, over 5796751.46 frames. ], batch size: 213, lr: 3.43e-03, grad_scale: 16.0 2024-09-19 03:00:28,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=577300.0, ans=0.0 2024-09-19 03:00:30,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=577300.0, ans=0.125 2024-09-19 03:00:33,540 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.36 vs. limit=15.0 2024-09-19 03:00:59,176 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.569e+01 8.593e+01 9.238e+01 9.964e+01 1.548e+02, threshold=1.848e+02, percent-clipped=0.0 2024-09-19 03:01:25,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=577460.0, ans=0.025 2024-09-19 03:01:28,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.25 vs. limit=15.0 2024-09-19 03:01:36,060 INFO [train.py:1198] (1/2) Epoch 32, batch 4100, loss[loss=0.2557, ctc_loss=0.132, cr_loss=0.383, attn_decoder_loss=0.2609, over 29506.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1192, cr_loss=0.3601, attn_decoder_loss=0.2423, over 5792163.52 frames. ], batch size: 90, lr: 3.43e-03, grad_scale: 16.0 2024-09-19 03:01:36,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=577500.0, ans=0.0 2024-09-19 03:01:38,134 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.35 vs. 
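[Editor's note] The learning rate decays smoothly within the epoch (3.45e-03 -> 3.44e-03 -> 3.43e-03 over a few hundred batches) and steps down harder at the epoch boundary (epoch 33 begins at 3.37e-03 further down). That shape matches an Eden-style schedule, where the base lr is multiplied by separate inverse-quartic-root factors in the batch and epoch counts. A sketch of that rule; the constants are placeholders and warmup is omitted, so this will not reproduce the logged values exactly without the run's full configuration.

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden-style decay: smooth in batches, stepping down across epochs.

    Both factors start near 1.0 and fall off as their argument passes
    the corresponding scale constant (placeholders here, not config).
    """
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor
```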
limit=15.0 2024-09-19 03:01:56,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=577540.0, ans=0.1 2024-09-19 03:01:59,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=577540.0, ans=0.125 2024-09-19 03:02:35,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=577660.0, ans=0.025 2024-09-19 03:02:39,337 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.52 vs. limit=22.5 2024-09-19 03:02:50,335 INFO [train.py:1198] (1/2) Epoch 32, batch 4150, loss[loss=0.2321, ctc_loss=0.1145, cr_loss=0.3647, attn_decoder_loss=0.2371, over 29505.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.119, cr_loss=0.3596, attn_decoder_loss=0.2419, over 5797756.07 frames. ], batch size: 77, lr: 3.43e-03, grad_scale: 16.0 2024-09-19 03:02:54,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=577700.0, ans=0.09899494936611666 2024-09-19 03:03:02,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=577700.0, ans=0.0 2024-09-19 03:03:05,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=577740.0, ans=0.1 2024-09-19 03:03:26,923 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.412e+01 8.911e+01 9.455e+01 1.648e+02, threshold=1.782e+02, percent-clipped=0.0 2024-09-19 03:03:48,302 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=4.94 vs. limit=12.0 2024-09-19 03:03:50,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=577860.0, ans=0.5 2024-09-19 03:03:53,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=577860.0, ans=0.125 2024-09-19 03:04:04,951 INFO [train.py:1198] (1/2) Epoch 32, batch 4200, loss[loss=0.2464, ctc_loss=0.1241, cr_loss=0.3817, attn_decoder_loss=0.2515, over 29510.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1192, cr_loss=0.3599, attn_decoder_loss=0.242, over 5799916.52 frames. ], batch size: 90, lr: 3.43e-03, grad_scale: 16.0 2024-09-19 03:04:31,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=577940.0, ans=0.2 2024-09-19 03:04:41,949 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.53 vs. limit=22.5 2024-09-19 03:04:48,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=578020.0, ans=0.0 2024-09-19 03:05:06,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=578060.0, ans=0.125 2024-09-19 03:05:11,384 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.48 vs. 
limit=15.0 2024-09-19 03:05:15,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=578060.0, ans=0.125 2024-09-19 03:05:19,406 INFO [train.py:1198] (1/2) Epoch 32, batch 4250, loss[loss=0.2212, ctc_loss=0.1022, cr_loss=0.3159, attn_decoder_loss=0.2274, over 29513.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1194, cr_loss=0.3603, attn_decoder_loss=0.2425, over 5804922.34 frames. ], batch size: 74, lr: 3.43e-03, grad_scale: 8.0 2024-09-19 03:05:51,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=578180.0, ans=0.1 2024-09-19 03:05:57,460 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 8.485e+01 9.060e+01 9.670e+01 1.862e+02, threshold=1.812e+02, percent-clipped=1.0 2024-09-19 03:06:33,353 INFO [train.py:1198] (1/2) Epoch 32, batch 4300, loss[loss=0.2447, ctc_loss=0.1151, cr_loss=0.3596, attn_decoder_loss=0.2511, over 29551.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1196, cr_loss=0.3611, attn_decoder_loss=0.2428, over 5793936.63 frames. ], batch size: 87, lr: 3.43e-03, grad_scale: 8.0 2024-09-19 03:06:35,759 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.89 vs. limit=15.0 2024-09-19 03:06:36,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=578300.0, ans=0.1 2024-09-19 03:06:38,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=578300.0, ans=0.125 2024-09-19 03:06:40,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.43 vs. limit=15.0 2024-09-19 03:06:54,100 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 03:07:04,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=578380.0, ans=0.125 2024-09-19 03:07:21,701 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=3.98 vs. limit=12.0 2024-09-19 03:07:25,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=578420.0, ans=0.125 2024-09-19 03:07:38,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=578460.0, ans=0.125 2024-09-19 03:07:40,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=578460.0, ans=0.125 2024-09-19 03:07:41,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=578460.0, ans=0.07 2024-09-19 03:07:45,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=578460.0, ans=0.125 2024-09-19 03:07:50,080 INFO [train.py:1198] (1/2) Epoch 32, batch 4350, loss[loss=0.2423, ctc_loss=0.1217, cr_loss=0.3675, attn_decoder_loss=0.2475, over 29459.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1221, cr_loss=0.3664, attn_decoder_loss=0.2459, over 5797182.97 frames. 
], batch size: 97, lr: 3.43e-03, grad_scale: 8.0 2024-09-19 03:07:58,327 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.06 vs. limit=10.0 2024-09-19 03:08:03,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=578540.0, ans=0.2 2024-09-19 03:08:25,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=578580.0, ans=0.125 2024-09-19 03:08:28,237 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.091e+01 8.949e+01 9.418e+01 9.976e+01 1.682e+02, threshold=1.884e+02, percent-clipped=0.0 2024-09-19 03:08:28,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=578580.0, ans=0.125 2024-09-19 03:08:31,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=578580.0, ans=0.025 2024-09-19 03:08:50,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=578660.0, ans=0.125 2024-09-19 03:08:52,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=578660.0, ans=0.125 2024-09-19 03:08:59,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=578660.0, ans=0.125 2024-09-19 03:09:03,611 INFO [train.py:1198] (1/2) Epoch 32, batch 4400, loss[loss=0.2477, ctc_loss=0.1308, cr_loss=0.3755, attn_decoder_loss=0.2523, over 27288.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.1235, cr_loss=0.3693, attn_decoder_loss=0.2477, over 5768670.57 frames. ], batch size: 124, lr: 3.43e-03, grad_scale: 16.0 2024-09-19 03:09:17,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=578740.0, ans=0.125 2024-09-19 03:09:56,166 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.08 vs. limit=10.0 2024-09-19 03:10:02,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=578860.0, ans=0.0 2024-09-19 03:10:14,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=578860.0, ans=0.125 2024-09-19 03:10:15,944 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 03:10:18,545 INFO [train.py:1198] (1/2) Epoch 32, batch 4450, loss[loss=0.2586, ctc_loss=0.151, cr_loss=0.4019, attn_decoder_loss=0.2616, over 19631.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.1275, cr_loss=0.3748, attn_decoder_loss=0.25, over 5573098.13 frames. 
], batch size: 209, lr: 3.43e-03, grad_scale: 8.0 2024-09-19 03:10:23,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=578900.0, ans=0.2 2024-09-19 03:10:32,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=578940.0, ans=0.0 2024-09-19 03:10:56,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=578980.0, ans=0.0 2024-09-19 03:10:58,946 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.256e+01 9.217e+01 9.990e+01 1.147e+02 3.633e+02, threshold=1.998e+02, percent-clipped=4.0 2024-09-19 03:11:19,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.17 vs. limit=12.0 2024-09-19 03:11:29,359 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=15.0 2024-09-19 03:11:34,138 INFO [train.py:1198] (1/2) Epoch 32, batch 4500, loss[loss=0.2594, ctc_loss=0.1541, cr_loss=0.3806, attn_decoder_loss=0.2627, over 19317.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1308, cr_loss=0.3776, attn_decoder_loss=0.2519, over 5232297.40 frames. ], batch size: 209, lr: 3.43e-03, grad_scale: 8.0 2024-09-19 03:11:37,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=579100.0, ans=0.0 2024-09-19 03:11:43,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=579100.0, ans=0.2 2024-09-19 03:13:03,884 INFO [train.py:1198] (1/2) Epoch 33, batch 0, loss[loss=0.2149, ctc_loss=0.09693, cr_loss=0.3242, attn_decoder_loss=0.2208, over 29598.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.09693, cr_loss=0.3242, attn_decoder_loss=0.2208, over 29598.00 frames. ], batch size: 73, lr: 3.37e-03, grad_scale: 16.0 2024-09-19 03:13:03,885 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 03:13:20,655 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.8306, 3.4001, 3.6408, 3.2934], device='cuda:1') 2024-09-19 03:13:22,384 INFO [train.py:1230] (1/2) Epoch 33, validation: loss=0.2131, ctc_loss=0.03625, cr_loss=6.2e-15, attn_decoder_loss=0.2327, over 944034.00 frames. 2024-09-19 03:13:22,385 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-19 03:13:24,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=579200.0, ans=0.125 2024-09-19 03:13:26,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=579200.0, ans=0.0 2024-09-19 03:13:43,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=579240.0, ans=0.125 2024-09-19 03:14:06,364 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=5.54 vs. 
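[Editor's note] During the epoch-33 validation pass the log also prints a per-head diagnostic, attn_weights_entropy, for one of the self-attention modules; the four-element tensor lines up with one entropy value per attention head. High entropy means diffuse attention over many keys; values near zero mean the head locks onto single positions. A sketch of how such a statistic can be computed from an attention-weight tensor; the tensor layout is an assumption.

```python
import torch

def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    """Mean entropy of attention rows, one value per head.

    attn_weights: (num_heads, num_queries, num_keys), rows summing to 1.
    """
    eps = 1.0e-20
    entropy = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
    return entropy.mean(dim=-1)  # average over queries -> (num_heads,)

w = torch.softmax(torch.randn(4, 100, 100), dim=-1)
print(attn_weights_entropy(w))  # a bit below log(100) ~ 4.6 per head
```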
limit=15.0 2024-09-19 03:14:16,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=579320.0, ans=0.0 2024-09-19 03:14:30,656 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.16 vs. limit=15.0 2024-09-19 03:14:31,539 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 03:14:38,181 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.66 vs. limit=12.0 2024-09-19 03:14:38,706 INFO [train.py:1198] (1/2) Epoch 33, batch 50, loss[loss=0.2122, ctc_loss=0.1034, cr_loss=0.3276, attn_decoder_loss=0.2171, over 29444.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1235, cr_loss=0.371, attn_decoder_loss=0.2444, over 1269184.28 frames. ], batch size: 70, lr: 3.37e-03, grad_scale: 8.0 2024-09-19 03:14:43,364 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.810e+01 9.277e+01 1.031e+02 1.119e+02 2.001e+02, threshold=2.062e+02, percent-clipped=1.0 2024-09-19 03:14:46,110 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.66 vs. limit=22.5 2024-09-19 03:15:05,872 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.65 vs. limit=15.0 2024-09-19 03:15:24,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=579520.0, ans=0.0 2024-09-19 03:15:26,899 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.46 vs. limit=15.0 2024-09-19 03:15:38,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=579560.0, ans=0.2 2024-09-19 03:15:54,966 INFO [train.py:1198] (1/2) Epoch 33, batch 100, loss[loss=0.2324, ctc_loss=0.1117, cr_loss=0.3458, attn_decoder_loss=0.2382, over 29540.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1236, cr_loss=0.3706, attn_decoder_loss=0.2458, over 2253565.37 frames. ], batch size: 76, lr: 3.37e-03, grad_scale: 8.0 2024-09-19 03:15:55,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=579600.0, ans=22.5 2024-09-19 03:15:59,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=579600.0, ans=0.125 2024-09-19 03:16:03,263 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. 
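[Editor's note] tot_loss is a frame-weighted running aggregate, not a plain epoch average. It restarts with the epoch (at epoch 33, batch 0 it equals that single batch's loss over 29598.00 frames), and its frame count then grows toward a plateau near 5.8e6 frames, roughly 200 batches' worth at ~29.5k frames per batch, so older batches are gradually forgotten; fractional counts such as 1269184.28 frames are the footprint of that exponential decay. A sketch, with the ~200-batch window inferred from the plateau rather than read from config:

```python
class RunningLoss:
    """Frame-weighted running loss with exponential forgetting."""

    def __init__(self, window: int = 200):  # inferred, not from config
        self.decay = 1.0 - 1.0 / window
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
        self.frames = self.frames * self.decay + batch_frames

    @property
    def value(self) -> float:
        # in steady state frames ~ window * frames-per-batch (~5.8e6 here)
        return self.loss_sum / max(self.frames, 1.0)
```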
limit=6.0 2024-09-19 03:16:22,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=579640.0, ans=0.0 2024-09-19 03:16:24,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=579640.0, ans=0.5 2024-09-19 03:16:27,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=579680.0, ans=0.125 2024-09-19 03:16:28,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=579680.0, ans=0.125 2024-09-19 03:16:31,025 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2024-09-19 03:16:46,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=579720.0, ans=0.015 2024-09-19 03:16:58,013 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.62 vs. limit=15.0 2024-09-19 03:17:07,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=579760.0, ans=0.1 2024-09-19 03:17:10,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=579800.0, ans=0.125 2024-09-19 03:17:11,875 INFO [train.py:1198] (1/2) Epoch 33, batch 150, loss[loss=0.2111, ctc_loss=0.09349, cr_loss=0.3069, attn_decoder_loss=0.2174, over 29431.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1214, cr_loss=0.3652, attn_decoder_loss=0.2437, over 3047867.68 frames. ], batch size: 70, lr: 3.37e-03, grad_scale: 8.0 2024-09-19 03:17:12,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=579800.0, ans=0.125 2024-09-19 03:17:16,301 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.420e+01 8.477e+01 8.945e+01 9.593e+01 9.750e+02, threshold=1.789e+02, percent-clipped=1.0 2024-09-19 03:17:22,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=579800.0, ans=0.1 2024-09-19 03:17:28,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=579840.0, ans=0.125 2024-09-19 03:17:47,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=579880.0, ans=10.0 2024-09-19 03:18:23,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=579960.0, ans=0.125 2024-09-19 03:18:25,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=579960.0, ans=0.025 2024-09-19 03:18:28,278 INFO [train.py:1198] (1/2) Epoch 33, batch 200, loss[loss=0.2557, ctc_loss=0.139, cr_loss=0.4011, attn_decoder_loss=0.2597, over 27339.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1201, cr_loss=0.3623, attn_decoder_loss=0.2424, over 3659146.54 frames. 
2024-09-19 03:18:28,278 INFO [train.py:1198] (1/2) Epoch 33, batch 200, loss[loss=0.2557, ctc_loss=0.139, cr_loss=0.4011, attn_decoder_loss=0.2597, over 27339.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1201, cr_loss=0.3623, attn_decoder_loss=0.2424, over 3659146.54 frames. ], batch size: 124, lr: 3.37e-03, grad_scale: 8.0
2024-09-19 03:18:52,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=580040.0, ans=0.2
2024-09-19 03:18:57,987 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0
2024-09-19 03:18:59,361 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.27 vs. limit=15.0
2024-09-19 03:19:13,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=580120.0, ans=0.125
2024-09-19 03:19:30,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=580160.0, ans=0.025
2024-09-19 03:19:38,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=580160.0, ans=0.025
2024-09-19 03:19:43,961 INFO [train.py:1198] (1/2) Epoch 33, batch 250, loss[loss=0.2611, ctc_loss=0.1305, cr_loss=0.3892, attn_decoder_loss=0.267, over 29214.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1197, cr_loss=0.3618, attn_decoder_loss=0.2423, over 4140597.47 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 8.0
2024-09-19 03:19:48,590 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.257e+01 8.698e+01 9.269e+01 2.011e+02, threshold=1.740e+02, percent-clipped=1.0
2024-09-19 03:19:58,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=580240.0, ans=0.0
2024-09-19 03:20:00,356 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.69 vs. limit=12.0
2024-09-19 03:20:16,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=580280.0, ans=0.025
2024-09-19 03:20:17,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=580280.0, ans=0.025
2024-09-19 03:20:44,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=580320.0, ans=0.025
2024-09-19 03:20:44,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=580320.0, ans=0.0
2024-09-19 03:20:47,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=580360.0, ans=0.0
2024-09-19 03:21:02,339 INFO [train.py:1198] (1/2) Epoch 33, batch 300, loss[loss=0.2559, ctc_loss=0.1421, cr_loss=0.4226, attn_decoder_loss=0.2592, over 29543.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1188, cr_loss=0.3601, attn_decoder_loss=0.2418, over 4509419.61 frames. ], batch size: 92, lr: 3.37e-03, grad_scale: 8.0
2024-09-19 03:21:17,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=580440.0, ans=0.0
2024-09-19 03:21:22,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=580440.0, ans=0.0
2024-09-19 03:21:22,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=580440.0, ans=0.2
2024-09-19 03:21:26,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=580440.0, ans=0.2
2024-09-19 03:21:30,349 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.20 vs. limit=15.0
2024-09-19 03:21:36,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=580480.0, ans=0.125
2024-09-19 03:21:36,513 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 03:21:37,190 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.79 vs. limit=22.5
2024-09-19 03:21:37,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=580480.0, ans=0.0
2024-09-19 03:21:42,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=580480.0, ans=0.0
2024-09-19 03:22:12,159 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.75 vs. limit=15.0
2024-09-19 03:22:20,296 INFO [train.py:1198] (1/2) Epoch 33, batch 350, loss[loss=0.2136, ctc_loss=0.09791, cr_loss=0.3235, attn_decoder_loss=0.2192, over 29352.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1185, cr_loss=0.3595, attn_decoder_loss=0.2419, over 4794280.08 frames. ], batch size: 71, lr: 3.37e-03, grad_scale: 8.0
2024-09-19 03:22:24,712 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.206e+01 8.463e+01 8.888e+01 9.398e+01 1.588e+02, threshold=1.778e+02, percent-clipped=0.0
2024-09-19 03:22:29,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=580600.0, ans=0.125
2024-09-19 03:22:40,954 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.50 vs. limit=15.0
2024-09-19 03:23:24,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=580760.0, ans=0.0
2024-09-19 03:23:28,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=580760.0, ans=0.025
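The ScheduledFloat entries above report values such as dropout_p=0.1 and various skip rates pinned at 0.0 as a function of batch_count; by batch_count ~580k these schedules appear to have reached their final values. A minimal sketch of a piecewise-linear schedule over batch_count, in the spirit of these entries, is below; the real class lives in zipformer's scaling.py, and the (x, y) breakpoints here are invented for illustration.

```python
# Hedged sketch: a piecewise-linear schedule keyed by batch_count. The
# breakpoints are hypothetical; only the general shape (ramp, then a fixed
# floor) matches what the log suggests.
import bisect

class PiecewiseLinear:
    def __init__(self, *points: tuple[float, float]):
        self.xs = [p[0] for p in points]   # batch_count breakpoints
        self.ys = [p[1] for p in points]   # scheduled values

    def __call__(self, x: float) -> float:
        if x <= self.xs[0]:
            return self.ys[0]
        if x >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, x)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# E.g. a dropout annealed from 0.3 to a floor of 0.1 over the first 20k batches:
dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
assert dropout_p(580000.0) == 0.1  # long past the ramp, pinned at the floor
```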
2024-09-19 03:23:36,283 INFO [train.py:1198] (1/2) Epoch 33, batch 400, loss[loss=0.2433, ctc_loss=0.1243, cr_loss=0.3752, attn_decoder_loss=0.2482, over 29699.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1182, cr_loss=0.3593, attn_decoder_loss=0.2417, over 5024396.91 frames. ], batch size: 82, lr: 3.37e-03, grad_scale: 16.0
2024-09-19 03:23:41,953 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.65 vs. limit=15.0
2024-09-19 03:23:44,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=580800.0, ans=0.025
2024-09-19 03:23:50,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=580840.0, ans=0.2
2024-09-19 03:24:06,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=580880.0, ans=0.1
2024-09-19 03:24:30,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=580920.0, ans=0.125
2024-09-19 03:24:37,493 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.39 vs. limit=15.0
2024-09-19 03:24:54,730 INFO [train.py:1198] (1/2) Epoch 33, batch 450, loss[loss=0.2407, ctc_loss=0.1226, cr_loss=0.3685, attn_decoder_loss=0.2456, over 29700.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1187, cr_loss=0.3604, attn_decoder_loss=0.242, over 5186917.37 frames. ], batch size: 83, lr: 3.37e-03, grad_scale: 8.0
2024-09-19 03:25:00,690 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.321e+01 8.447e+01 9.007e+01 9.616e+01 1.601e+02, threshold=1.801e+02, percent-clipped=0.0
2024-09-19 03:25:04,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=581000.0, ans=0.025
2024-09-19 03:25:05,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=581000.0, ans=0.0
2024-09-19 03:25:25,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=581080.0, ans=0.2
2024-09-19 03:25:27,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.55 vs. limit=15.0
2024-09-19 03:25:49,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=581120.0, ans=0.125
2024-09-19 03:26:12,851 INFO [train.py:1198] (1/2) Epoch 33, batch 500, loss[loss=0.251, ctc_loss=0.1247, cr_loss=0.3725, attn_decoder_loss=0.2567, over 29446.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1184, cr_loss=0.3597, attn_decoder_loss=0.2414, over 5329791.36 frames. ], batch size: 94, lr: 3.37e-03, grad_scale: 8.0
2024-09-19 03:26:13,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=581200.0, ans=0.125
2024-09-19 03:26:17,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=581200.0, ans=0.125
2024-09-19 03:26:37,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=581240.0, ans=0.0
2024-09-19 03:26:54,984 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.59 vs. limit=15.0
2024-09-19 03:26:57,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=581320.0, ans=0.125
2024-09-19 03:27:19,686 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 03:27:25,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=581360.0, ans=0.025
2024-09-19 03:27:28,657 INFO [train.py:1198] (1/2) Epoch 33, batch 550, loss[loss=0.2518, ctc_loss=0.1303, cr_loss=0.3871, attn_decoder_loss=0.2567, over 28820.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1187, cr_loss=0.3601, attn_decoder_loss=0.2417, over 5422255.15 frames. ], batch size: 104, lr: 3.37e-03, grad_scale: 8.0
2024-09-19 03:27:34,829 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.333e+01 9.017e+01 9.436e+01 4.024e+02, threshold=1.803e+02, percent-clipped=1.0
2024-09-19 03:27:41,154 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 03:27:54,067 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.37 vs. limit=12.0
2024-09-19 03:28:02,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=581480.0, ans=0.125
2024-09-19 03:28:03,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=581480.0, ans=0.125
2024-09-19 03:28:10,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=581480.0, ans=0.0
2024-09-19 03:28:27,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=581520.0, ans=0.125
2024-09-19 03:28:41,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=581560.0, ans=0.1
2024-09-19 03:28:47,663 INFO [train.py:1198] (1/2) Epoch 33, batch 600, loss[loss=0.2467, ctc_loss=0.1229, cr_loss=0.3601, attn_decoder_loss=0.2524, over 29236.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1188, cr_loss=0.3605, attn_decoder_loss=0.2419, over 5508141.45 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 8.0
2024-09-19 03:28:49,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=581600.0, ans=0.2
2024-09-19 03:29:10,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=581640.0, ans=0.125
2024-09-19 03:29:28,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.82 vs. limit=15.0
2024-09-19 03:29:45,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=581720.0, ans=0.125
2024-09-19 03:29:56,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=581760.0, ans=0.0
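The optim.py WARNING lines report five grad-norm statistics (min, 25%, median, 75%, max over a recent window), and in each case the threshold equals Clipping_scale times the logged median (e.g. 2.0 x 8.945e+01 = 1.789e+02 above). A minimal sketch of a monitor producing that kind of report follows; it only mimics the log format, and ScaledAdam's actual bookkeeping in optim.py is more involved.

```python
# Hedged sketch: quartile-based gradient-norm monitoring with a threshold of
# clipping_scale * median, as the WARNING lines above suggest. Window size and
# reporting cadence are assumptions.
from collections import deque

import torch

class GradNormMonitor:
    def __init__(self, clipping_scale: float = 2.0, window: int = 50):
        self.clipping_scale = clipping_scale
        self.norms: deque = deque(maxlen=window)
        self.num_clipped = 0

    def step(self, grad_norm: float) -> float:
        """Record one norm; return the factor (<= 1.0) to scale gradients by."""
        self.norms.append(grad_norm)
        median = torch.tensor(list(self.norms)).median().item()
        threshold = self.clipping_scale * median
        if grad_norm > threshold:
            self.num_clipped += 1
            return threshold / grad_norm
        return 1.0

    def report(self) -> str:
        t = torch.tensor(list(self.norms))
        q = [t.quantile(p).item() for p in (0.0, 0.25, 0.5, 0.75, 1.0)]
        pct = 100.0 * self.num_clipped / max(1, len(self.norms))
        return ("grad-norm quartiles " + " ".join(f"{v:.3e}" for v in q)
                + f", threshold={self.clipping_scale * q[2]:.3e}"
                + f", percent-clipped={pct:.1f}")

# Sanity check against the first WARNING above: threshold = 2.0 * median.
assert abs(2.0 * 8.945e+01 - 1.789e+02) < 1e-6
```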
2024-09-19 03:30:05,157 INFO [train.py:1198] (1/2) Epoch 33, batch 650, loss[loss=0.2369, ctc_loss=0.1189, cr_loss=0.3673, attn_decoder_loss=0.2419, over 29758.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.118, cr_loss=0.3591, attn_decoder_loss=0.2412, over 5586103.12 frames. ], batch size: 81, lr: 3.37e-03, grad_scale: 8.0
2024-09-19 03:30:11,217 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.577e+01 8.986e+01 9.488e+01 1.360e+02, threshold=1.797e+02, percent-clipped=0.0
2024-09-19 03:30:37,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=581880.0, ans=0.0
2024-09-19 03:31:21,279 INFO [train.py:1198] (1/2) Epoch 33, batch 700, loss[loss=0.2284, ctc_loss=0.1127, cr_loss=0.3519, attn_decoder_loss=0.2335, over 29567.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1183, cr_loss=0.3602, attn_decoder_loss=0.2416, over 5636971.74 frames. ], batch size: 76, lr: 3.37e-03, grad_scale: 8.0
2024-09-19 03:32:22,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=582160.0, ans=0.0
2024-09-19 03:32:30,981 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.12 vs. limit=10.0
2024-09-19 03:32:39,391 INFO [train.py:1198] (1/2) Epoch 33, batch 750, loss[loss=0.2439, ctc_loss=0.121, cr_loss=0.3616, attn_decoder_loss=0.2495, over 29720.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1183, cr_loss=0.36, attn_decoder_loss=0.2415, over 5677162.30 frames. ], batch size: 82, lr: 3.37e-03, grad_scale: 8.0
2024-09-19 03:32:42,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=582200.0, ans=0.0
2024-09-19 03:32:44,033 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 03:32:46,693 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.188e+01 8.541e+01 8.897e+01 9.394e+01 1.704e+02, threshold=1.779e+02, percent-clipped=0.0
2024-09-19 03:32:50,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=582200.0, ans=0.1
2024-09-19 03:32:56,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=582240.0, ans=0.125
2024-09-19 03:33:03,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=582240.0, ans=0.025
2024-09-19 03:33:06,497 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 03:33:09,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=582280.0, ans=0.2
2024-09-19 03:33:26,545 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.34 vs. limit=10.0
2024-09-19 03:33:47,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=582360.0, ans=0.125
2024-09-19 03:33:48,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=582360.0, ans=0.125
2024-09-19 03:33:50,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=582360.0, ans=0.125
2024-09-19 03:33:57,742 INFO [train.py:1198] (1/2) Epoch 33, batch 800, loss[loss=0.2155, ctc_loss=0.09827, cr_loss=0.3034, attn_decoder_loss=0.2218, over 29612.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1186, cr_loss=0.3606, attn_decoder_loss=0.2417, over 5707187.33 frames. ], batch size: 73, lr: 3.37e-03, grad_scale: 16.0
2024-09-19 03:34:13,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=582440.0, ans=0.125
2024-09-19 03:34:20,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=582440.0, ans=0.125
2024-09-19 03:34:41,360 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.71 vs. limit=22.5
2024-09-19 03:34:42,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=582520.0, ans=0.1
2024-09-19 03:34:57,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.50 vs. limit=12.0
2024-09-19 03:34:58,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=582560.0, ans=0.0
2024-09-19 03:35:00,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=582560.0, ans=0.1
2024-09-19 03:35:13,258 INFO [train.py:1198] (1/2) Epoch 33, batch 850, loss[loss=0.2407, ctc_loss=0.1174, cr_loss=0.3703, attn_decoder_loss=0.2461, over 29706.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1181, cr_loss=0.3596, attn_decoder_loss=0.2411, over 5736417.53 frames. ], batch size: 89, lr: 3.36e-03, grad_scale: 16.0
2024-09-19 03:35:17,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=582600.0, ans=0.025
2024-09-19 03:35:20,678 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.308e+01 8.452e+01 8.956e+01 9.635e+01 2.624e+02, threshold=1.791e+02, percent-clipped=1.0
2024-09-19 03:35:21,095 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 03:35:25,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=582600.0, ans=10.0
2024-09-19 03:35:36,453 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.85 vs. limit=12.0
2024-09-19 03:35:49,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=582680.0, ans=0.125
2024-09-19 03:35:49,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=582680.0, ans=0.125
2024-09-19 03:36:01,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.61 vs. limit=22.5
2024-09-19 03:36:05,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=582720.0, ans=0.125
2024-09-19 03:36:27,163 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 03:36:31,364 INFO [train.py:1198] (1/2) Epoch 33, batch 900, loss[loss=0.2278, ctc_loss=0.1089, cr_loss=0.3543, attn_decoder_loss=0.2331, over 29573.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1183, cr_loss=0.3596, attn_decoder_loss=0.2414, over 5741989.94 frames. ], batch size: 73, lr: 3.36e-03, grad_scale: 16.0
2024-09-19 03:36:31,687 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 03:36:42,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=582800.0, ans=0.125
2024-09-19 03:36:52,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=582840.0, ans=0.1
2024-09-19 03:36:57,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=582840.0, ans=0.125
2024-09-19 03:36:59,410 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.70 vs. limit=15.0
2024-09-19 03:37:01,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=582880.0, ans=0.0
2024-09-19 03:37:06,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=582880.0, ans=0.125
2024-09-19 03:37:19,652 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.62 vs. limit=15.0
2024-09-19 03:37:22,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=582920.0, ans=0.125
2024-09-19 03:37:26,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=582920.0, ans=0.5
2024-09-19 03:37:48,790 INFO [train.py:1198] (1/2) Epoch 33, batch 950, loss[loss=0.2234, ctc_loss=0.1137, cr_loss=0.3533, attn_decoder_loss=0.2278, over 29517.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.118, cr_loss=0.359, attn_decoder_loss=0.2413, over 5743358.79 frames. ], batch size: 74, lr: 3.36e-03, grad_scale: 16.0
2024-09-19 03:37:52,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=583000.0, ans=0.1
2024-09-19 03:37:54,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=583000.0, ans=0.025
2024-09-19 03:37:56,287 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 8.465e+01 9.060e+01 1.004e+02 2.208e+02, threshold=1.812e+02, percent-clipped=1.0
2024-09-19 03:38:20,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=583080.0, ans=0.125
2024-09-19 03:39:02,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.31 vs. limit=12.0
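The Whitening lines above compare a per-module metric against a fixed limit. One way to read them: the metric measures how far the feature covariance is from isotropic ("white"); for covariance eigenvalues l_i over d channels, d*sum(l_i^2)/(sum(l_i))^2 is 1.0 for a flat spectrum and approaches d when one direction dominates. A minimal sketch of such a metric follows; the Whiten module in scaling.py penalizes activations when its metric exceeds the limit, and its exact normalization may differ from this sketch.

```python
# Hedged sketch: a spectrum-flatness metric in the spirit of the Whitening
# records above (num_groups=1 case). Assumes roughly zero-mean activations.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels) activations."""
    d = x.shape[1]
    cov = (x.T @ x) / x.shape[0]        # (d, d) covariance estimate
    num = d * (cov * cov).sum()         # d * sum of squared eigenvalues
    den = cov.diagonal().sum() ** 2     # (sum of eigenvalues)^2
    return (num / den).item()

torch.manual_seed(0)
white = torch.randn(10000, 256)         # roughly isotropic features
print(whitening_metric(white))          # ~1.0, comfortably under limit=15.0
```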
2024-09-19 03:39:04,616 INFO [train.py:1198] (1/2) Epoch 33, batch 1000, loss[loss=0.2411, ctc_loss=0.1179, cr_loss=0.3611, attn_decoder_loss=0.2468, over 29508.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1184, cr_loss=0.3599, attn_decoder_loss=0.2419, over 5737188.52 frames. ], batch size: 77, lr: 3.36e-03, grad_scale: 8.0
2024-09-19 03:39:07,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=583200.0, ans=0.2
2024-09-19 03:39:12,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=583200.0, ans=0.05
2024-09-19 03:39:23,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=583240.0, ans=0.125
2024-09-19 03:39:23,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=583240.0, ans=0.0
2024-09-19 03:39:26,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=583240.0, ans=0.1
2024-09-19 03:39:29,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=583240.0, ans=0.125
2024-09-19 03:39:32,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=583240.0, ans=0.0
2024-09-19 03:39:41,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=583280.0, ans=0.125
2024-09-19 03:40:04,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=583360.0, ans=0.0
2024-09-19 03:40:05,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=583360.0, ans=0.125
2024-09-19 03:40:05,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=583360.0, ans=0.07
2024-09-19 03:40:15,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=583360.0, ans=0.125
2024-09-19 03:40:22,696 INFO [train.py:1198] (1/2) Epoch 33, batch 1050, loss[loss=0.2419, ctc_loss=0.1177, cr_loss=0.3643, attn_decoder_loss=0.2476, over 29676.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1183, cr_loss=0.3602, attn_decoder_loss=0.2414, over 5744706.77 frames. ], batch size: 85, lr: 3.36e-03, grad_scale: 8.0
2024-09-19 03:40:32,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=583400.0, ans=0.0
2024-09-19 03:40:33,196 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.604e+01 9.076e+01 9.577e+01 3.537e+02, threshold=1.815e+02, percent-clipped=1.0
2024-09-19 03:40:42,027 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0
2024-09-19 03:40:43,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.72 vs. limit=22.5
2024-09-19 03:40:54,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=583480.0, ans=0.125
2024-09-19 03:40:54,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=583480.0, ans=0.1
2024-09-19 03:41:32,510 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=5.95 vs. limit=15.0
2024-09-19 03:41:40,572 INFO [train.py:1198] (1/2) Epoch 33, batch 1100, loss[loss=0.2375, ctc_loss=0.1219, cr_loss=0.3793, attn_decoder_loss=0.242, over 29458.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1182, cr_loss=0.3594, attn_decoder_loss=0.2411, over 5756686.69 frames. ], batch size: 78, lr: 3.36e-03, grad_scale: 8.0
2024-09-19 03:41:54,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=583640.0, ans=0.025
2024-09-19 03:41:54,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=583640.0, ans=0.125
2024-09-19 03:42:03,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=583640.0, ans=0.5
2024-09-19 03:42:08,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=583640.0, ans=0.1
2024-09-19 03:42:23,960 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.94 vs. limit=22.5
2024-09-19 03:42:29,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=583720.0, ans=0.125
2024-09-19 03:42:38,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=583720.0, ans=0.0
2024-09-19 03:42:52,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=583760.0, ans=0.07
2024-09-19 03:42:55,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=583800.0, ans=0.1
2024-09-19 03:42:56,359 INFO [train.py:1198] (1/2) Epoch 33, batch 1150, loss[loss=0.2363, ctc_loss=0.1126, cr_loss=0.3451, attn_decoder_loss=0.2423, over 29448.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1183, cr_loss=0.3596, attn_decoder_loss=0.2413, over 5754090.82 frames. ], batch size: 78, lr: 3.36e-03, grad_scale: 8.0
2024-09-19 03:43:06,973 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.330e+01 8.417e+01 8.891e+01 9.458e+01 2.719e+02, threshold=1.778e+02, percent-clipped=0.0
2024-09-19 03:43:10,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=583840.0, ans=0.2
2024-09-19 03:43:33,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=583880.0, ans=0.0
2024-09-19 03:43:43,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=583920.0, ans=0.07
2024-09-19 03:43:47,317 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.53 vs. limit=22.5
2024-09-19 03:44:13,008 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 03:44:14,585 INFO [train.py:1198] (1/2) Epoch 33, batch 1200, loss[loss=0.2515, ctc_loss=0.1275, cr_loss=0.3982, attn_decoder_loss=0.2564, over 29678.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1189, cr_loss=0.3607, attn_decoder_loss=0.242, over 5745627.05 frames. ], batch size: 85, lr: 3.36e-03, grad_scale: 16.0
2024-09-19 03:44:27,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=584000.0, ans=0.1
2024-09-19 03:45:04,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=584120.0, ans=0.125
2024-09-19 03:45:05,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=584120.0, ans=0.1
2024-09-19 03:45:28,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=584160.0, ans=0.0
2024-09-19 03:45:32,641 INFO [train.py:1198] (1/2) Epoch 33, batch 1250, loss[loss=0.2432, ctc_loss=0.1245, cr_loss=0.3578, attn_decoder_loss=0.2485, over 29551.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1189, cr_loss=0.3608, attn_decoder_loss=0.2424, over 5773684.37 frames. ], batch size: 92, lr: 3.36e-03, grad_scale: 16.0
2024-09-19 03:45:37,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=584200.0, ans=0.125
2024-09-19 03:45:43,420 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.734e+01 8.568e+01 9.117e+01 9.876e+01 2.169e+02, threshold=1.823e+02, percent-clipped=3.0
2024-09-19 03:45:43,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=584200.0, ans=0.125
2024-09-19 03:45:51,880 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.11 vs. limit=15.0
2024-09-19 03:46:40,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=584360.0, ans=0.125
2024-09-19 03:46:45,277 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.55 vs. limit=10.0
2024-09-19 03:46:48,666 INFO [train.py:1198] (1/2) Epoch 33, batch 1300, loss[loss=0.2372, ctc_loss=0.1099, cr_loss=0.3165, attn_decoder_loss=0.2444, over 28349.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1188, cr_loss=0.3604, attn_decoder_loss=0.2418, over 5779039.59 frames. ], batch size: 111, lr: 3.36e-03, grad_scale: 16.0
2024-09-19 03:47:01,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=584400.0, ans=0.125
2024-09-19 03:47:04,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=584440.0, ans=0.125
2024-09-19 03:47:27,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=584480.0, ans=0.1
2024-09-19 03:47:54,835 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.63 vs. limit=15.0
2024-09-19 03:48:03,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=584600.0, ans=0.95
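The "grad_scale" field in the train.py records flips between 8.0 and 16.0 (e.g. 16.0 at batch 1200 above, back to 8.0 a few reports later). With AMP enabled (use_fp16=True), this is the behavior of a dynamic loss scaler: the scale doubles after a run of overflow-free steps and halves when an inf/nan gradient is found. The sketch below mimics torch.cuda.amp.GradScaler's policy; train.py's actual growth/backoff settings are not shown in this log.

```python
# Hedged sketch: dynamic loss scaling as used under AMP. The growth_interval
# of 2000 matches GradScaler's default but is an assumption for this run.
class ToyLossScaler:
    def __init__(self, init_scale=8.0, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool) -> None:
        if found_inf:
            self.scale *= self.backoff_factor     # e.g. 16.0 -> 8.0
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                self.scale *= self.growth_factor  # e.g. 8.0 -> 16.0
                self._good_steps = 0

scaler = ToyLossScaler()
for _ in range(2000):
    scaler.update(found_inf=False)
assert scaler.scale == 16.0   # grew after an overflow-free interval
scaler.update(found_inf=True)
assert scaler.scale == 8.0    # one overflow backs the scale off
```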
2024-09-19 03:48:04,741 INFO [train.py:1198] (1/2) Epoch 33, batch 1350, loss[loss=0.2269, ctc_loss=0.1076, cr_loss=0.3323, attn_decoder_loss=0.2327, over 29754.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1183, cr_loss=0.3595, attn_decoder_loss=0.2416, over 5795690.74 frames. ], batch size: 81, lr: 3.36e-03, grad_scale: 16.0
2024-09-19 03:48:17,533 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.543e+01 8.274e+01 8.749e+01 9.320e+01 1.394e+02, threshold=1.750e+02, percent-clipped=0.0
2024-09-19 03:48:35,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=584680.0, ans=0.125
2024-09-19 03:48:35,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=584680.0, ans=0.125
2024-09-19 03:48:50,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=584720.0, ans=0.125
2024-09-19 03:49:03,313 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 03:49:07,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=584760.0, ans=0.125
2024-09-19 03:49:24,971 INFO [train.py:1198] (1/2) Epoch 33, batch 1400, loss[loss=0.2152, ctc_loss=0.1061, cr_loss=0.3207, attn_decoder_loss=0.2201, over 29564.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1182, cr_loss=0.3592, attn_decoder_loss=0.2415, over 5807023.37 frames. ], batch size: 69, lr: 3.36e-03, grad_scale: 16.0
2024-09-19 03:49:29,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=584800.0, ans=0.125
2024-09-19 03:49:46,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=584840.0, ans=0.0
2024-09-19 03:49:50,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=584840.0, ans=0.2
2024-09-19 03:49:59,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=584880.0, ans=0.125
2024-09-19 03:50:13,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=584920.0, ans=0.1
2024-09-19 03:50:22,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=584920.0, ans=0.04949747468305833
2024-09-19 03:50:40,324 INFO [train.py:1198] (1/2) Epoch 33, batch 1450, loss[loss=0.2566, ctc_loss=0.1416, cr_loss=0.4125, attn_decoder_loss=0.2602, over 29446.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1182, cr_loss=0.3591, attn_decoder_loss=0.2417, over 5803881.22 frames. ], batch size: 94, lr: 3.36e-03, grad_scale: 16.0
2024-09-19 03:50:41,151 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.92 vs. limit=22.5
2024-09-19 03:50:50,830 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.392e+01 8.954e+01 9.384e+01 1.541e+02, threshold=1.791e+02, percent-clipped=0.0
2024-09-19 03:50:53,212 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.47 vs. limit=15.0
2024-09-19 03:50:55,087 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.88 vs. limit=12.0
2024-09-19 03:51:03,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=585040.0, ans=0.0
2024-09-19 03:51:12,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=585080.0, ans=0.0
2024-09-19 03:51:22,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=585080.0, ans=0.125
2024-09-19 03:51:29,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=585120.0, ans=0.125
2024-09-19 03:51:48,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=585160.0, ans=0.0
2024-09-19 03:51:51,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=585160.0, ans=0.2
2024-09-19 03:51:52,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=585160.0, ans=0.0
2024-09-19 03:51:55,821 INFO [train.py:1198] (1/2) Epoch 33, batch 1500, loss[loss=0.2453, ctc_loss=0.1242, cr_loss=0.3541, attn_decoder_loss=0.2508, over 29637.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1184, cr_loss=0.3594, attn_decoder_loss=0.242, over 5805098.72 frames. ], batch size: 86, lr: 3.36e-03, grad_scale: 16.0
2024-09-19 03:52:02,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=585200.0, ans=0.0
2024-09-19 03:52:19,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=585240.0, ans=0.125
2024-09-19 03:52:33,841 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0
2024-09-19 03:52:54,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=585320.0, ans=0.2
2024-09-19 03:52:54,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=585320.0, ans=0.2
2024-09-19 03:53:06,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=585360.0, ans=0.125
2024-09-19 03:53:15,903 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.16 vs. limit=15.0
2024-09-19 03:53:16,547 INFO [train.py:1198] (1/2) Epoch 33, batch 1550, loss[loss=0.2597, ctc_loss=0.1411, cr_loss=0.4137, attn_decoder_loss=0.2636, over 29535.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.119, cr_loss=0.3604, attn_decoder_loss=0.2421, over 5780775.08 frames. ], batch size: 90, lr: 3.36e-03, grad_scale: 8.0
2024-09-19 03:53:27,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=585400.0, ans=0.1
2024-09-19 03:53:28,682 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.266e+01 8.624e+01 9.001e+01 9.537e+01 4.675e+02, threshold=1.800e+02, percent-clipped=1.0
2024-09-19 03:53:32,059 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 03:53:53,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=585480.0, ans=0.125
2024-09-19 03:54:05,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=585520.0, ans=0.125
2024-09-19 03:54:10,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=585520.0, ans=0.125
2024-09-19 03:54:12,482 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.54 vs. limit=10.0
2024-09-19 03:54:30,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=585600.0, ans=0.0
2024-09-19 03:54:32,528 INFO [train.py:1198] (1/2) Epoch 33, batch 1600, loss[loss=0.2473, ctc_loss=0.1219, cr_loss=0.3764, attn_decoder_loss=0.2528, over 29671.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1192, cr_loss=0.3605, attn_decoder_loss=0.2422, over 5764620.16 frames. ], batch size: 85, lr: 3.36e-03, grad_scale: 16.0
2024-09-19 03:54:36,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.49 vs. limit=10.0
2024-09-19 03:54:37,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=585600.0, ans=0.0
2024-09-19 03:54:38,231 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.38 vs. limit=15.0
2024-09-19 03:54:55,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=585640.0, ans=0.0
2024-09-19 03:55:06,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=585680.0, ans=0.125
2024-09-19 03:55:21,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=585720.0, ans=0.0
2024-09-19 03:55:35,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=585760.0, ans=0.125
2024-09-19 03:55:48,309 INFO [train.py:1198] (1/2) Epoch 33, batch 1650, loss[loss=0.2494, ctc_loss=0.1255, cr_loss=0.3929, attn_decoder_loss=0.2544, over 29682.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1192, cr_loss=0.3611, attn_decoder_loss=0.2422, over 5759240.85 frames. ], batch size: 89, lr: 3.36e-03, grad_scale: 16.0
2024-09-19 03:56:02,791 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.668e+01 9.020e+01 9.711e+01 1.996e+02, threshold=1.804e+02, percent-clipped=2.0
2024-09-19 03:56:03,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=585800.0, ans=0.125
2024-09-19 03:56:12,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=585840.0, ans=0.025
2024-09-19 03:56:57,262 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0
2024-09-19 03:56:57,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0
2024-09-19 03:56:59,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=585960.0, ans=0.125
2024-09-19 03:57:08,345 INFO [train.py:1198] (1/2) Epoch 33, batch 1700, loss[loss=0.2102, ctc_loss=0.09916, cr_loss=0.3241, attn_decoder_loss=0.2154, over 29560.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1186, cr_loss=0.3599, attn_decoder_loss=0.2419, over 5780665.42 frames. ], batch size: 69, lr: 3.36e-03, grad_scale: 16.0
2024-09-19 03:57:16,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=586000.0, ans=0.125
2024-09-19 03:57:46,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=586080.0, ans=0.1
2024-09-19 03:57:56,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=586120.0, ans=0.125
2024-09-19 03:58:23,847 INFO [train.py:1198] (1/2) Epoch 33, batch 1750, loss[loss=0.2098, ctc_loss=0.09804, cr_loss=0.3183, attn_decoder_loss=0.2152, over 29321.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1185, cr_loss=0.36, attn_decoder_loss=0.2415, over 5789751.53 frames. ], batch size: 67, lr: 3.35e-03, grad_scale: 8.0
2024-09-19 03:58:37,583 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.452e+01 8.998e+01 9.448e+01 1.573e+02, threshold=1.800e+02, percent-clipped=0.0
2024-09-19 03:58:56,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=586280.0, ans=0.0
2024-09-19 03:59:02,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=586280.0, ans=0.125
2024-09-19 03:59:23,804 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 03:59:31,380 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 03:59:34,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=586360.0, ans=0.125
2024-09-19 03:59:40,295 INFO [train.py:1198] (1/2) Epoch 33, batch 1800, loss[loss=0.2462, ctc_loss=0.1233, cr_loss=0.386, attn_decoder_loss=0.2513, over 29706.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1186, cr_loss=0.3607, attn_decoder_loss=0.2418, over 5791937.49 frames. ], batch size: 83, lr: 3.35e-03, grad_scale: 8.0
2024-09-19 03:59:45,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=586400.0, ans=0.0
2024-09-19 04:00:00,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=586440.0, ans=0.125
2024-09-19 04:00:11,916 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 04:00:26,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=586520.0, ans=0.125
2024-09-19 04:00:39,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=586520.0, ans=0.125
2024-09-19 04:01:00,448 INFO [train.py:1198] (1/2) Epoch 33, batch 1850, loss[loss=0.2537, ctc_loss=0.1255, cr_loss=0.3782, attn_decoder_loss=0.2595, over 29657.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.119, cr_loss=0.3612, attn_decoder_loss=0.2419, over 5795500.11 frames. ], batch size: 86, lr: 3.35e-03, grad_scale: 8.0
2024-09-19 04:01:00,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=586600.0, ans=0.1
2024-09-19 04:01:12,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=586600.0, ans=0.2
2024-09-19 04:01:13,844 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.226e+01 8.429e+01 8.891e+01 9.502e+01 1.976e+02, threshold=1.778e+02, percent-clipped=1.0
2024-09-19 04:01:30,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=586680.0, ans=0.0
2024-09-19 04:01:48,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=586720.0, ans=0.125
2024-09-19 04:01:51,009 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.65 vs. limit=15.0
2024-09-19 04:02:11,635 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 04:02:13,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=586760.0, ans=0.0
2024-09-19 04:02:13,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=586760.0, ans=0.125
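Each train.py record pairs a single-batch loss ("over ~29k frames") with a running "tot_loss" whose frame count grows from report to report. One plausible mechanism, sketched below, is a frame-weighted running average: keep per-component sums weighted by frame counts and divide by the total when reporting. The slowly growing totals above suggest train.py also decays or resets old statistics periodically; that detail is omitted here.

```python
# Hedged sketch: frame-weighted aggregation of loss components across batches,
# reproducing the general shape of the tot_loss[...] lines. The class name and
# report format are illustrative, not icefall's actual MetricsTracker.
from dataclasses import dataclass, field

@dataclass
class LossTracker:
    frames: float = 0.0
    sums: dict = field(default_factory=dict)

    def update(self, num_frames: float, **per_frame_losses: float) -> None:
        self.frames += num_frames
        for name, value in per_frame_losses.items():
            self.sums[name] = self.sums.get(name, 0.0) + value * num_frames

    def report(self) -> str:
        avgs = ", ".join(f"{k}={v / self.frames:.4g}" for k, v in self.sums.items())
        return f"tot_loss[{avgs}, over {self.frames:.2f} frames.]"

t = LossTracker()
t.update(29431.0, loss=0.2111, ctc_loss=0.09349, cr_loss=0.3069)
t.update(27339.0, loss=0.2557, ctc_loss=0.139, cr_loss=0.4011)
print(t.report())  # frame-weighted averages over 56770.00 frames
```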
2024-09-19 04:02:15,710 INFO [train.py:1198] (1/2) Epoch 33, batch 1900, loss[loss=0.2498, ctc_loss=0.1303, cr_loss=0.3985, attn_decoder_loss=0.2542, over 29685.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1189, cr_loss=0.3613, attn_decoder_loss=0.2421, over 5802932.21 frames. ], batch size: 89, lr: 3.35e-03, grad_scale: 8.0
2024-09-19 04:02:19,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=586800.0, ans=0.1
2024-09-19 04:02:44,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=586880.0, ans=0.125
2024-09-19 04:03:04,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=586920.0, ans=0.125
2024-09-19 04:03:20,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.93 vs. limit=15.0
2024-09-19 04:03:31,173 INFO [train.py:1198] (1/2) Epoch 33, batch 1950, loss[loss=0.2398, ctc_loss=0.1337, cr_loss=0.3891, attn_decoder_loss=0.2429, over 29435.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1194, cr_loss=0.3625, attn_decoder_loss=0.2433, over 5817845.72 frames. ], batch size: 78, lr: 3.35e-03, grad_scale: 8.0
2024-09-19 04:03:39,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=587000.0, ans=0.025
2024-09-19 04:03:44,771 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.530e+01 9.165e+01 9.739e+01 1.607e+02, threshold=1.833e+02, percent-clipped=0.0
2024-09-19 04:03:46,050 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.68 vs. limit=15.0
2024-09-19 04:04:03,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=587080.0, ans=0.5
2024-09-19 04:04:14,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=587080.0, ans=0.0
2024-09-19 04:04:41,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=587160.0, ans=0.125
2024-09-19 04:04:42,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=587160.0, ans=0.125
2024-09-19 04:04:43,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=587160.0, ans=0.0
2024-09-19 04:04:51,439 INFO [train.py:1198] (1/2) Epoch 33, batch 2000, loss[loss=0.2089, ctc_loss=0.09566, cr_loss=0.31, attn_decoder_loss=0.2146, over 29370.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1197, cr_loss=0.3631, attn_decoder_loss=0.2435, over 5796192.22 frames. ], batch size: 67, lr: 3.35e-03, grad_scale: 16.0
2024-09-19 04:05:06,344 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.59 vs. limit=15.0
2024-09-19 04:05:08,147 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.32 vs. limit=22.5
2024-09-19 04:05:14,013 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.74 vs. limit=15.0
2024-09-19 04:05:18,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=587240.0, ans=0.125
2024-09-19 04:05:37,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=587320.0, ans=0.125
2024-09-19 04:05:57,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=587360.0, ans=0.0
2024-09-19 04:06:07,292 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.40 vs. limit=10.0
2024-09-19 04:06:07,787 INFO [train.py:1198] (1/2) Epoch 33, batch 2050, loss[loss=0.2144, ctc_loss=0.1048, cr_loss=0.327, attn_decoder_loss=0.2193, over 29441.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1194, cr_loss=0.3621, attn_decoder_loss=0.2426, over 5789336.00 frames. ], batch size: 70, lr: 3.35e-03, grad_scale: 16.0
2024-09-19 04:06:11,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=587400.0, ans=0.0
2024-09-19 04:06:21,349 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 8.725e+01 9.262e+01 9.868e+01 2.043e+02, threshold=1.852e+02, percent-clipped=1.0
2024-09-19 04:06:29,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=587440.0, ans=0.125
2024-09-19 04:06:30,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=587440.0, ans=0.125
2024-09-19 04:07:05,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=587520.0, ans=0.0
2024-09-19 04:07:19,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=587560.0, ans=0.025
2024-09-19 04:07:23,428 INFO [train.py:1198] (1/2) Epoch 33, batch 2100, loss[loss=0.2385, ctc_loss=0.1258, cr_loss=0.3816, attn_decoder_loss=0.2426, over 29746.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1188, cr_loss=0.3605, attn_decoder_loss=0.242, over 5800144.84 frames. ], batch size: 81, lr: 3.35e-03, grad_scale: 16.0
2024-09-19 04:07:34,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=587600.0, ans=0.1
2024-09-19 04:07:39,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.02 vs. limit=15.0
2024-09-19 04:07:41,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=587640.0, ans=0.125
2024-09-19 04:07:43,837 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.96 vs. limit=12.0
2024-09-19 04:07:46,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=587640.0, ans=0.0
2024-09-19 04:07:53,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=587680.0, ans=0.125
2024-09-19 04:07:53,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=587680.0, ans=0.125
2024-09-19 04:08:02,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=587680.0, ans=0.0
2024-09-19 04:08:03,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=587680.0, ans=0.125
2024-09-19 04:08:09,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=587720.0, ans=0.125
2024-09-19 04:08:28,270 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 04:08:42,922 INFO [train.py:1198] (1/2) Epoch 33, batch 2150, loss[loss=0.2352, ctc_loss=0.1213, cr_loss=0.3526, attn_decoder_loss=0.24, over 29483.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1181, cr_loss=0.3597, attn_decoder_loss=0.2416, over 5814842.63 frames. ], batch size: 78, lr: 3.35e-03, grad_scale: 16.0
2024-09-19 04:08:47,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=587800.0, ans=0.125
2024-09-19 04:08:55,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=587800.0, ans=0.1
2024-09-19 04:08:56,496 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.296e+01 8.476e+01 8.968e+01 9.482e+01 1.071e+02, threshold=1.794e+02, percent-clipped=0.0
2024-09-19 04:08:57,283 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0
2024-09-19 04:09:40,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=587920.0, ans=0.1
2024-09-19 04:09:58,817 INFO [train.py:1198] (1/2) Epoch 33, batch 2200, loss[loss=0.2518, ctc_loss=0.1339, cr_loss=0.3923, attn_decoder_loss=0.2562, over 29627.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1181, cr_loss=0.3595, attn_decoder_loss=0.2417, over 5811640.34 frames. ], batch size: 86, lr: 3.35e-03, grad_scale: 16.0
2024-09-19 04:09:59,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=588000.0, ans=0.1
2024-09-19 04:10:00,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=588000.0, ans=0.1
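A small arithmetic aside on the bypass.skip_rate entries: the log reports both round values like ans=0.07 and odd-looking ones like ans=0.04949747468305833 (and, further on, 0.09899494936611666). These are exactly 0.07/sqrt(2) and 0.14/sqrt(2), suggesting some stacks scale the configured rate by 1/sqrt(2). This is an observation about the logged numbers only, not a claim about where scaling.py applies the factor.

```python
# Hedged check of the observation above: the unusual skip_rate constants are
# exactly the round values divided by sqrt(2).
import math

assert math.isclose(0.07 / math.sqrt(2), 0.04949747468305833)
assert math.isclose(0.14 / math.sqrt(2), 0.09899494936611666)
```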
2024-09-19 04:10:17,552 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.65 vs. limit=15.0
2024-09-19 04:10:23,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=588040.0, ans=0.2
2024-09-19 04:10:27,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=588080.0, ans=0.1
2024-09-19 04:10:31,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.max_abs, batch_count=588080.0, ans=10.0
2024-09-19 04:10:42,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=588120.0, ans=0.1
2024-09-19 04:10:48,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=588120.0, ans=0.2
2024-09-19 04:11:03,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=588160.0, ans=0.0
2024-09-19 04:11:05,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=588160.0, ans=0.1
2024-09-19 04:11:14,406 INFO [train.py:1198] (1/2) Epoch 33, batch 2250, loss[loss=0.2383, ctc_loss=0.116, cr_loss=0.3571, attn_decoder_loss=0.244, over 29692.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1178, cr_loss=0.3591, attn_decoder_loss=0.2414, over 5812258.62 frames. ], batch size: 82, lr: 3.35e-03, grad_scale: 8.0
2024-09-19 04:11:23,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=588200.0, ans=0.025
2024-09-19 04:11:29,580 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.468e+01 9.055e+01 9.587e+01 2.332e+02, threshold=1.811e+02, percent-clipped=1.0
2024-09-19 04:12:09,925 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.80 vs. limit=15.0
2024-09-19 04:12:16,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=588320.0, ans=0.95
2024-09-19 04:12:24,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=588360.0, ans=0.125
2024-09-19 04:12:30,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=588360.0, ans=0.1
2024-09-19 04:12:34,490 INFO [train.py:1198] (1/2) Epoch 33, batch 2300, loss[loss=0.2116, ctc_loss=0.102, cr_loss=0.3312, attn_decoder_loss=0.2164, over 29319.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.117, cr_loss=0.3572, attn_decoder_loss=0.2402, over 5799791.46 frames. ], batch size: 71, lr: 3.35e-03, grad_scale: 8.0
2024-09-19 04:12:48,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=588440.0, ans=0.0
2024-09-19 04:12:48,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=588440.0, ans=0.125
2024-09-19 04:12:54,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=588440.0, ans=0.025
2024-09-19 04:13:04,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=588480.0, ans=0.125
2024-09-19 04:13:09,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=588480.0, ans=0.1
2024-09-19 04:13:45,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=588560.0, ans=0.09899494936611666
2024-09-19 04:13:49,846 INFO [train.py:1198] (1/2) Epoch 33, batch 2350, loss[loss=0.2519, ctc_loss=0.1293, cr_loss=0.3951, attn_decoder_loss=0.2567, over 29691.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1171, cr_loss=0.3576, attn_decoder_loss=0.2404, over 5804482.66 frames. ], batch size: 83, lr: 3.35e-03, grad_scale: 8.0
2024-09-19 04:13:51,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=588600.0, ans=0.0
2024-09-19 04:13:55,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=588600.0, ans=0.0
2024-09-19 04:14:01,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=588600.0, ans=0.05
2024-09-19 04:14:04,759 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 8.296e+01 8.859e+01 9.524e+01 1.352e+02, threshold=1.772e+02, percent-clipped=0.0
2024-09-19 04:14:12,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=588640.0, ans=0.0
2024-09-19 04:14:18,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=588680.0, ans=0.2
2024-09-19 04:14:25,399 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.55 vs. limit=15.0
2024-09-19 04:14:32,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=588680.0, ans=0.125
2024-09-19 04:14:38,834 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.40 vs. limit=15.0
2024-09-19 04:14:45,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=588720.0, ans=0.2
2024-09-19 04:15:03,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=588760.0, ans=0.0
2024-09-19 04:15:04,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=588800.0, ans=0.125
2024-09-19 04:15:06,278 INFO [train.py:1198] (1/2) Epoch 33, batch 2400, loss[loss=0.2253, ctc_loss=0.1106, cr_loss=0.3572, attn_decoder_loss=0.2301, over 29520.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1174, cr_loss=0.3585, attn_decoder_loss=0.2411, over 5808886.09 frames. ], batch size: 76, lr: 3.35e-03, grad_scale: 16.0
2024-09-19 04:15:10,213 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.97 vs. limit=22.5
2024-09-19 04:15:15,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=588800.0, ans=0.125
2024-09-19 04:15:16,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=588800.0, ans=0.125
2024-09-19 04:15:17,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=588800.0, ans=0.0
2024-09-19 04:15:46,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=588880.0, ans=0.125
2024-09-19 04:15:50,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=588920.0, ans=0.0
2024-09-19 04:15:58,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=588920.0, ans=0.1
2024-09-19 04:16:10,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=588960.0, ans=0.1
2024-09-19 04:16:23,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=589000.0, ans=0.125
2024-09-19 04:16:24,261 INFO [train.py:1198] (1/2) Epoch 33, batch 2450, loss[loss=0.2454, ctc_loss=0.1254, cr_loss=0.3852, attn_decoder_loss=0.2502, over 29712.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1185, cr_loss=0.3603, attn_decoder_loss=0.242, over 5785274.10 frames. ], batch size: 82, lr: 3.35e-03, grad_scale: 8.0
2024-09-19 04:16:36,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=589000.0, ans=0.07
2024-09-19 04:16:37,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.45 vs.
limit=6.0 2024-09-19 04:16:39,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=589040.0, ans=0.1 2024-09-19 04:16:40,789 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.743e+01 8.663e+01 9.079e+01 9.765e+01 4.096e+02, threshold=1.816e+02, percent-clipped=1.0 2024-09-19 04:16:41,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=589040.0, ans=0.0 2024-09-19 04:16:47,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=589040.0, ans=0.5 2024-09-19 04:17:12,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=589120.0, ans=0.125 2024-09-19 04:17:14,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=589120.0, ans=0.125 2024-09-19 04:17:22,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=589120.0, ans=0.0 2024-09-19 04:17:36,493 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.78 vs. limit=15.0 2024-09-19 04:17:40,051 INFO [train.py:1198] (1/2) Epoch 33, batch 2500, loss[loss=0.2479, ctc_loss=0.1226, cr_loss=0.3573, attn_decoder_loss=0.2539, over 29630.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1188, cr_loss=0.361, attn_decoder_loss=0.2421, over 5795747.23 frames. ], batch size: 86, lr: 3.35e-03, grad_scale: 8.0 2024-09-19 04:18:00,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=589240.0, ans=0.09899494936611666 2024-09-19 04:18:15,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=589280.0, ans=0.125 2024-09-19 04:18:37,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=589320.0, ans=0.2 2024-09-19 04:18:56,403 INFO [train.py:1198] (1/2) Epoch 33, batch 2550, loss[loss=0.2142, ctc_loss=0.09711, cr_loss=0.3315, attn_decoder_loss=0.2199, over 29378.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1184, cr_loss=0.3604, attn_decoder_loss=0.242, over 5798979.64 frames. ], batch size: 67, lr: 3.35e-03, grad_scale: 8.0 2024-09-19 04:19:03,432 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.22 vs. limit=15.0 2024-09-19 04:19:07,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=589400.0, ans=0.1 2024-09-19 04:19:12,964 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.839e+01 8.542e+01 8.992e+01 9.541e+01 1.643e+02, threshold=1.798e+02, percent-clipped=0.0 2024-09-19 04:19:31,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=589480.0, ans=0.125 2024-09-19 04:19:32,091 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.41 vs. 
limit=15.0 2024-09-19 04:19:37,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=589480.0, ans=0.1 2024-09-19 04:19:43,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=589520.0, ans=0.1 2024-09-19 04:19:56,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=589520.0, ans=0.0 2024-09-19 04:20:02,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=589560.0, ans=0.125 2024-09-19 04:20:10,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=589560.0, ans=0.05 2024-09-19 04:20:16,548 INFO [train.py:1198] (1/2) Epoch 33, batch 2600, loss[loss=0.2382, ctc_loss=0.1253, cr_loss=0.385, attn_decoder_loss=0.2422, over 29455.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1191, cr_loss=0.3615, attn_decoder_loss=0.2427, over 5794605.69 frames. ], batch size: 78, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:20:21,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=589600.0, ans=0.0 2024-09-19 04:20:33,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=589640.0, ans=0.07 2024-09-19 04:20:34,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=589640.0, ans=0.125 2024-09-19 04:21:29,870 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.74 vs. limit=15.0 2024-09-19 04:21:30,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=589800.0, ans=0.1 2024-09-19 04:21:31,665 INFO [train.py:1198] (1/2) Epoch 33, batch 2650, loss[loss=0.2571, ctc_loss=0.1383, cr_loss=0.381, attn_decoder_loss=0.2619, over 29248.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1193, cr_loss=0.362, attn_decoder_loss=0.2432, over 5800943.80 frames. ], batch size: 100, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:21:48,459 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.311e+01 8.528e+01 8.946e+01 9.384e+01 1.299e+02, threshold=1.789e+02, percent-clipped=0.0 2024-09-19 04:21:57,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=589840.0, ans=0.09899494936611666 2024-09-19 04:22:27,472 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=12.0 2024-09-19 04:22:47,777 INFO [train.py:1198] (1/2) Epoch 33, batch 2700, loss[loss=0.2419, ctc_loss=0.119, cr_loss=0.3761, attn_decoder_loss=0.2472, over 29523.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1195, cr_loss=0.3621, attn_decoder_loss=0.2432, over 5797099.48 frames. 
], batch size: 87, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:22:56,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=590000.0, ans=0.125 2024-09-19 04:23:04,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=590040.0, ans=0.125 2024-09-19 04:23:09,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=590040.0, ans=0.2 2024-09-19 04:23:44,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=590120.0, ans=0.125 2024-09-19 04:24:06,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=590200.0, ans=0.1 2024-09-19 04:24:07,844 INFO [train.py:1198] (1/2) Epoch 33, batch 2750, loss[loss=0.2342, ctc_loss=0.1162, cr_loss=0.3682, attn_decoder_loss=0.2391, over 29535.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1188, cr_loss=0.3611, attn_decoder_loss=0.2422, over 5795181.66 frames. ], batch size: 75, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:24:11,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=590200.0, ans=0.125 2024-09-19 04:24:12,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=590200.0, ans=0.125 2024-09-19 04:24:13,525 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.79 vs. limit=12.0 2024-09-19 04:24:24,587 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.200e+01 8.542e+01 8.884e+01 9.570e+01 2.810e+02, threshold=1.777e+02, percent-clipped=3.0 2024-09-19 04:24:30,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=590240.0, ans=0.125 2024-09-19 04:24:35,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=590240.0, ans=0.0 2024-09-19 04:25:02,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=590320.0, ans=0.125 2024-09-19 04:25:02,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=590320.0, ans=0.125 2024-09-19 04:25:05,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=590320.0, ans=0.125 2024-09-19 04:25:14,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=590360.0, ans=0.125 2024-09-19 04:25:24,059 INFO [train.py:1198] (1/2) Epoch 33, batch 2800, loss[loss=0.2549, ctc_loss=0.1445, cr_loss=0.3709, attn_decoder_loss=0.2589, over 20063.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1194, cr_loss=0.3619, attn_decoder_loss=0.2425, over 5775276.59 frames. 
], batch size: 209, lr: 3.34e-03, grad_scale: 16.0 2024-09-19 04:25:33,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=590400.0, ans=0.125 2024-09-19 04:25:38,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=590440.0, ans=0.125 2024-09-19 04:25:43,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=590440.0, ans=0.125 2024-09-19 04:25:46,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=590440.0, ans=0.125 2024-09-19 04:25:59,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=590480.0, ans=0.2 2024-09-19 04:26:13,318 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.40 vs. limit=15.0 2024-09-19 04:26:39,477 INFO [train.py:1198] (1/2) Epoch 33, batch 2850, loss[loss=0.2357, ctc_loss=0.1134, cr_loss=0.351, attn_decoder_loss=0.2414, over 29474.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1199, cr_loss=0.363, attn_decoder_loss=0.2429, over 5761367.24 frames. ], batch size: 77, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:26:57,771 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.590e+01 8.701e+01 9.298e+01 9.945e+01 2.152e+02, threshold=1.860e+02, percent-clipped=1.0 2024-09-19 04:27:33,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=590720.0, ans=0.2 2024-09-19 04:27:35,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=590720.0, ans=0.0 2024-09-19 04:27:36,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=590720.0, ans=0.025 2024-09-19 04:27:40,653 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=15.0 2024-09-19 04:27:45,768 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-09-19 04:27:50,162 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.21 vs. limit=15.0 2024-09-19 04:27:57,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=590760.0, ans=0.1 2024-09-19 04:27:59,741 INFO [train.py:1198] (1/2) Epoch 33, batch 2900, loss[loss=0.2358, ctc_loss=0.1158, cr_loss=0.3533, attn_decoder_loss=0.2413, over 29425.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1205, cr_loss=0.3647, attn_decoder_loss=0.244, over 5786526.58 frames. ], batch size: 79, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:28:04,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=590800.0, ans=0.1 2024-09-19 04:28:12,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.67 vs. 
limit=15.0 2024-09-19 04:28:26,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=590840.0, ans=0.125 2024-09-19 04:28:38,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=590880.0, ans=0.2 2024-09-19 04:29:13,015 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.58 vs. limit=12.0 2024-09-19 04:29:15,409 INFO [train.py:1198] (1/2) Epoch 33, batch 2950, loss[loss=0.229, ctc_loss=0.1134, cr_loss=0.3457, attn_decoder_loss=0.2341, over 29480.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1189, cr_loss=0.3614, attn_decoder_loss=0.242, over 5781424.51 frames. ], batch size: 75, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:29:33,865 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.419e+01 8.422e+01 8.881e+01 9.248e+01 1.525e+02, threshold=1.776e+02, percent-clipped=0.0 2024-09-19 04:29:41,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=591040.0, ans=0.0 2024-09-19 04:29:55,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=591080.0, ans=0.05 2024-09-19 04:30:32,266 INFO [train.py:1198] (1/2) Epoch 33, batch 3000, loss[loss=0.2368, ctc_loss=0.1149, cr_loss=0.3485, attn_decoder_loss=0.2426, over 29749.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1186, cr_loss=0.3611, attn_decoder_loss=0.2419, over 5782991.02 frames. ], batch size: 81, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:30:32,267 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 04:30:50,729 INFO [train.py:1230] (1/2) Epoch 33, validation: loss=0.2119, ctc_loss=0.03704, cr_loss=5.931e-15, attn_decoder_loss=0.2313, over 944034.00 frames. 2024-09-19 04:30:50,730 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-19 04:30:51,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=591200.0, ans=0.0 2024-09-19 04:31:03,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=591200.0, ans=0.125 2024-09-19 04:31:12,225 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.72 vs. 
limit=15.0 2024-09-19 04:31:16,442 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:31:26,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=591280.0, ans=0.5 2024-09-19 04:31:39,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=591320.0, ans=0.0 2024-09-19 04:31:56,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=591360.0, ans=0.125 2024-09-19 04:31:56,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=591360.0, ans=0.0 2024-09-19 04:32:03,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=591360.0, ans=0.125 2024-09-19 04:32:11,228 INFO [train.py:1198] (1/2) Epoch 33, batch 3050, loss[loss=0.2243, ctc_loss=0.1076, cr_loss=0.3469, attn_decoder_loss=0.2296, over 29536.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1191, cr_loss=0.3619, attn_decoder_loss=0.2426, over 5777459.36 frames. ], batch size: 76, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:32:14,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=591400.0, ans=0.0 2024-09-19 04:32:29,479 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.419e+01 8.592e+01 9.144e+01 9.827e+01 2.461e+02, threshold=1.829e+02, percent-clipped=1.0 2024-09-19 04:32:41,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=591480.0, ans=0.0 2024-09-19 04:32:51,270 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=15.0 2024-09-19 04:33:07,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=591520.0, ans=0.125 2024-09-19 04:33:13,914 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.07 vs. limit=15.0 2024-09-19 04:33:18,689 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.17 vs. limit=15.0 2024-09-19 04:33:18,773 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.04 vs. limit=10.0 2024-09-19 04:33:21,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=591560.0, ans=0.025 2024-09-19 04:33:26,780 INFO [train.py:1198] (1/2) Epoch 33, batch 3100, loss[loss=0.2418, ctc_loss=0.1221, cr_loss=0.3514, attn_decoder_loss=0.2472, over 29279.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1189, cr_loss=0.361, attn_decoder_loss=0.2423, over 5777552.90 frames. 
], batch size: 100, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:33:40,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=591640.0, ans=0.0 2024-09-19 04:33:41,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=591640.0, ans=0.1 2024-09-19 04:34:10,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=22.5 2024-09-19 04:34:18,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=591720.0, ans=0.0 2024-09-19 04:34:42,673 INFO [train.py:1198] (1/2) Epoch 33, batch 3150, loss[loss=0.2515, ctc_loss=0.1268, cr_loss=0.3772, attn_decoder_loss=0.2569, over 28865.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1191, cr_loss=0.3617, attn_decoder_loss=0.2424, over 5784472.13 frames. ], batch size: 104, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:34:50,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=591800.0, ans=0.2 2024-09-19 04:35:03,070 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.559e+01 9.035e+01 9.509e+01 1.493e+02, threshold=1.807e+02, percent-clipped=0.0 2024-09-19 04:35:34,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=591920.0, ans=0.2 2024-09-19 04:35:34,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=591920.0, ans=0.2 2024-09-19 04:35:43,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=591920.0, ans=0.025 2024-09-19 04:35:59,175 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.69 vs. limit=15.0 2024-09-19 04:36:10,888 INFO [train.py:1198] (1/2) Epoch 33, batch 3200, loss[loss=0.2389, ctc_loss=0.1174, cr_loss=0.3552, attn_decoder_loss=0.2445, over 29405.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1182, cr_loss=0.3599, attn_decoder_loss=0.2416, over 5794521.48 frames. ], batch size: 79, lr: 3.34e-03, grad_scale: 16.0 2024-09-19 04:36:11,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=592000.0, ans=0.0 2024-09-19 04:36:23,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=592000.0, ans=0.125 2024-09-19 04:37:04,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=592120.0, ans=0.0 2024-09-19 04:37:06,844 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.38 vs. limit=15.0 2024-09-19 04:37:08,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.72 vs. 
limit=22.5 2024-09-19 04:37:10,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=592160.0, ans=0.125 2024-09-19 04:37:15,898 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=12.0 2024-09-19 04:37:18,228 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:37:26,888 INFO [train.py:1198] (1/2) Epoch 33, batch 3250, loss[loss=0.2416, ctc_loss=0.1123, cr_loss=0.3493, attn_decoder_loss=0.2482, over 29710.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1185, cr_loss=0.3602, attn_decoder_loss=0.242, over 5800956.65 frames. ], batch size: 84, lr: 3.34e-03, grad_scale: 16.0 2024-09-19 04:37:31,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=592200.0, ans=0.1 2024-09-19 04:37:40,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=592240.0, ans=0.125 2024-09-19 04:37:44,970 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 8.640e+01 9.097e+01 9.766e+01 4.487e+02, threshold=1.819e+02, percent-clipped=1.0 2024-09-19 04:37:57,983 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.48 vs. limit=6.0 2024-09-19 04:38:09,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=592280.0, ans=0.025 2024-09-19 04:38:30,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=592360.0, ans=0.2 2024-09-19 04:38:38,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=592360.0, ans=0.0 2024-09-19 04:38:39,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=592360.0, ans=0.09899494936611666 2024-09-19 04:38:42,510 INFO [train.py:1198] (1/2) Epoch 33, batch 3300, loss[loss=0.2498, ctc_loss=0.1247, cr_loss=0.364, attn_decoder_loss=0.2556, over 28156.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1179, cr_loss=0.3587, attn_decoder_loss=0.241, over 5797368.87 frames. ], batch size: 111, lr: 3.34e-03, grad_scale: 16.0 2024-09-19 04:39:04,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=592440.0, ans=0.1 2024-09-19 04:39:17,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=592480.0, ans=0.1 2024-09-19 04:39:17,124 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:39:29,180 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=15.0 2024-09-19 04:39:31,812 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.55 vs. limit=10.0 2024-09-19 04:40:02,678 INFO [train.py:1198] (1/2) Epoch 33, batch 3350, loss[loss=0.2474, ctc_loss=0.119, cr_loss=0.3675, attn_decoder_loss=0.2535, over 28916.00 frames. 
], tot_loss[loss=0.2368, ctc_loss=0.1188, cr_loss=0.3603, attn_decoder_loss=0.2419, over 5774490.39 frames. ], batch size: 104, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:40:10,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=592600.0, ans=0.0 2024-09-19 04:40:22,488 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.734e+01 8.878e+01 9.274e+01 9.993e+01 2.283e+02, threshold=1.855e+02, percent-clipped=2.0 2024-09-19 04:40:25,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=592640.0, ans=0.125 2024-09-19 04:40:43,420 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.55 vs. limit=22.5 2024-09-19 04:40:47,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=592720.0, ans=0.125 2024-09-19 04:40:50,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=592720.0, ans=0.125 2024-09-19 04:40:55,191 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.84 vs. limit=15.0 2024-09-19 04:41:08,518 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:41:11,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=592760.0, ans=0.125 2024-09-19 04:41:17,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=592800.0, ans=0.1 2024-09-19 04:41:19,056 INFO [train.py:1198] (1/2) Epoch 33, batch 3400, loss[loss=0.2064, ctc_loss=0.1017, cr_loss=0.308, attn_decoder_loss=0.2111, over 29395.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1193, cr_loss=0.3613, attn_decoder_loss=0.242, over 5766394.65 frames. ], batch size: 67, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:41:23,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=592800.0, ans=0.0 2024-09-19 04:41:36,821 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.98 vs. limit=22.5 2024-09-19 04:41:44,451 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.63 vs. limit=15.0 2024-09-19 04:41:51,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=592880.0, ans=0.125 2024-09-19 04:41:54,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=592880.0, ans=0.0 2024-09-19 04:42:04,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=592920.0, ans=0.125 2024-09-19 04:42:11,651 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.70 vs. 
limit=15.0 2024-09-19 04:42:14,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=592920.0, ans=0.125 2024-09-19 04:42:34,747 INFO [train.py:1198] (1/2) Epoch 33, batch 3450, loss[loss=0.2426, ctc_loss=0.1153, cr_loss=0.3383, attn_decoder_loss=0.2492, over 28373.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1192, cr_loss=0.3617, attn_decoder_loss=0.2422, over 5775433.81 frames. ], batch size: 111, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:42:35,412 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-09-19 04:42:36,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=593000.0, ans=0.1 2024-09-19 04:42:41,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=593000.0, ans=0.125 2024-09-19 04:42:49,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=593000.0, ans=0.125 2024-09-19 04:42:52,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=593040.0, ans=0.0 2024-09-19 04:42:56,753 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.686e+01 9.141e+01 9.790e+01 2.387e+02, threshold=1.828e+02, percent-clipped=2.0 2024-09-19 04:43:06,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=593080.0, ans=0.025 2024-09-19 04:43:11,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=593080.0, ans=0.1 2024-09-19 04:43:32,065 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2024-09-19 04:43:55,197 INFO [train.py:1198] (1/2) Epoch 33, batch 3500, loss[loss=0.2134, ctc_loss=0.1043, cr_loss=0.3291, attn_decoder_loss=0.2182, over 29317.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1189, cr_loss=0.361, attn_decoder_loss=0.2417, over 5776904.07 frames. 
], batch size: 71, lr: 3.33e-03, grad_scale: 8.0 2024-09-19 04:44:07,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_abs, batch_count=593200.0, ans=0.5 2024-09-19 04:44:10,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=593240.0, ans=0.125 2024-09-19 04:44:16,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=593240.0, ans=0.07 2024-09-19 04:44:35,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=593280.0, ans=0.125 2024-09-19 04:44:38,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=593320.0, ans=0.0 2024-09-19 04:44:40,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=593320.0, ans=0.125 2024-09-19 04:44:44,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=593320.0, ans=0.025 2024-09-19 04:45:09,797 INFO [train.py:1198] (1/2) Epoch 33, batch 3550, loss[loss=0.2409, ctc_loss=0.1152, cr_loss=0.3337, attn_decoder_loss=0.2475, over 29704.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1187, cr_loss=0.3604, attn_decoder_loss=0.2418, over 5784389.38 frames. ], batch size: 89, lr: 3.33e-03, grad_scale: 8.0 2024-09-19 04:45:12,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=593400.0, ans=0.125 2024-09-19 04:45:28,727 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.090e+01 8.555e+01 9.089e+01 9.583e+01 3.040e+02, threshold=1.818e+02, percent-clipped=2.0 2024-09-19 04:45:29,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=593440.0, ans=0.95 2024-09-19 04:45:31,046 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.36 vs. limit=15.0 2024-09-19 04:45:57,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=593520.0, ans=0.125 2024-09-19 04:46:04,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=593520.0, ans=0.07 2024-09-19 04:46:24,255 INFO [train.py:1198] (1/2) Epoch 33, batch 3600, loss[loss=0.228, ctc_loss=0.1104, cr_loss=0.3533, attn_decoder_loss=0.2332, over 29487.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1182, cr_loss=0.3599, attn_decoder_loss=0.2418, over 5793735.34 frames. 
], batch size: 77, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 04:46:35,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=593600.0, ans=0.125 2024-09-19 04:47:09,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=593720.0, ans=0.125 2024-09-19 04:47:18,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=593720.0, ans=0.0 2024-09-19 04:47:27,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=593760.0, ans=0.07 2024-09-19 04:47:29,588 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=15.0 2024-09-19 04:47:36,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=593760.0, ans=0.125 2024-09-19 04:47:38,881 INFO [train.py:1198] (1/2) Epoch 33, batch 3650, loss[loss=0.2503, ctc_loss=0.1279, cr_loss=0.3627, attn_decoder_loss=0.2558, over 29543.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1176, cr_loss=0.3587, attn_decoder_loss=0.2412, over 5796047.45 frames. ], batch size: 90, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 04:47:46,867 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0 2024-09-19 04:47:58,206 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.495e+01 8.325e+01 8.858e+01 9.502e+01 1.563e+02, threshold=1.772e+02, percent-clipped=0.0 2024-09-19 04:48:33,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=593920.0, ans=0.125 2024-09-19 04:48:33,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=593920.0, ans=0.1 2024-09-19 04:48:35,486 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.43 vs. limit=15.0 2024-09-19 04:48:36,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=593920.0, ans=0.125 2024-09-19 04:48:52,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=593960.0, ans=0.2 2024-09-19 04:48:55,548 INFO [train.py:1198] (1/2) Epoch 33, batch 3700, loss[loss=0.2386, ctc_loss=0.1145, cr_loss=0.3498, attn_decoder_loss=0.2447, over 29701.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1174, cr_loss=0.3587, attn_decoder_loss=0.2411, over 5806163.56 frames. 
], batch size: 84, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 04:49:01,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=594000.0, ans=0.0 2024-09-19 04:49:10,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=594040.0, ans=0.0 2024-09-19 04:49:27,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=594080.0, ans=0.2 2024-09-19 04:49:37,012 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.75 vs. limit=15.0 2024-09-19 04:49:40,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=594120.0, ans=0.125 2024-09-19 04:49:41,469 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. limit=6.0 2024-09-19 04:49:51,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=594120.0, ans=0.0 2024-09-19 04:50:11,530 INFO [train.py:1198] (1/2) Epoch 33, batch 3750, loss[loss=0.2132, ctc_loss=0.1062, cr_loss=0.3311, attn_decoder_loss=0.2178, over 29335.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1177, cr_loss=0.359, attn_decoder_loss=0.241, over 5808759.11 frames. ], batch size: 67, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 04:50:30,937 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.561e+01 9.006e+01 9.475e+01 6.465e+02, threshold=1.801e+02, percent-clipped=2.0 2024-09-19 04:50:44,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=594280.0, ans=0.1 2024-09-19 04:50:59,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=594320.0, ans=0.125 2024-09-19 04:51:20,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=594360.0, ans=0.1 2024-09-19 04:51:26,168 INFO [train.py:1198] (1/2) Epoch 33, batch 3800, loss[loss=0.2486, ctc_loss=0.1225, cr_loss=0.3763, attn_decoder_loss=0.2542, over 29629.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1178, cr_loss=0.359, attn_decoder_loss=0.241, over 5798906.93 frames. ], batch size: 86, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 04:51:39,040 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.40 vs. limit=22.5 2024-09-19 04:52:06,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=594480.0, ans=0.0 2024-09-19 04:52:08,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=594480.0, ans=0.125 2024-09-19 04:52:16,019 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.89 vs. limit=15.0 2024-09-19 04:52:34,034 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.06 vs. 
limit=22.5 2024-09-19 04:52:40,416 INFO [train.py:1198] (1/2) Epoch 33, batch 3850, loss[loss=0.2404, ctc_loss=0.1132, cr_loss=0.352, attn_decoder_loss=0.2467, over 29263.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1176, cr_loss=0.3584, attn_decoder_loss=0.241, over 5812991.38 frames. ], batch size: 100, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 04:52:59,670 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.228e+01 8.527e+01 9.047e+01 9.575e+01 1.638e+02, threshold=1.809e+02, percent-clipped=0.0 2024-09-19 04:53:08,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=594680.0, ans=0.1 2024-09-19 04:53:13,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=594680.0, ans=0.025 2024-09-19 04:53:38,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=594760.0, ans=0.125 2024-09-19 04:53:41,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=594760.0, ans=0.0 2024-09-19 04:53:46,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=594760.0, ans=0.2 2024-09-19 04:53:56,277 INFO [train.py:1198] (1/2) Epoch 33, batch 3900, loss[loss=0.2458, ctc_loss=0.116, cr_loss=0.3316, attn_decoder_loss=0.2528, over 29630.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1182, cr_loss=0.3595, attn_decoder_loss=0.2415, over 5816673.90 frames. ], batch size: 86, lr: 3.33e-03, grad_scale: 8.0 2024-09-19 04:53:58,097 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:54:12,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=594840.0, ans=0.05 2024-09-19 04:54:24,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=594880.0, ans=0.125 2024-09-19 04:54:30,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=594880.0, ans=0.0 2024-09-19 04:54:39,820 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.85 vs. limit=12.0 2024-09-19 04:54:40,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=594920.0, ans=0.0 2024-09-19 04:54:53,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=594960.0, ans=0.025 2024-09-19 04:55:04,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=594960.0, ans=0.125 2024-09-19 04:55:11,404 INFO [train.py:1198] (1/2) Epoch 33, batch 3950, loss[loss=0.2504, ctc_loss=0.1255, cr_loss=0.3787, attn_decoder_loss=0.2558, over 29399.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1177, cr_loss=0.3591, attn_decoder_loss=0.2415, over 5836346.21 frames. 
], batch size: 97, lr: 3.33e-03, grad_scale: 8.0 2024-09-19 04:55:14,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=595000.0, ans=0.025 2024-09-19 04:55:19,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=595000.0, ans=0.0 2024-09-19 04:55:22,565 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.22 vs. limit=6.0 2024-09-19 04:55:32,033 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.607e+01 9.033e+01 9.637e+01 1.585e+02, threshold=1.807e+02, percent-clipped=0.0 2024-09-19 04:55:53,711 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.89 vs. limit=15.0 2024-09-19 04:56:03,925 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.80 vs. limit=22.5 2024-09-19 04:56:12,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=595160.0, ans=0.125 2024-09-19 04:56:15,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=595160.0, ans=0.0 2024-09-19 04:56:15,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=595160.0, ans=0.125 2024-09-19 04:56:25,734 INFO [train.py:1198] (1/2) Epoch 33, batch 4000, loss[loss=0.2174, ctc_loss=0.09819, cr_loss=0.3036, attn_decoder_loss=0.2239, over 29480.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1181, cr_loss=0.3596, attn_decoder_loss=0.2416, over 5813041.27 frames. ], batch size: 74, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 04:56:36,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=595200.0, ans=0.0 2024-09-19 04:56:44,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=595240.0, ans=0.125 2024-09-19 04:56:46,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=595240.0, ans=0.1 2024-09-19 04:56:47,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=595240.0, ans=0.035 2024-09-19 04:57:14,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=595320.0, ans=0.125 2024-09-19 04:57:39,812 INFO [train.py:1198] (1/2) Epoch 33, batch 4050, loss[loss=0.2538, ctc_loss=0.1439, cr_loss=0.3885, attn_decoder_loss=0.2573, over 20212.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1178, cr_loss=0.3589, attn_decoder_loss=0.2413, over 5795729.76 frames. ], batch size: 210, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 04:57:44,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=595400.0, ans=0.0 2024-09-19 04:57:52,609 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.53 vs. 
limit=15.0 2024-09-19 04:57:56,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=595440.0, ans=0.2 2024-09-19 04:58:00,217 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.406e+01 8.573e+01 9.185e+01 9.893e+01 2.518e+02, threshold=1.837e+02, percent-clipped=1.0 2024-09-19 04:58:09,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=595480.0, ans=0.125 2024-09-19 04:58:10,094 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.93 vs. limit=22.5 2024-09-19 04:58:14,464 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.99 vs. limit=15.0 2024-09-19 04:58:19,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=595480.0, ans=0.125 2024-09-19 04:58:34,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=595520.0, ans=0.0 2024-09-19 04:58:34,832 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.40 vs. limit=22.5 2024-09-19 04:58:51,499 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2024-09-19 04:58:55,015 INFO [train.py:1198] (1/2) Epoch 33, batch 4100, loss[loss=0.2609, ctc_loss=0.1384, cr_loss=0.4011, attn_decoder_loss=0.2656, over 29513.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1181, cr_loss=0.359, attn_decoder_loss=0.2416, over 5791149.60 frames. ], batch size: 90, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 04:58:59,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=595600.0, ans=0.125 2024-09-19 04:59:06,207 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.62 vs. limit=12.0 2024-09-19 04:59:14,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=595640.0, ans=0.0 2024-09-19 04:59:17,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=595640.0, ans=0.125 2024-09-19 04:59:21,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=595640.0, ans=0.0 2024-09-19 04:59:44,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=595720.0, ans=0.125 2024-09-19 04:59:55,536 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:59:59,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=595760.0, ans=0.125 2024-09-19 05:00:09,892 INFO [train.py:1198] (1/2) Epoch 33, batch 4150, loss[loss=0.2344, ctc_loss=0.1162, cr_loss=0.3635, attn_decoder_loss=0.2394, over 29482.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1181, cr_loss=0.3597, attn_decoder_loss=0.2415, over 5796258.18 frames. 
], batch size: 77, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 05:00:15,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=595800.0, ans=0.09899494936611666 2024-09-19 05:00:25,340 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.00 vs. limit=22.5 2024-09-19 05:00:31,921 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.198e+01 8.400e+01 8.837e+01 9.482e+01 1.626e+02, threshold=1.767e+02, percent-clipped=0.0 2024-09-19 05:00:35,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=595840.0, ans=0.09899494936611666 2024-09-19 05:00:51,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=595880.0, ans=0.125 2024-09-19 05:01:23,826 INFO [train.py:1198] (1/2) Epoch 33, batch 4200, loss[loss=0.2513, ctc_loss=0.1308, cr_loss=0.3661, attn_decoder_loss=0.2565, over 29501.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.118, cr_loss=0.3589, attn_decoder_loss=0.2418, over 5797695.93 frames. ], batch size: 90, lr: 3.33e-03, grad_scale: 8.0 2024-09-19 05:01:30,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=596000.0, ans=0.0 2024-09-19 05:01:30,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=596000.0, ans=0.1 2024-09-19 05:01:31,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=596000.0, ans=0.125 2024-09-19 05:01:34,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=596000.0, ans=0.125 2024-09-19 05:01:49,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=596040.0, ans=0.125 2024-09-19 05:02:01,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=596080.0, ans=0.0 2024-09-19 05:02:06,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=596120.0, ans=0.125 2024-09-19 05:02:26,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=596160.0, ans=0.125 2024-09-19 05:02:38,335 INFO [train.py:1198] (1/2) Epoch 33, batch 4250, loss[loss=0.2129, ctc_loss=0.09583, cr_loss=0.3014, attn_decoder_loss=0.2192, over 29523.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1183, cr_loss=0.3592, attn_decoder_loss=0.242, over 5803103.05 frames. ], batch size: 74, lr: 3.33e-03, grad_scale: 8.0 2024-09-19 05:02:59,962 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.204e+01 8.601e+01 9.024e+01 9.699e+01 1.912e+02, threshold=1.805e+02, percent-clipped=1.0 2024-09-19 05:03:52,547 INFO [train.py:1198] (1/2) Epoch 33, batch 4300, loss[loss=0.2478, ctc_loss=0.1269, cr_loss=0.3935, attn_decoder_loss=0.2525, over 29548.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1182, cr_loss=0.359, attn_decoder_loss=0.2422, over 5792414.10 frames. 
], batch size: 87, lr: 3.33e-03, grad_scale: 8.0 2024-09-19 05:04:09,205 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 05:04:10,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=596440.0, ans=0.07 2024-09-19 05:04:28,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=596480.0, ans=0.025 2024-09-19 05:04:32,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.21 vs. limit=15.0 2024-09-19 05:04:34,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=596480.0, ans=0.07 2024-09-19 05:04:46,890 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.37 vs. limit=15.0 2024-09-19 05:04:55,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=596560.0, ans=0.1 2024-09-19 05:05:02,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=596560.0, ans=0.1 2024-09-19 05:05:07,035 INFO [train.py:1198] (1/2) Epoch 33, batch 4350, loss[loss=0.2515, ctc_loss=0.1279, cr_loss=0.3784, attn_decoder_loss=0.2569, over 29478.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1206, cr_loss=0.3641, attn_decoder_loss=0.2454, over 5795534.67 frames. ], batch size: 97, lr: 3.33e-03, grad_scale: 8.0 2024-09-19 05:05:07,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=596600.0, ans=0.0 2024-09-19 05:05:25,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=596640.0, ans=0.05 2024-09-19 05:05:29,990 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.368e+01 8.801e+01 9.131e+01 9.765e+01 2.028e+02, threshold=1.826e+02, percent-clipped=1.0 2024-09-19 05:05:34,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=596640.0, ans=0.1 2024-09-19 05:06:22,392 INFO [train.py:1198] (1/2) Epoch 33, batch 4400, loss[loss=0.2598, ctc_loss=0.1455, cr_loss=0.418, attn_decoder_loss=0.2632, over 27331.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1222, cr_loss=0.3673, attn_decoder_loss=0.2476, over 5764176.27 frames. ], batch size: 124, lr: 3.32e-03, grad_scale: 16.0 2024-09-19 05:06:48,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=596840.0, ans=0.125 2024-09-19 05:07:00,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=596880.0, ans=0.0 2024-09-19 05:07:18,664 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=15.93 vs. 
limit=15.0 2024-09-19 05:07:35,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=597000.0, ans=0.125 2024-09-19 05:07:36,275 INFO [train.py:1198] (1/2) Epoch 33, batch 4450, loss[loss=0.2595, ctc_loss=0.1499, cr_loss=0.398, attn_decoder_loss=0.2628, over 20191.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.126, cr_loss=0.3732, attn_decoder_loss=0.2498, over 5574010.63 frames. ], batch size: 210, lr: 3.32e-03, grad_scale: 8.0 2024-09-19 05:07:47,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=597000.0, ans=0.1 2024-09-19 05:08:00,436 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.114e+01 9.208e+01 9.597e+01 1.124e+02 1.638e+02, threshold=1.919e+02, percent-clipped=0.0 2024-09-19 05:08:06,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2024-09-19 05:08:10,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=597080.0, ans=0.025 2024-09-19 05:08:52,055 INFO [train.py:1198] (1/2) Epoch 33, batch 4500, loss[loss=0.2562, ctc_loss=0.1461, cr_loss=0.3927, attn_decoder_loss=0.2597, over 20685.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.1293, cr_loss=0.3756, attn_decoder_loss=0.2516, over 5232767.99 frames. ], batch size: 209, lr: 3.32e-03, grad_scale: 8.0 2024-09-19 05:09:02,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=597200.0, ans=0.0 2024-09-19 05:09:08,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=597240.0, ans=0.1 2024-09-19 05:09:13,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=597240.0, ans=0.1 2024-09-19 05:09:25,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=597280.0, ans=0.1 2024-09-19 05:10:21,347 INFO [train.py:1198] (1/2) Epoch 34, batch 0, loss[loss=0.22, ctc_loss=0.1063, cr_loss=0.3171, attn_decoder_loss=0.2256, over 29602.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1063, cr_loss=0.3171, attn_decoder_loss=0.2256, over 29602.00 frames. ], batch size: 73, lr: 3.27e-03, grad_scale: 16.0 2024-09-19 05:10:21,348 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 05:10:26,105 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.6615, 4.5626, 4.3889, 4.1253], device='cuda:1') 2024-09-19 05:10:39,720 INFO [train.py:1230] (1/2) Epoch 34, validation: loss=0.2115, ctc_loss=0.03706, cr_loss=5.889e-15, attn_decoder_loss=0.2309, over 944034.00 frames. 2024-09-19 05:10:39,721 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-19 05:10:41,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=597300.0, ans=0.125 2024-09-19 05:10:52,601 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.39 vs. 
limit=15.0 2024-09-19 05:11:10,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=597380.0, ans=0.1 2024-09-19 05:11:16,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=597380.0, ans=0.125 2024-09-19 05:11:28,634 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 05:11:28,975 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.67 vs. limit=22.5 2024-09-19 05:11:44,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=597460.0, ans=0.125 2024-09-19 05:11:44,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=597460.0, ans=0.125 2024-09-19 05:11:45,332 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.942e+01 9.532e+01 1.086e+02 1.158e+02 1.194e+03, threshold=2.172e+02, percent-clipped=2.0 2024-09-19 05:11:50,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=597460.0, ans=0.125 2024-09-19 05:11:56,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=597500.0, ans=0.0 2024-09-19 05:11:56,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=597500.0, ans=0.0 2024-09-19 05:11:57,343 INFO [train.py:1198] (1/2) Epoch 34, batch 50, loss[loss=0.2148, ctc_loss=0.09824, cr_loss=0.3137, attn_decoder_loss=0.2208, over 29423.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1203, cr_loss=0.3657, attn_decoder_loss=0.242, over 1268305.87 frames. ], batch size: 70, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:12:02,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=597500.0, ans=0.2 2024-09-19 05:12:05,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=597500.0, ans=0.2 2024-09-19 05:12:11,725 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.21 vs. limit=15.0 2024-09-19 05:12:23,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=597540.0, ans=0.2 2024-09-19 05:12:25,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=597540.0, ans=0.0 2024-09-19 05:12:39,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=597580.0, ans=0.125 2024-09-19 05:12:47,665 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.14 vs. 
limit=22.5 2024-09-19 05:13:04,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=597660.0, ans=0.09899494936611666 2024-09-19 05:13:04,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=597660.0, ans=0.125 2024-09-19 05:13:10,876 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.88 vs. limit=15.0 2024-09-19 05:13:16,054 INFO [train.py:1198] (1/2) Epoch 34, batch 100, loss[loss=0.2311, ctc_loss=0.1241, cr_loss=0.3598, attn_decoder_loss=0.235, over 29529.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1217, cr_loss=0.3681, attn_decoder_loss=0.2443, over 2251362.13 frames. ], batch size: 76, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:14:16,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=597860.0, ans=0.125 2024-09-19 05:14:18,838 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.662e+01 8.574e+01 9.028e+01 9.395e+01 1.381e+02, threshold=1.806e+02, percent-clipped=0.0 2024-09-19 05:14:20,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=597860.0, ans=0.07 2024-09-19 05:14:29,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=597900.0, ans=0.125 2024-09-19 05:14:30,761 INFO [train.py:1198] (1/2) Epoch 34, batch 150, loss[loss=0.2179, ctc_loss=0.1057, cr_loss=0.3351, attn_decoder_loss=0.2229, over 29440.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1189, cr_loss=0.3613, attn_decoder_loss=0.2421, over 3046804.74 frames. ], batch size: 70, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:14:32,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=597900.0, ans=0.125 2024-09-19 05:14:33,320 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=15.0 2024-09-19 05:14:53,148 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.02 vs. limit=10.0 2024-09-19 05:14:58,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=597940.0, ans=0.09899494936611666 2024-09-19 05:15:03,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.94 vs. limit=15.0 2024-09-19 05:15:04,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=597980.0, ans=0.2 2024-09-19 05:15:20,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=598020.0, ans=0.125 2024-09-19 05:15:48,461 INFO [train.py:1198] (1/2) Epoch 34, batch 200, loss[loss=0.259, ctc_loss=0.1473, cr_loss=0.3996, attn_decoder_loss=0.2625, over 27286.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1186, cr_loss=0.361, attn_decoder_loss=0.2416, over 3658258.91 frames. 
], batch size: 124, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:16:03,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=598140.0, ans=0.125 2024-09-19 05:16:12,116 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.68 vs. limit=6.0 2024-09-19 05:16:15,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=598140.0, ans=0.0 2024-09-19 05:16:21,047 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.45 vs. limit=10.0 2024-09-19 05:16:26,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=598180.0, ans=0.09899494936611666 2024-09-19 05:16:32,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=598220.0, ans=0.0 2024-09-19 05:16:54,091 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.406e+01 8.433e+01 8.957e+01 9.594e+01 1.517e+02, threshold=1.791e+02, percent-clipped=0.0 2024-09-19 05:17:02,730 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.01 vs. limit=15.0 2024-09-19 05:17:06,375 INFO [train.py:1198] (1/2) Epoch 34, batch 250, loss[loss=0.2593, ctc_loss=0.1338, cr_loss=0.3869, attn_decoder_loss=0.2646, over 29156.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1182, cr_loss=0.3599, attn_decoder_loss=0.2417, over 4141007.04 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:17:27,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=598340.0, ans=0.125 2024-09-19 05:17:29,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=598340.0, ans=6.0 2024-09-19 05:17:47,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=598380.0, ans=0.0 2024-09-19 05:17:58,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=598420.0, ans=0.125 2024-09-19 05:18:10,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=598460.0, ans=0.0 2024-09-19 05:18:12,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=598460.0, ans=0.1 2024-09-19 05:18:22,533 INFO [train.py:1198] (1/2) Epoch 34, batch 300, loss[loss=0.2514, ctc_loss=0.132, cr_loss=0.3909, attn_decoder_loss=0.256, over 29544.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1176, cr_loss=0.3595, attn_decoder_loss=0.2413, over 4509506.54 frames. 
], batch size: 92, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:18:40,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=598540.0, ans=0.0 2024-09-19 05:18:51,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=598580.0, ans=0.1 2024-09-19 05:19:02,441 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.25 vs. limit=15.0 2024-09-19 05:19:08,364 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0 2024-09-19 05:19:16,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.16 vs. limit=6.0 2024-09-19 05:19:20,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=598620.0, ans=0.125 2024-09-19 05:19:26,027 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.105e+01 8.376e+01 8.844e+01 9.262e+01 3.831e+02, threshold=1.769e+02, percent-clipped=1.0 2024-09-19 05:19:40,608 INFO [train.py:1198] (1/2) Epoch 34, batch 350, loss[loss=0.2177, ctc_loss=0.1007, cr_loss=0.3216, attn_decoder_loss=0.2235, over 29302.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1181, cr_loss=0.3596, attn_decoder_loss=0.242, over 4792984.62 frames. ], batch size: 71, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:19:48,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=598700.0, ans=0.125 2024-09-19 05:19:50,373 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.43 vs. limit=15.0 2024-09-19 05:20:07,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=598740.0, ans=0.125 2024-09-19 05:20:13,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=598780.0, ans=0.0 2024-09-19 05:20:26,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=598820.0, ans=0.0 2024-09-19 05:20:27,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=598820.0, ans=0.1 2024-09-19 05:20:35,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=598820.0, ans=0.2 2024-09-19 05:20:37,206 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.26 vs. limit=15.0 2024-09-19 05:20:38,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=598820.0, ans=0.2 2024-09-19 05:20:58,169 INFO [train.py:1198] (1/2) Epoch 34, batch 400, loss[loss=0.2445, ctc_loss=0.1288, cr_loss=0.376, attn_decoder_loss=0.2491, over 29674.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1177, cr_loss=0.359, attn_decoder_loss=0.2415, over 5022933.67 frames. 
], batch size: 82, lr: 3.27e-03, grad_scale: 16.0 2024-09-19 05:21:01,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=598900.0, ans=0.025 2024-09-19 05:21:09,858 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.71 vs. limit=15.0 2024-09-19 05:21:22,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=598940.0, ans=0.04949747468305833 2024-09-19 05:21:35,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=598980.0, ans=0.1 2024-09-19 05:21:36,791 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 05:21:38,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=598980.0, ans=0.0 2024-09-19 05:21:52,345 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2024-09-19 05:21:58,519 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.43 vs. limit=15.0 2024-09-19 05:22:02,034 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.466e+01 8.485e+01 9.014e+01 9.585e+01 2.227e+02, threshold=1.803e+02, percent-clipped=1.0 2024-09-19 05:22:14,034 INFO [train.py:1198] (1/2) Epoch 34, batch 450, loss[loss=0.2454, ctc_loss=0.1242, cr_loss=0.3717, attn_decoder_loss=0.2507, over 29695.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1183, cr_loss=0.36, attn_decoder_loss=0.2418, over 5184977.31 frames. ], batch size: 83, lr: 3.27e-03, grad_scale: 16.0 2024-09-19 05:22:35,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=599140.0, ans=0.05 2024-09-19 05:22:37,529 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-09-19 05:22:41,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=599140.0, ans=0.125 2024-09-19 05:22:43,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=599180.0, ans=0.1 2024-09-19 05:23:30,233 INFO [train.py:1198] (1/2) Epoch 34, batch 500, loss[loss=0.2526, ctc_loss=0.13, cr_loss=0.3879, attn_decoder_loss=0.2577, over 29458.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1178, cr_loss=0.3589, attn_decoder_loss=0.2411, over 5327549.95 frames. 
], batch size: 94, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:23:32,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=599300.0, ans=0.2 2024-09-19 05:23:46,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=599340.0, ans=0.5 2024-09-19 05:23:57,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=599340.0, ans=0.1 2024-09-19 05:23:58,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=599340.0, ans=0.025 2024-09-19 05:24:01,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=599380.0, ans=0.125 2024-09-19 05:24:09,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=599380.0, ans=0.125 2024-09-19 05:24:21,476 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 05:24:25,062 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.17 vs. limit=12.0 2024-09-19 05:24:25,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=599420.0, ans=0.125 2024-09-19 05:24:33,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=599460.0, ans=0.09899494936611666 2024-09-19 05:24:37,604 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.516e+01 9.011e+01 9.672e+01 1.492e+02, threshold=1.802e+02, percent-clipped=0.0 2024-09-19 05:24:50,588 INFO [train.py:1198] (1/2) Epoch 34, batch 550, loss[loss=0.2533, ctc_loss=0.1287, cr_loss=0.3732, attn_decoder_loss=0.2588, over 28877.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1179, cr_loss=0.3591, attn_decoder_loss=0.2412, over 5420914.14 frames. ], batch size: 104, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:24:51,628 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.00 vs. limit=15.0 2024-09-19 05:25:04,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=599540.0, ans=0.0 2024-09-19 05:25:34,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=599620.0, ans=0.125 2024-09-19 05:25:52,108 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.67 vs. limit=15.0 2024-09-19 05:25:54,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=599660.0, ans=0.0 2024-09-19 05:26:05,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=599700.0, ans=0.0 2024-09-19 05:26:06,403 INFO [train.py:1198] (1/2) Epoch 34, batch 600, loss[loss=0.2519, ctc_loss=0.1243, cr_loss=0.3604, attn_decoder_loss=0.2581, over 29209.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.118, cr_loss=0.3598, attn_decoder_loss=0.2415, over 5507686.39 frames. 
], batch size: 100, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:26:06,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=599700.0, ans=0.0 2024-09-19 05:26:09,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=599700.0, ans=0.125 2024-09-19 05:26:16,396 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.36 vs. limit=15.0 2024-09-19 05:26:31,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=599740.0, ans=10.0 2024-09-19 05:26:32,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=599740.0, ans=0.05 2024-09-19 05:26:38,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=599780.0, ans=0.0 2024-09-19 05:26:49,634 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.28 vs. limit=22.5 2024-09-19 05:27:11,322 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.757e+01 8.437e+01 8.830e+01 9.420e+01 2.114e+02, threshold=1.766e+02, percent-clipped=1.0 2024-09-19 05:27:14,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=599860.0, ans=0.125 2024-09-19 05:27:16,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=599860.0, ans=0.125 2024-09-19 05:27:21,949 INFO [train.py:1198] (1/2) Epoch 34, batch 650, loss[loss=0.2424, ctc_loss=0.1193, cr_loss=0.3432, attn_decoder_loss=0.2485, over 29766.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1172, cr_loss=0.3583, attn_decoder_loss=0.2409, over 5585656.74 frames. ], batch size: 81, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:27:23,045 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.91 vs. limit=15.0 2024-09-19 05:27:34,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=599900.0, ans=0.0 2024-09-19 05:27:34,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=599900.0, ans=0.125 2024-09-19 05:27:48,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=599940.0, ans=0.0 2024-09-19 05:27:50,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=599940.0, ans=0.0 2024-09-19 05:27:59,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=599980.0, ans=0.0 2024-09-19 05:28:17,133 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.41 vs. 
limit=15.0 2024-09-19 05:28:27,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=600060.0, ans=0.1 2024-09-19 05:28:41,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=600100.0, ans=0.0 2024-09-19 05:28:42,767 INFO [train.py:1198] (1/2) Epoch 34, batch 700, loss[loss=0.2257, ctc_loss=0.1092, cr_loss=0.3465, attn_decoder_loss=0.2309, over 29534.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1179, cr_loss=0.3598, attn_decoder_loss=0.2415, over 5636378.91 frames. ], batch size: 76, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:29:39,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=600220.0, ans=0.2 2024-09-19 05:29:44,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=600260.0, ans=0.125 2024-09-19 05:29:48,399 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.364e+01 8.809e+01 9.436e+01 2.463e+02, threshold=1.762e+02, percent-clipped=1.0 2024-09-19 05:29:48,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=600260.0, ans=0.2 2024-09-19 05:29:53,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=600260.0, ans=0.125 2024-09-19 05:29:59,005 INFO [train.py:1198] (1/2) Epoch 34, batch 750, loss[loss=0.2449, ctc_loss=0.122, cr_loss=0.3677, attn_decoder_loss=0.2504, over 29701.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1178, cr_loss=0.3596, attn_decoder_loss=0.2411, over 5676268.68 frames. ], batch size: 82, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:31:03,390 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.87 vs. limit=15.0 2024-09-19 05:31:05,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=600460.0, ans=0.0 2024-09-19 05:31:10,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=600460.0, ans=0.2 2024-09-19 05:31:14,730 INFO [train.py:1198] (1/2) Epoch 34, batch 800, loss[loss=0.2097, ctc_loss=0.1016, cr_loss=0.3299, attn_decoder_loss=0.2144, over 29612.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1177, cr_loss=0.3593, attn_decoder_loss=0.241, over 5707030.17 frames. ], batch size: 73, lr: 3.26e-03, grad_scale: 16.0 2024-09-19 05:31:15,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=600500.0, ans=0.0 2024-09-19 05:31:26,471 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.11 vs. 
limit=15.0 2024-09-19 05:31:37,683 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 05:31:56,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=600580.0, ans=0.125 2024-09-19 05:32:21,718 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.053e+01 8.379e+01 9.063e+01 9.651e+01 1.795e+02, threshold=1.813e+02, percent-clipped=1.0 2024-09-19 05:32:23,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=600660.0, ans=0.0 2024-09-19 05:32:32,193 INFO [train.py:1198] (1/2) Epoch 34, batch 850, loss[loss=0.2381, ctc_loss=0.1112, cr_loss=0.3344, attn_decoder_loss=0.2448, over 29730.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1173, cr_loss=0.3581, attn_decoder_loss=0.2407, over 5735695.95 frames. ], batch size: 89, lr: 3.26e-03, grad_scale: 16.0 2024-09-19 05:32:48,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=600740.0, ans=0.1 2024-09-19 05:32:58,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=600740.0, ans=0.125 2024-09-19 05:33:10,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=600780.0, ans=0.125 2024-09-19 05:33:13,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=600780.0, ans=0.95 2024-09-19 05:33:34,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=600860.0, ans=0.05 2024-09-19 05:33:50,508 INFO [train.py:1198] (1/2) Epoch 34, batch 900, loss[loss=0.2147, ctc_loss=0.1066, cr_loss=0.3374, attn_decoder_loss=0.2192, over 29622.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1175, cr_loss=0.3585, attn_decoder_loss=0.2408, over 5741967.86 frames. ], batch size: 73, lr: 3.26e-03, grad_scale: 16.0 2024-09-19 05:34:04,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=600940.0, ans=0.0 2024-09-19 05:34:19,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=600980.0, ans=0.125 2024-09-19 05:34:31,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=600980.0, ans=0.0 2024-09-19 05:34:32,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=600980.0, ans=0.125 2024-09-19 05:34:56,767 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.371e+01 8.479e+01 9.154e+01 9.598e+01 2.436e+02, threshold=1.831e+02, percent-clipped=2.0 2024-09-19 05:35:05,825 INFO [train.py:1198] (1/2) Epoch 34, batch 950, loss[loss=0.2155, ctc_loss=0.09974, cr_loss=0.3311, attn_decoder_loss=0.221, over 29499.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1177, cr_loss=0.3589, attn_decoder_loss=0.2411, over 5744373.69 frames. 
], batch size: 74, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:35:21,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=601140.0, ans=0.125 2024-09-19 05:35:26,655 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.64 vs. limit=15.0 2024-09-19 05:35:40,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=601180.0, ans=0.05 2024-09-19 05:35:45,918 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.85 vs. limit=15.0 2024-09-19 05:35:48,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=601180.0, ans=6.0 2024-09-19 05:36:01,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=601220.0, ans=0.1 2024-09-19 05:36:04,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=601220.0, ans=0.1 2024-09-19 05:36:24,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=601300.0, ans=0.125 2024-09-19 05:36:26,135 INFO [train.py:1198] (1/2) Epoch 34, batch 1000, loss[loss=0.2243, ctc_loss=0.1051, cr_loss=0.3403, attn_decoder_loss=0.2299, over 29509.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1182, cr_loss=0.3601, attn_decoder_loss=0.2418, over 5738458.29 frames. ], batch size: 77, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:36:32,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=601300.0, ans=0.0 2024-09-19 05:36:41,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=601340.0, ans=0.0 2024-09-19 05:36:42,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=601340.0, ans=0.125 2024-09-19 05:37:04,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=601380.0, ans=0.125 2024-09-19 05:37:08,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=601380.0, ans=0.125 2024-09-19 05:37:13,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=601420.0, ans=0.09899494936611666 2024-09-19 05:37:14,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=601420.0, ans=0.0 2024-09-19 05:37:17,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=601420.0, ans=0.125 2024-09-19 05:37:26,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=601460.0, ans=0.125 2024-09-19 05:37:32,489 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.640e+01 8.521e+01 9.169e+01 9.649e+01 1.531e+02, threshold=1.834e+02, percent-clipped=0.0 2024-09-19 05:37:35,983 INFO [scaling.py:214] (1/2) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=601460.0, ans=0.125 2024-09-19 05:37:41,620 INFO [train.py:1198] (1/2) Epoch 34, batch 1050, loss[loss=0.2433, ctc_loss=0.1185, cr_loss=0.3637, attn_decoder_loss=0.249, over 29672.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1181, cr_loss=0.3602, attn_decoder_loss=0.241, over 5746146.65 frames. ], batch size: 85, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:38:10,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=601580.0, ans=0.1 2024-09-19 05:38:20,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=601580.0, ans=0.125 2024-09-19 05:38:35,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=601620.0, ans=0.05 2024-09-19 05:38:51,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.54 vs. limit=10.0 2024-09-19 05:38:58,092 INFO [train.py:1198] (1/2) Epoch 34, batch 1100, loss[loss=0.2355, ctc_loss=0.1167, cr_loss=0.3485, attn_decoder_loss=0.2409, over 29482.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1181, cr_loss=0.3598, attn_decoder_loss=0.241, over 5757897.53 frames. ], batch size: 78, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:39:07,956 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.57 vs. limit=22.5 2024-09-19 05:39:11,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=601740.0, ans=0.1 2024-09-19 05:39:21,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=601740.0, ans=0.125 2024-09-19 05:39:24,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=601740.0, ans=0.07 2024-09-19 05:39:36,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=601780.0, ans=0.125 2024-09-19 05:39:39,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=601780.0, ans=0.0 2024-09-19 05:39:43,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=601780.0, ans=0.025 2024-09-19 05:40:04,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=601860.0, ans=0.125 2024-09-19 05:40:06,785 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.181e+01 8.458e+01 9.005e+01 9.723e+01 2.492e+02, threshold=1.801e+02, percent-clipped=1.0 2024-09-19 05:40:18,194 INFO [train.py:1198] (1/2) Epoch 34, batch 1150, loss[loss=0.2359, ctc_loss=0.119, cr_loss=0.373, attn_decoder_loss=0.2406, over 29452.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1181, cr_loss=0.3598, attn_decoder_loss=0.2412, over 5756226.25 frames. ], batch size: 78, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:40:22,158 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.26 vs. 
limit=22.5 2024-09-19 05:40:36,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=601940.0, ans=0.2 2024-09-19 05:40:53,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=601980.0, ans=0.125 2024-09-19 05:40:59,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=601980.0, ans=0.125 2024-09-19 05:41:08,853 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.65 vs. limit=22.5 2024-09-19 05:41:21,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=602060.0, ans=0.125 2024-09-19 05:41:33,777 INFO [train.py:1198] (1/2) Epoch 34, batch 1200, loss[loss=0.2495, ctc_loss=0.1279, cr_loss=0.3911, attn_decoder_loss=0.2543, over 29685.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1185, cr_loss=0.3612, attn_decoder_loss=0.2416, over 5748633.20 frames. ], batch size: 85, lr: 3.26e-03, grad_scale: 16.0 2024-09-19 05:41:44,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=602100.0, ans=0.07 2024-09-19 05:42:09,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=602180.0, ans=0.125 2024-09-19 05:42:12,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=602180.0, ans=0.1 2024-09-19 05:42:12,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=602180.0, ans=0.125 2024-09-19 05:42:41,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=602260.0, ans=0.0 2024-09-19 05:42:42,361 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.436e+01 8.575e+01 9.202e+01 9.867e+01 2.398e+02, threshold=1.840e+02, percent-clipped=1.0 2024-09-19 05:42:49,890 INFO [train.py:1198] (1/2) Epoch 34, batch 1250, loss[loss=0.2523, ctc_loss=0.1318, cr_loss=0.4001, attn_decoder_loss=0.2568, over 29532.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1188, cr_loss=0.3623, attn_decoder_loss=0.2422, over 5775813.13 frames. ], batch size: 92, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:43:43,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=602420.0, ans=0.1 2024-09-19 05:44:10,634 INFO [train.py:1198] (1/2) Epoch 34, batch 1300, loss[loss=0.2369, ctc_loss=0.1102, cr_loss=0.3309, attn_decoder_loss=0.2436, over 28228.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1183, cr_loss=0.3609, attn_decoder_loss=0.2418, over 5779305.66 frames. ], batch size: 111, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:44:10,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=602500.0, ans=0.125 2024-09-19 05:44:17,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.62 vs. 
limit=22.5 2024-09-19 05:44:18,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=602500.0, ans=0.0 2024-09-19 05:44:24,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=602540.0, ans=0.0 2024-09-19 05:44:26,632 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-09-19 05:44:47,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=602580.0, ans=0.07 2024-09-19 05:44:57,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=602620.0, ans=0.125 2024-09-19 05:45:01,543 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.72 vs. limit=15.0 2024-09-19 05:45:09,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=602660.0, ans=0.125 2024-09-19 05:45:11,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=602660.0, ans=0.0 2024-09-19 05:45:18,893 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.383e+01 8.885e+01 9.572e+01 2.098e+02, threshold=1.777e+02, percent-clipped=1.0 2024-09-19 05:45:19,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=602660.0, ans=0.125 2024-09-19 05:45:19,807 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.10 vs. limit=15.0 2024-09-19 05:45:26,507 INFO [train.py:1198] (1/2) Epoch 34, batch 1350, loss[loss=0.2312, ctc_loss=0.1096, cr_loss=0.34, attn_decoder_loss=0.2372, over 29740.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1176, cr_loss=0.3596, attn_decoder_loss=0.2414, over 5795621.37 frames. ], batch size: 81, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:45:26,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=602700.0, ans=0.125 2024-09-19 05:45:32,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=602700.0, ans=0.125 2024-09-19 05:45:38,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=602700.0, ans=0.125 2024-09-19 05:45:43,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=602740.0, ans=0.1 2024-09-19 05:45:47,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=602740.0, ans=0.125 2024-09-19 05:45:58,924 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.93 vs. 
limit=22.5 2024-09-19 05:46:04,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=602780.0, ans=0.1 2024-09-19 05:46:26,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=602860.0, ans=0.1 2024-09-19 05:46:37,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=602860.0, ans=0.0 2024-09-19 05:46:41,836 INFO [train.py:1198] (1/2) Epoch 34, batch 1400, loss[loss=0.2171, ctc_loss=0.1063, cr_loss=0.3388, attn_decoder_loss=0.2219, over 29579.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1179, cr_loss=0.3604, attn_decoder_loss=0.2415, over 5806934.97 frames. ], batch size: 69, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:46:46,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=602900.0, ans=0.0 2024-09-19 05:47:00,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=602940.0, ans=0.0 2024-09-19 05:47:00,572 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.83 vs. limit=15.0 2024-09-19 05:47:35,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=603020.0, ans=0.0 2024-09-19 05:47:49,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=603060.0, ans=0.0 2024-09-19 05:47:51,984 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.710e+01 8.415e+01 9.038e+01 9.472e+01 1.467e+02, threshold=1.808e+02, percent-clipped=0.0 2024-09-19 05:47:59,661 INFO [train.py:1198] (1/2) Epoch 34, batch 1450, loss[loss=0.2576, ctc_loss=0.1324, cr_loss=0.3831, attn_decoder_loss=0.263, over 29441.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1188, cr_loss=0.3617, attn_decoder_loss=0.2423, over 5804909.11 frames. ], batch size: 94, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:48:11,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=603100.0, ans=0.125 2024-09-19 05:48:17,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=603140.0, ans=0.04949747468305833 2024-09-19 05:48:17,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=603140.0, ans=0.0 2024-09-19 05:48:31,436 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.41 vs. limit=6.0 2024-09-19 05:48:35,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=603180.0, ans=0.2 2024-09-19 05:48:40,909 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.42 vs. limit=15.0 2024-09-19 05:48:42,416 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.10 vs. 
limit=12.0 2024-09-19 05:48:43,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=603180.0, ans=0.0 2024-09-19 05:48:46,687 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.92 vs. limit=15.0 2024-09-19 05:48:50,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=603220.0, ans=0.125 2024-09-19 05:48:58,600 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.43 vs. limit=15.0 2024-09-19 05:49:03,060 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-09-19 05:49:04,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=603260.0, ans=0.04949747468305833 2024-09-19 05:49:13,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=603260.0, ans=0.2 2024-09-19 05:49:17,714 INFO [train.py:1198] (1/2) Epoch 34, batch 1500, loss[loss=0.2546, ctc_loss=0.1284, cr_loss=0.3906, attn_decoder_loss=0.2599, over 29628.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1185, cr_loss=0.3615, attn_decoder_loss=0.2424, over 5805345.54 frames. ], batch size: 86, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:49:27,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=603300.0, ans=0.125 2024-09-19 05:49:50,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=603380.0, ans=0.1 2024-09-19 05:49:59,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=603380.0, ans=0.05 2024-09-19 05:50:23,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=603460.0, ans=0.07 2024-09-19 05:50:26,557 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.633e+01 8.541e+01 9.102e+01 9.733e+01 3.230e+02, threshold=1.820e+02, percent-clipped=1.0 2024-09-19 05:50:34,071 INFO [train.py:1198] (1/2) Epoch 34, batch 1550, loss[loss=0.2566, ctc_loss=0.1401, cr_loss=0.413, attn_decoder_loss=0.2603, over 29514.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.119, cr_loss=0.362, attn_decoder_loss=0.2426, over 5781548.46 frames. ], batch size: 90, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:51:12,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=603580.0, ans=0.0 2024-09-19 05:51:30,041 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.03 vs. limit=15.0 2024-09-19 05:51:37,651 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.07 vs. 
limit=15.0 2024-09-19 05:51:39,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=603660.0, ans=0.0 2024-09-19 05:51:53,947 INFO [train.py:1198] (1/2) Epoch 34, batch 1600, loss[loss=0.2469, ctc_loss=0.1222, cr_loss=0.3653, attn_decoder_loss=0.2527, over 29675.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1186, cr_loss=0.3609, attn_decoder_loss=0.242, over 5763839.07 frames. ], batch size: 85, lr: 3.26e-03, grad_scale: 16.0 2024-09-19 05:51:54,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=603700.0, ans=0.5 2024-09-19 05:52:04,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=603700.0, ans=0.125 2024-09-19 05:53:01,945 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.513e+01 8.541e+01 8.929e+01 9.524e+01 1.976e+02, threshold=1.786e+02, percent-clipped=1.0 2024-09-19 05:53:09,385 INFO [train.py:1198] (1/2) Epoch 34, batch 1650, loss[loss=0.2363, ctc_loss=0.1126, cr_loss=0.3552, attn_decoder_loss=0.2422, over 29691.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.118, cr_loss=0.3599, attn_decoder_loss=0.2415, over 5759327.51 frames. ], batch size: 89, lr: 3.26e-03, grad_scale: 16.0 2024-09-19 05:53:19,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=603900.0, ans=0.125 2024-09-19 05:53:21,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=603900.0, ans=0.125 2024-09-19 05:53:22,644 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.23 vs. limit=15.0 2024-09-19 05:53:36,158 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.37 vs. limit=12.0 2024-09-19 05:53:42,149 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.94 vs. 
limit=22.5 2024-09-19 05:53:53,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=603980.0, ans=0.09899494936611666 2024-09-19 05:53:53,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=603980.0, ans=0.125 2024-09-19 05:54:00,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=604020.0, ans=0.1 2024-09-19 05:54:03,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=604020.0, ans=0.125 2024-09-19 05:54:14,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=604060.0, ans=0.125 2024-09-19 05:54:20,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=604060.0, ans=0.2 2024-09-19 05:54:22,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=604060.0, ans=0.125 2024-09-19 05:54:25,724 INFO [train.py:1198] (1/2) Epoch 34, batch 1700, loss[loss=0.2117, ctc_loss=0.09915, cr_loss=0.3268, attn_decoder_loss=0.2169, over 29566.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1175, cr_loss=0.359, attn_decoder_loss=0.2412, over 5781261.16 frames. ], batch size: 69, lr: 3.25e-03, grad_scale: 16.0 2024-09-19 05:54:27,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=604100.0, ans=0.0 2024-09-19 05:55:35,943 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.641e+01 8.515e+01 9.078e+01 9.556e+01 1.170e+02, threshold=1.816e+02, percent-clipped=0.0 2024-09-19 05:55:42,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=604260.0, ans=0.1 2024-09-19 05:55:45,691 INFO [train.py:1198] (1/2) Epoch 34, batch 1750, loss[loss=0.2022, ctc_loss=0.0873, cr_loss=0.2756, attn_decoder_loss=0.2088, over 29357.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1175, cr_loss=0.3591, attn_decoder_loss=0.2411, over 5788194.93 frames. ], batch size: 67, lr: 3.25e-03, grad_scale: 16.0 2024-09-19 05:55:50,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=604300.0, ans=0.125 2024-09-19 05:55:54,163 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.63 vs. limit=15.0 2024-09-19 05:55:54,188 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.16 vs. limit=10.0 2024-09-19 05:55:58,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=604300.0, ans=0.125 2024-09-19 05:56:11,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=604340.0, ans=0.0 2024-09-19 05:56:51,347 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.94 vs. 
limit=12.0 2024-09-19 05:57:00,873 INFO [train.py:1198] (1/2) Epoch 34, batch 1800, loss[loss=0.2429, ctc_loss=0.1268, cr_loss=0.3829, attn_decoder_loss=0.2473, over 29705.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1176, cr_loss=0.359, attn_decoder_loss=0.2412, over 5790577.32 frames. ], batch size: 83, lr: 3.25e-03, grad_scale: 16.0 2024-09-19 05:57:16,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=604540.0, ans=0.07 2024-09-19 05:57:20,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=604540.0, ans=0.025 2024-09-19 05:57:45,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=604620.0, ans=0.2 2024-09-19 05:57:48,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=604620.0, ans=0.2 2024-09-19 05:57:50,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=604620.0, ans=0.1 2024-09-19 05:57:52,460 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.01 vs. limit=10.0 2024-09-19 05:57:58,188 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.04 vs. limit=15.0 2024-09-19 05:58:03,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=604660.0, ans=0.0 2024-09-19 05:58:09,165 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.032e+01 8.453e+01 8.879e+01 9.546e+01 1.316e+02, threshold=1.776e+02, percent-clipped=0.0 2024-09-19 05:58:12,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=604660.0, ans=0.1 2024-09-19 05:58:15,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=604700.0, ans=0.1 2024-09-19 05:58:16,923 INFO [train.py:1198] (1/2) Epoch 34, batch 1850, loss[loss=0.2354, ctc_loss=0.1119, cr_loss=0.3493, attn_decoder_loss=0.2413, over 29621.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1171, cr_loss=0.3582, attn_decoder_loss=0.2407, over 5797406.55 frames. ], batch size: 86, lr: 3.25e-03, grad_scale: 16.0 2024-09-19 05:58:35,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=604740.0, ans=0.125 2024-09-19 05:58:41,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=604740.0, ans=0.125 2024-09-19 05:58:53,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=604780.0, ans=0.025 2024-09-19 05:59:13,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=604820.0, ans=0.0 2024-09-19 05:59:24,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=604860.0, ans=0.1 2024-09-19 05:59:36,986 INFO [train.py:1198] (1/2) Epoch 34, batch 1900, loss[loss=0.2385, ctc_loss=0.1118, cr_loss=0.3353, attn_decoder_loss=0.2451, over 29710.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1172, cr_loss=0.3586, attn_decoder_loss=0.2415, over 5805278.36 frames. ], batch size: 89, lr: 3.25e-03, grad_scale: 16.0
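A note on reading the train.py:1198 records above: the per-batch loss[...] and running tot_loss[...] fields are consistent with a weighted combination of the three logged criteria, using this run's loss scales of 0.1 (CTC), 0.9 (attention decoder) and 0.02 (consistency regularization). A minimal sketch, assuming those scales; the helper name is illustrative, not icefall's API:

    # Hypothetical helper reconstructing the logged 'loss=' field from its parts.
    CTC_SCALE, AED_SCALE, CR_SCALE = 0.1, 0.9, 0.02  # assumed from this run's config

    def combined_loss(ctc_loss: float, attn_decoder_loss: float, cr_loss: float) -> float:
        """Weighted sum matching the train.py:1198 'loss=' field."""
        return CTC_SCALE * ctc_loss + AED_SCALE * attn_decoder_loss + CR_SCALE * cr_loss

    # Batch 1900 record above: ctc_loss=0.1118, attn_decoder_loss=0.2451, cr_loss=0.3353
    print(round(combined_loss(0.1118, 0.2451, 0.3353), 4))  # -> 0.2385, as logged

The same check holds for the running averages, e.g. the batch 1900 tot_loss: 0.1*0.1172 + 0.9*0.2415 + 0.02*0.3586 = 0.2362.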
2024-09-19 05:59:42,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.02 vs. limit=22.5 2024-09-19 05:59:58,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=604940.0, ans=0.125 2024-09-19 06:00:03,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.22 vs. limit=15.0 2024-09-19 06:00:04,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=604940.0, ans=0.0 2024-09-19 06:00:24,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=605020.0, ans=0.125 2024-09-19 06:00:24,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=605020.0, ans=0.1 2024-09-19 06:00:24,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=605020.0, ans=0.125 2024-09-19 06:00:26,461 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.19 vs. limit=22.5 2024-09-19 06:00:33,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=605020.0, ans=0.1 2024-09-19 06:00:42,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=605060.0, ans=0.125 2024-09-19 06:00:44,600 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.35 vs. limit=15.0 2024-09-19 06:00:46,819 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.799e+01 9.191e+01 9.672e+01 1.531e+02, threshold=1.838e+02, percent-clipped=0.0 2024-09-19 06:00:48,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=605060.0, ans=0.125 2024-09-19 06:00:52,900 INFO [train.py:1198] (1/2) Epoch 34, batch 1950, loss[loss=0.2294, ctc_loss=0.1098, cr_loss=0.339, attn_decoder_loss=0.2351, over 29443.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.118, cr_loss=0.3601, attn_decoder_loss=0.2425, over 5819860.62 frames. ], batch size: 78, lr: 3.25e-03, grad_scale: 8.0 2024-09-19 06:01:02,486 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:01:09,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=605140.0, ans=0.0 2024-09-19 06:01:20,603 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.78 vs.
limit=15.0 2024-09-19 06:01:46,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=605220.0, ans=0.125 2024-09-19 06:01:48,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=605220.0, ans=0.125 2024-09-19 06:02:08,568 INFO [train.py:1198] (1/2) Epoch 34, batch 2000, loss[loss=0.2149, ctc_loss=0.1006, cr_loss=0.3282, attn_decoder_loss=0.2203, over 29345.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1186, cr_loss=0.3617, attn_decoder_loss=0.2431, over 5798558.73 frames. ], batch size: 67, lr: 3.25e-03, grad_scale: 16.0 2024-09-19 06:02:24,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=605340.0, ans=0.1 2024-09-19 06:02:24,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=605340.0, ans=0.125 2024-09-19 06:02:51,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=605380.0, ans=0.0 2024-09-19 06:02:59,919 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.43 vs. limit=22.5 2024-09-19 06:03:16,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=605460.0, ans=0.0 2024-09-19 06:03:17,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=605460.0, ans=10.0 2024-09-19 06:03:22,869 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.535e+01 9.098e+01 9.559e+01 2.375e+02, threshold=1.820e+02, percent-clipped=1.0 2024-09-19 06:03:28,940 INFO [train.py:1198] (1/2) Epoch 34, batch 2050, loss[loss=0.2126, ctc_loss=0.0959, cr_loss=0.3217, attn_decoder_loss=0.2185, over 29412.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1177, cr_loss=0.3598, attn_decoder_loss=0.2419, over 5790231.70 frames. ], batch size: 70, lr: 3.25e-03, grad_scale: 16.0 2024-09-19 06:03:39,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=605500.0, ans=0.1 2024-09-19 06:03:42,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=605540.0, ans=0.125 2024-09-19 06:03:55,305 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.95 vs. limit=15.0 2024-09-19 06:03:58,505 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=22.5 2024-09-19 06:04:00,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=605580.0, ans=0.125 2024-09-19 06:04:02,751 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.64 vs. 
limit=15.0 2024-09-19 06:04:37,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=605660.0, ans=0.0 2024-09-19 06:04:44,661 INFO [train.py:1198] (1/2) Epoch 34, batch 2100, loss[loss=0.2248, ctc_loss=0.1038, cr_loss=0.3289, attn_decoder_loss=0.2309, over 29759.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1174, cr_loss=0.3595, attn_decoder_loss=0.2413, over 5802078.71 frames. ], batch size: 81, lr: 3.25e-03, grad_scale: 16.0 2024-09-19 06:04:45,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=605700.0, ans=15.0 2024-09-19 06:04:49,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=605700.0, ans=0.025 2024-09-19 06:04:52,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=605700.0, ans=0.125 2024-09-19 06:05:07,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=605740.0, ans=0.0 2024-09-19 06:05:07,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=605740.0, ans=0.2 2024-09-19 06:05:20,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=605780.0, ans=0.1 2024-09-19 06:05:53,724 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.782e+01 8.705e+01 9.050e+01 9.610e+01 1.138e+02, threshold=1.810e+02, percent-clipped=0.0 2024-09-19 06:05:59,956 INFO [train.py:1198] (1/2) Epoch 34, batch 2150, loss[loss=0.2355, ctc_loss=0.1168, cr_loss=0.3579, attn_decoder_loss=0.2408, over 29452.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1171, cr_loss=0.3588, attn_decoder_loss=0.241, over 5817073.37 frames. ], batch size: 78, lr: 3.25e-03, grad_scale: 16.0 2024-09-19 06:06:09,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=605900.0, ans=0.0 2024-09-19 06:06:38,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=605980.0, ans=0.125 2024-09-19 06:06:39,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=605980.0, ans=0.125 2024-09-19 06:06:46,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=606020.0, ans=12.0 2024-09-19 06:07:14,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=606100.0, ans=0.125 2024-09-19 06:07:17,779 INFO [train.py:1198] (1/2) Epoch 34, batch 2200, loss[loss=0.2487, ctc_loss=0.1275, cr_loss=0.3864, attn_decoder_loss=0.2536, over 29622.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1172, cr_loss=0.3588, attn_decoder_loss=0.2411, over 5811927.04 frames. ], batch size: 86, lr: 3.25e-03, grad_scale: 16.0 2024-09-19 06:07:24,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=606100.0, ans=0.0 2024-09-19 06:07:27,136 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.68 vs. 
limit=15.0 2024-09-19 06:07:46,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=606140.0, ans=0.125 2024-09-19 06:08:11,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=606220.0, ans=0.0 2024-09-19 06:08:22,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=606260.0, ans=0.125 2024-09-19 06:08:31,027 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.748e+01 9.126e+01 9.549e+01 2.332e+02, threshold=1.825e+02, percent-clipped=1.0 2024-09-19 06:08:31,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=606260.0, ans=0.125 2024-09-19 06:08:31,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=606260.0, ans=0.125 2024-09-19 06:08:32,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=606260.0, ans=0.0 2024-09-19 06:08:35,725 INFO [train.py:1198] (1/2) Epoch 34, batch 2250, loss[loss=0.2238, ctc_loss=0.102, cr_loss=0.2986, attn_decoder_loss=0.2306, over 29696.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1173, cr_loss=0.3587, attn_decoder_loss=0.2412, over 5811276.55 frames. ], batch size: 82, lr: 3.25e-03, grad_scale: 8.0 2024-09-19 06:08:43,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=606300.0, ans=0.0 2024-09-19 06:08:52,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=606340.0, ans=0.125 2024-09-19 06:08:57,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=606340.0, ans=0.125 2024-09-19 06:08:58,866 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:09:00,603 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.94 vs. limit=15.0 2024-09-19 06:09:20,504 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.71 vs. limit=22.5 2024-09-19 06:09:30,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=606420.0, ans=0.2 2024-09-19 06:09:36,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=606460.0, ans=0.125 2024-09-19 06:09:51,729 INFO [train.py:1198] (1/2) Epoch 34, batch 2300, loss[loss=0.2126, ctc_loss=0.1013, cr_loss=0.319, attn_decoder_loss=0.2179, over 29723.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.117, cr_loss=0.3582, attn_decoder_loss=0.2405, over 5800237.04 frames. ], batch size: 72, lr: 3.25e-03, grad_scale: 8.0 2024-09-19 06:09:52,620 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.38 vs. 
limit=12.0 2024-09-19 06:09:59,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=606500.0, ans=0.0 2024-09-19 06:10:04,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=606500.0, ans=0.2 2024-09-19 06:10:08,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=606540.0, ans=0.125 2024-09-19 06:10:11,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=606540.0, ans=0.125 2024-09-19 06:10:25,554 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0 2024-09-19 06:10:43,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=606620.0, ans=0.0 2024-09-19 06:10:52,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=606660.0, ans=0.95 2024-09-19 06:11:00,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=606660.0, ans=0.0 2024-09-19 06:11:02,987 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.476e+01 8.516e+01 9.072e+01 9.584e+01 2.753e+02, threshold=1.814e+02, percent-clipped=1.0 2024-09-19 06:11:07,544 INFO [train.py:1198] (1/2) Epoch 34, batch 2350, loss[loss=0.25, ctc_loss=0.1258, cr_loss=0.3879, attn_decoder_loss=0.2551, over 29684.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1173, cr_loss=0.3589, attn_decoder_loss=0.2407, over 5805801.03 frames. ], batch size: 83, lr: 3.25e-03, grad_scale: 8.0 2024-09-19 06:11:09,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=606700.0, ans=0.025 2024-09-19 06:11:31,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=606740.0, ans=0.1 2024-09-19 06:12:11,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_abs, batch_count=606860.0, ans=0.5 2024-09-19 06:12:26,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=606900.0, ans=0.0 2024-09-19 06:12:27,793 INFO [train.py:1198] (1/2) Epoch 34, batch 2400, loss[loss=0.2339, ctc_loss=0.1245, cr_loss=0.3849, attn_decoder_loss=0.2375, over 29539.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1176, cr_loss=0.3591, attn_decoder_loss=0.241, over 5809341.42 frames. ], batch size: 76, lr: 3.25e-03, grad_scale: 16.0 2024-09-19 06:12:28,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=606900.0, ans=0.125 2024-09-19 06:12:35,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=606900.0, ans=0.2 2024-09-19 06:12:39,012 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=5.69 vs. 
limit=15.0 2024-09-19 06:12:44,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=606940.0, ans=0.0 2024-09-19 06:12:47,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=606940.0, ans=10.0 2024-09-19 06:13:08,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=606980.0, ans=0.125 2024-09-19 06:13:29,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=607060.0, ans=0.125 2024-09-19 06:13:32,004 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.16 vs. limit=15.0 2024-09-19 06:13:40,175 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.501e+01 8.985e+01 9.485e+01 2.487e+02, threshold=1.797e+02, percent-clipped=1.0 2024-09-19 06:13:43,270 INFO [train.py:1198] (1/2) Epoch 34, batch 2450, loss[loss=0.2517, ctc_loss=0.1303, cr_loss=0.3927, attn_decoder_loss=0.2564, over 29673.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1183, cr_loss=0.3608, attn_decoder_loss=0.2421, over 5785150.45 frames. ], batch size: 82, lr: 3.25e-03, grad_scale: 8.0 2024-09-19 06:13:46,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=607100.0, ans=0.1 2024-09-19 06:13:51,648 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=4.99 vs. limit=12.0 2024-09-19 06:14:04,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=607140.0, ans=0.0 2024-09-19 06:14:07,065 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.86 vs. limit=15.0 2024-09-19 06:14:07,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=607140.0, ans=0.1 2024-09-19 06:14:10,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=607140.0, ans=0.05 2024-09-19 06:14:20,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.97 vs. limit=15.0 2024-09-19 06:14:32,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=607220.0, ans=0.025 2024-09-19 06:14:59,118 INFO [train.py:1198] (1/2) Epoch 34, batch 2500, loss[loss=0.2452, ctc_loss=0.1172, cr_loss=0.3602, attn_decoder_loss=0.2514, over 29636.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1182, cr_loss=0.3609, attn_decoder_loss=0.2419, over 5796011.38 frames. ], batch size: 86, lr: 3.25e-03, grad_scale: 8.0 2024-09-19 06:15:04,024 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.04 vs. 
limit=10.0 2024-09-19 06:15:15,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=607300.0, ans=0.1 2024-09-19 06:15:21,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.90 vs. limit=12.0 2024-09-19 06:15:21,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=607340.0, ans=0.1 2024-09-19 06:15:28,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=607340.0, ans=0.0 2024-09-19 06:15:47,284 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2024-09-19 06:15:54,137 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:16:01,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=607420.0, ans=0.125 2024-09-19 06:16:16,475 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 8.449e+01 8.900e+01 9.375e+01 2.079e+02, threshold=1.780e+02, percent-clipped=0.0 2024-09-19 06:16:19,648 INFO [train.py:1198] (1/2) Epoch 34, batch 2550, loss[loss=0.212, ctc_loss=0.1018, cr_loss=0.3312, attn_decoder_loss=0.2169, over 29350.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1181, cr_loss=0.3608, attn_decoder_loss=0.2419, over 5798785.02 frames. ], batch size: 67, lr: 3.25e-03, grad_scale: 8.0 2024-09-19 06:16:20,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=607500.0, ans=6.0 2024-09-19 06:16:24,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=607500.0, ans=0.125 2024-09-19 06:16:34,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=607540.0, ans=0.125 2024-09-19 06:17:14,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=607620.0, ans=0.125 2024-09-19 06:17:25,952 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.53 vs. limit=15.0 2024-09-19 06:17:35,395 INFO [train.py:1198] (1/2) Epoch 34, batch 2600, loss[loss=0.2286, ctc_loss=0.1144, cr_loss=0.3477, attn_decoder_loss=0.2335, over 29453.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.118, cr_loss=0.3605, attn_decoder_loss=0.2421, over 5794666.82 frames. ], batch size: 78, lr: 3.25e-03, grad_scale: 8.0 2024-09-19 06:17:39,398 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.40 vs. limit=22.5 2024-09-19 06:17:42,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.69 vs. 
limit=10.0 2024-09-19 06:18:20,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=607820.0, ans=0.125 2024-09-19 06:18:31,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=607820.0, ans=0.1 2024-09-19 06:18:47,315 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.717e+01 9.275e+01 9.753e+01 1.560e+02, threshold=1.855e+02, percent-clipped=1.0 2024-09-19 06:18:50,257 INFO [train.py:1198] (1/2) Epoch 34, batch 2650, loss[loss=0.2515, ctc_loss=0.1287, cr_loss=0.3951, attn_decoder_loss=0.2563, over 29229.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1182, cr_loss=0.3607, attn_decoder_loss=0.2423, over 5801812.20 frames. ], batch size: 100, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:18:54,709 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.85 vs. limit=22.5 2024-09-19 06:19:28,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=607980.0, ans=0.125 2024-09-19 06:19:40,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_abs, batch_count=607980.0, ans=0.5 2024-09-19 06:19:41,088 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.75 vs. limit=22.5 2024-09-19 06:19:41,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=607980.0, ans=0.125 2024-09-19 06:19:46,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=608020.0, ans=0.2 2024-09-19 06:19:46,895 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.76 vs. limit=12.0 2024-09-19 06:20:06,437 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.59 vs. limit=15.0 2024-09-19 06:20:17,683 INFO [train.py:1198] (1/2) Epoch 34, batch 2700, loss[loss=0.2498, ctc_loss=0.1305, cr_loss=0.392, attn_decoder_loss=0.2543, over 29526.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1183, cr_loss=0.3607, attn_decoder_loss=0.2424, over 5797873.74 frames. ], batch size: 87, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:20:25,567 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:20:26,315 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.72 vs. limit=22.5 2024-09-19 06:20:44,454 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.14 vs. limit=15.0 2024-09-19 06:21:10,371 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.22 vs. 
limit=15.0 2024-09-19 06:21:30,495 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.496e+01 8.546e+01 9.039e+01 9.900e+01 1.946e+02, threshold=1.808e+02, percent-clipped=1.0 2024-09-19 06:21:33,579 INFO [train.py:1198] (1/2) Epoch 34, batch 2750, loss[loss=0.2201, ctc_loss=0.1038, cr_loss=0.3311, attn_decoder_loss=0.2257, over 29525.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1177, cr_loss=0.3598, attn_decoder_loss=0.2413, over 5795434.65 frames. ], batch size: 75, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:21:40,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=608300.0, ans=0.07 2024-09-19 06:21:41,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=608300.0, ans=0.125 2024-09-19 06:21:53,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=608340.0, ans=0.1 2024-09-19 06:22:20,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=608420.0, ans=0.1 2024-09-19 06:22:40,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=608460.0, ans=0.025 2024-09-19 06:22:40,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=608460.0, ans=0.0 2024-09-19 06:22:42,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=608460.0, ans=0.0 2024-09-19 06:22:51,651 INFO [train.py:1198] (1/2) Epoch 34, batch 2800, loss[loss=0.2465, ctc_loss=0.1421, cr_loss=0.3779, attn_decoder_loss=0.2498, over 20202.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1178, cr_loss=0.3599, attn_decoder_loss=0.2411, over 5775026.47 frames. ], batch size: 213, lr: 3.24e-03, grad_scale: 16.0 2024-09-19 06:22:56,730 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.27 vs. limit=22.5 2024-09-19 06:23:00,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=608500.0, ans=0.0 2024-09-19 06:23:10,919 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.50 vs. limit=15.0 2024-09-19 06:23:11,414 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.10 vs. 
limit=15.0 2024-09-19 06:23:13,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=608540.0, ans=0.1 2024-09-19 06:23:16,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=608540.0, ans=0.125 2024-09-19 06:23:16,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=608540.0, ans=0.125 2024-09-19 06:23:17,932 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:23:29,082 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.50 vs. limit=22.5 2024-09-19 06:23:45,504 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.50 vs. limit=15.0 2024-09-19 06:23:46,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=608620.0, ans=0.125 2024-09-19 06:23:52,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=608660.0, ans=0.125 2024-09-19 06:24:07,383 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.476e+01 9.039e+01 9.642e+01 3.312e+02, threshold=1.808e+02, percent-clipped=1.0 2024-09-19 06:24:08,985 INFO [train.py:1198] (1/2) Epoch 34, batch 2850, loss[loss=0.2292, ctc_loss=0.1207, cr_loss=0.3719, attn_decoder_loss=0.233, over 29503.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1185, cr_loss=0.3608, attn_decoder_loss=0.2417, over 5761110.73 frames. ], batch size: 77, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:24:19,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=608700.0, ans=0.125 2024-09-19 06:24:44,945 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.30 vs. limit=22.5 2024-09-19 06:24:47,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=608780.0, ans=0.1 2024-09-19 06:24:48,048 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.88 vs. limit=15.0 2024-09-19 06:24:53,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=608820.0, ans=0.2 2024-09-19 06:25:00,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=608820.0, ans=0.0 2024-09-19 06:25:01,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=608820.0, ans=0.1 2024-09-19 06:25:10,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=608860.0, ans=0.0 2024-09-19 06:25:25,011 INFO [train.py:1198] (1/2) Epoch 34, batch 2900, loss[loss=0.2308, ctc_loss=0.1163, cr_loss=0.3621, attn_decoder_loss=0.2355, over 29438.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1189, cr_loss=0.362, attn_decoder_loss=0.2427, over 5786906.07 frames. 
], batch size: 79, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:25:37,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=608900.0, ans=0.125 2024-09-19 06:25:43,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=608940.0, ans=0.125 2024-09-19 06:26:26,830 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.29 vs. limit=15.0 2024-09-19 06:26:41,352 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.489e+01 8.332e+01 8.811e+01 9.212e+01 1.381e+02, threshold=1.762e+02, percent-clipped=0.0 2024-09-19 06:26:42,921 INFO [train.py:1198] (1/2) Epoch 34, batch 2950, loss[loss=0.2214, ctc_loss=0.1147, cr_loss=0.3523, attn_decoder_loss=0.2255, over 29509.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1179, cr_loss=0.3598, attn_decoder_loss=0.2414, over 5781159.45 frames. ], batch size: 75, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:27:05,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.84 vs. limit=15.0 2024-09-19 06:27:20,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=609180.0, ans=0.1 2024-09-19 06:27:49,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=609260.0, ans=0.0 2024-09-19 06:27:54,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=609260.0, ans=0.125 2024-09-19 06:27:56,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=609260.0, ans=0.0 2024-09-19 06:28:00,848 INFO [train.py:1198] (1/2) Epoch 34, batch 3000, loss[loss=0.2371, ctc_loss=0.1175, cr_loss=0.3604, attn_decoder_loss=0.2424, over 29790.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1174, cr_loss=0.3585, attn_decoder_loss=0.241, over 5782931.43 frames. ], batch size: 81, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:28:00,848 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 06:28:13,099 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.9958, 5.1169, 4.7967, 2.7444], device='cuda:1') 2024-09-19 06:28:19,437 INFO [train.py:1230] (1/2) Epoch 34, validation: loss=0.2118, ctc_loss=0.03645, cr_loss=6.088e-15, attn_decoder_loss=0.2313, over 944034.00 frames. 2024-09-19 06:28:19,437 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-19 06:28:25,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=609300.0, ans=0.1 2024-09-19 06:29:06,007 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.38 vs. limit=6.0
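The "Computing validation loss" records above show the periodic held-out pass: training pauses at batch 3000, evaluates on ~944k frames, and reports the same loss components. Note that cr_loss collapses to ~6e-15 at validation, plausibly because the consistency-regularization term compares outputs on two differently time-masked copies of each utterance, and no masking is applied at evaluation, so the two copies coincide. A rough sketch of such a validation pass, assuming a hypothetical model/loader interface rather than icefall's exact API:

    import torch

    def validate(model, valid_loader, device):
        """Minimal sketch of the pass behind the 'Computing validation loss' lines."""
        model.eval()
        tot_loss, tot_frames = 0.0, 0
        with torch.no_grad():
            for batch in valid_loader:
                feats = batch["features"].to(device)
                # No SpecAug / time masking here, so a consistency term computed
                # between two augmented views would be ~0, as in the log.
                loss, num_frames = model.compute_loss(feats, batch["supervisions"])
                tot_loss += loss.item() * num_frames
                tot_frames += num_frames
        model.train()
        return tot_loss / max(tot_frames, 1)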
2024-09-19 06:29:18,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=609460.0, ans=0.2 2024-09-19 06:29:30,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=609460.0, ans=0.0 2024-09-19 06:29:33,593 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.251e+01 8.609e+01 9.134e+01 9.597e+01 3.076e+02, threshold=1.827e+02, percent-clipped=2.0 2024-09-19 06:29:35,187 INFO [train.py:1198] (1/2) Epoch 34, batch 3050, loss[loss=0.222, ctc_loss=0.1109, cr_loss=0.3474, attn_decoder_loss=0.2267, over 29503.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1178, cr_loss=0.3596, attn_decoder_loss=0.2417, over 5776221.85 frames. ], batch size: 76, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:29:41,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=609500.0, ans=0.125 2024-09-19 06:29:43,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=609500.0, ans=0.125 2024-09-19 06:29:56,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=609540.0, ans=0.1 2024-09-19 06:29:56,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=609540.0, ans=0.1 2024-09-19 06:29:57,062 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.11 vs. limit=15.0 2024-09-19 06:30:05,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=609580.0, ans=0.125 2024-09-19 06:30:11,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=609580.0, ans=0.1 2024-09-19 06:30:15,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=609580.0, ans=0.125 2024-09-19 06:30:24,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=609620.0, ans=0.2 2024-09-19 06:30:54,915 INFO [train.py:1198] (1/2) Epoch 34, batch 3100, loss[loss=0.2542, ctc_loss=0.13, cr_loss=0.4039, attn_decoder_loss=0.2591, over 29266.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1174, cr_loss=0.359, attn_decoder_loss=0.2414, over 5775448.83 frames. ], batch size: 100, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:31:08,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=609740.0, ans=0.0 2024-09-19 06:31:26,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=609780.0, ans=0.125 2024-09-19 06:31:40,959 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=15.0
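The many scaling.py:214 "ScheduledFloat" records trace regularizer hyper-parameters (dropout probabilities, skip rates, balancer bounds, bypass scales) that are scheduled against batch_count rather than held fixed; by batch ~609k most skip rates have decayed to their final values, which is why so many lines report ans=0.0. A minimal sketch of piecewise-linear scheduling in that spirit (an illustrative class, not the icefall implementation):

    class ScheduledFloat:
        """Piecewise-linear schedule over batch count; (x, y) pairs are breakpoints."""
        def __init__(self, *points: tuple[float, float]):
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:  # linear interpolation inside a segment
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a skip rate decaying from 0.1 to 0.0 over the first 20k batches;
    # at batch_count=609540 it would log ans=0.0, like the lines above.
    skip_rate = ScheduledFloat((0.0, 0.1), (20000.0, 0.0))
    print(skip_rate.value(609540.0))  # -> 0.0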
2024-09-19 06:31:49,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=609820.0, ans=0.125 2024-09-19 06:32:03,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=609860.0, ans=0.125 2024-09-19 06:32:09,148 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.390e+01 8.517e+01 8.968e+01 9.546e+01 1.931e+02, threshold=1.794e+02, percent-clipped=1.0 2024-09-19 06:32:10,677 INFO [train.py:1198] (1/2) Epoch 34, batch 3150, loss[loss=0.2494, ctc_loss=0.1261, cr_loss=0.3813, attn_decoder_loss=0.2546, over 28814.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1177, cr_loss=0.3595, attn_decoder_loss=0.2416, over 5782628.97 frames. ], batch size: 104, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:32:22,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=609900.0, ans=0.125 2024-09-19 06:32:30,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=609940.0, ans=0.1 2024-09-19 06:32:36,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=609940.0, ans=0.0 2024-09-19 06:32:36,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=609940.0, ans=0.125 2024-09-19 06:33:07,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=610020.0, ans=0.2 2024-09-19 06:33:25,797 INFO [train.py:1198] (1/2) Epoch 34, batch 3200, loss[loss=0.2352, ctc_loss=0.1215, cr_loss=0.3704, attn_decoder_loss=0.2396, over 29413.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1171, cr_loss=0.3583, attn_decoder_loss=0.241, over 5793451.65 frames. ], batch size: 79, lr: 3.24e-03, grad_scale: 16.0
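The optim.py:487 warnings summarize adaptive gradient clipping: the five numbers after "grad-norm quartiles" read as min/25%/median/75%/max of recent gradient norms, and the logged threshold consistently equals Clipping_scale=2.0 times the median (e.g. 2 x 8.968e+01 ~ 1.794e+02 above), so percent-clipped stays near zero except on outlier batches. A hedged reconstruction of the bookkeeping, not the actual ScaledAdam code in icefall's optim.py:

    from collections import deque
    import torch

    class GradNormClipper:
        """Illustrative median-tracking clipper: the threshold follows recent
        gradient norms, scaled by clipping_scale (2.0 in this run's logs)."""
        def __init__(self, clipping_scale: float = 2.0, window: int = 512):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)

        def clip_(self, params) -> float:
            params = [p for p in params if p.grad is not None]
            norm = torch.cat([p.grad.detach().flatten() for p in params]).norm().item()
            self.norms.append(norm)
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.clipping_scale * median
            if norm > threshold:  # such batches count toward 'percent-clipped'
                for p in params:
                    p.grad.mul_(threshold / norm)
            return threshold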
2024-09-19 06:33:26,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=610100.0, ans=0.2 2024-09-19 06:33:32,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=610100.0, ans=0.125 2024-09-19 06:33:35,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=610100.0, ans=0.0 2024-09-19 06:33:55,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=610180.0, ans=0.125 2024-09-19 06:33:56,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=610180.0, ans=0.125 2024-09-19 06:34:02,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=610180.0, ans=0.125 2024-09-19 06:34:08,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=610180.0, ans=0.1 2024-09-19 06:34:27,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=610260.0, ans=0.125 2024-09-19 06:34:32,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=610260.0, ans=0.0 2024-09-19 06:34:42,402 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 8.499e+01 9.052e+01 9.605e+01 1.287e+02, threshold=1.810e+02, percent-clipped=0.0 2024-09-19 06:34:43,878 INFO [train.py:1198] (1/2) Epoch 34, batch 3250, loss[loss=0.2373, ctc_loss=0.1134, cr_loss=0.3365, attn_decoder_loss=0.2436, over 29705.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.117, cr_loss=0.3586, attn_decoder_loss=0.2414, over 5799825.92 frames. ], batch size: 84, lr: 3.24e-03, grad_scale: 16.0 2024-09-19 06:34:44,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=610300.0, ans=0.125 2024-09-19 06:34:47,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=610300.0, ans=0.0 2024-09-19 06:35:08,932 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:36:01,924 INFO [train.py:1198] (1/2) Epoch 34, batch 3300, loss[loss=0.2428, ctc_loss=0.1179, cr_loss=0.3708, attn_decoder_loss=0.2484, over 28270.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1162, cr_loss=0.3566, attn_decoder_loss=0.2402, over 5797149.92 frames. ], batch size: 111, lr: 3.24e-03, grad_scale: 8.0
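The grad_scale field in the train.py:1198 records flips between 8.0 and 16.0: with fp16 AMP, the loss is multiplied by a dynamic scale before backward so that small gradients do not flush to zero, and the scale is halved when overflows appear and grown again after a stretch of clean steps. icefall uses its own scaler, but the standard torch.cuda.amp.GradScaler exhibits the same mechanic (the model and data below are placeholders; this needs a CUDA device):

    import torch

    model = torch.nn.Linear(80, 500).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.045)
    scaler = torch.cuda.amp.GradScaler(init_scale=16.0)  # cf. grad_scale: 16.0

    x = torch.randn(8, 80, device="cuda")
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(x).square().mean()
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(opt)               # unscales grads, skips the step on inf/nan
    scaler.update()                # halves the scale after overflow, grows it later
    print(scaler.get_scale())      # would print 8.0 after an overflowed step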
2024-09-19 06:36:11,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=610500.0, ans=0.0 2024-09-19 06:36:23,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=610540.0, ans=0.125 2024-09-19 06:36:26,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=610540.0, ans=0.2 2024-09-19 06:36:32,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=610580.0, ans=0.125 2024-09-19 06:36:49,930 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.46 vs. limit=15.0 2024-09-19 06:36:56,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=610620.0, ans=0.0 2024-09-19 06:37:04,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=610660.0, ans=0.0 2024-09-19 06:37:17,122 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.446e+01 8.592e+01 9.077e+01 9.630e+01 2.771e+02, threshold=1.815e+02, percent-clipped=1.0 2024-09-19 06:37:17,148 INFO [train.py:1198] (1/2) Epoch 34, batch 3350, loss[loss=0.2519, ctc_loss=0.128, cr_loss=0.3801, attn_decoder_loss=0.2573, over 28900.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1172, cr_loss=0.358, attn_decoder_loss=0.2411, over 5774504.25 frames. ], batch size: 104, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:37:35,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=610740.0, ans=0.0 2024-09-19 06:37:43,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=610740.0, ans=0.125 2024-09-19 06:37:47,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=610780.0, ans=0.0 2024-09-19 06:37:59,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=610780.0, ans=0.125 2024-09-19 06:38:06,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=610820.0, ans=10.0 2024-09-19 06:38:24,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=610860.0, ans=0.125 2024-09-19 06:38:35,424 INFO [train.py:1198] (1/2) Epoch 34, batch 3400, loss[loss=0.2069, ctc_loss=0.1068, cr_loss=0.3257, attn_decoder_loss=0.2107, over 29318.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1172, cr_loss=0.3579, attn_decoder_loss=0.241, over 5767750.84 frames. ], batch size: 67, lr: 3.24e-03, grad_scale: 8.0
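The scaling.py:1024 "Whitening" records compare a per-module activation statistic against a scheduled limit (metric=7.46 vs. limit=15.0 above); these modules discourage feature covariances that drift far from isotropic, and the penalty is presumably designed to bite only once the metric exceeds its limit. One way such a whiteness metric can be defined, as a sketch only (normalized so an isotropic covariance scores exactly 1.0; this is an illustration of the idea, not a verbatim copy of icefall's _whitening_metric):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """Illustrative whiteness measure over the last (channel) dim of x.
        Returns 1.0 for an isotropic covariance, larger values for
        correlated or unevenly scaled channels."""
        x = x.reshape(-1, x.shape[-1])
        per_group = x.shape[-1] // num_groups
        metrics = []
        for g in range(num_groups):
            xg = x[:, g * per_group:(g + 1) * per_group]
            xg = xg - xg.mean(dim=0)
            cov = (xg.t() @ xg) / xg.shape[0]
            d = cov.shape[0]
            # Frobenius norm^2 of cov, normalized so cov = c*I gives exactly 1.0.
            metrics.append(((cov ** 2).sum() * d / cov.trace() ** 2).item())
        return sum(metrics) / num_groups

    x = torch.randn(1000, 768)           # nearly white -> metric close to 1.0
    print(whitening_metric(x))
    x_corr = x @ torch.randn(768, 768)   # mixed channels -> metric well above 1.0
    print(whitening_metric(x_corr))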
2024-09-19 06:38:54,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=610940.0, ans=0.125 2024-09-19 06:39:31,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=611020.0, ans=0.125 2024-09-19 06:39:53,418 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 8.710e+01 9.261e+01 9.751e+01 2.657e+02, threshold=1.852e+02, percent-clipped=1.0 2024-09-19 06:39:53,441 INFO [train.py:1198] (1/2) Epoch 34, batch 3450, loss[loss=0.2446, ctc_loss=0.1203, cr_loss=0.3509, attn_decoder_loss=0.2506, over 28369.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1179, cr_loss=0.359, attn_decoder_loss=0.2418, over 5775900.58 frames. ], batch size: 111, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:39:55,683 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.98 vs. limit=22.5 2024-09-19 06:40:04,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=611100.0, ans=0.95 2024-09-19 06:40:19,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=611140.0, ans=0.125 2024-09-19 06:40:54,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=611260.0, ans=0.035 2024-09-19 06:40:57,238 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.12 vs. limit=15.0 2024-09-19 06:41:09,776 INFO [train.py:1198] (1/2) Epoch 34, batch 3500, loss[loss=0.2155, ctc_loss=0.1011, cr_loss=0.3245, attn_decoder_loss=0.221, over 29319.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1174, cr_loss=0.3579, attn_decoder_loss=0.2412, over 5776607.34 frames. ], batch size: 71, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:41:10,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=611300.0, ans=0.0 2024-09-19 06:41:11,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=611300.0, ans=0.2 2024-09-19 06:41:19,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=611300.0, ans=0.125 2024-09-19 06:41:22,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=611300.0, ans=0.125 2024-09-19 06:41:24,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.36 vs. limit=15.0 2024-09-19 06:41:33,430 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.70 vs. limit=8.0 2024-09-19 06:41:35,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=611340.0, ans=0.125 2024-09-19 06:41:37,581 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.35 vs.
limit=15.0 2024-09-19 06:41:48,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=611380.0, ans=0.2 2024-09-19 06:42:09,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.23 vs. limit=15.0 2024-09-19 06:42:10,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=611460.0, ans=0.035 2024-09-19 06:42:19,119 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:42:26,136 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.638e+01 9.255e+01 9.995e+01 3.984e+02, threshold=1.851e+02, percent-clipped=2.0 2024-09-19 06:42:26,158 INFO [train.py:1198] (1/2) Epoch 34, batch 3550, loss[loss=0.2414, ctc_loss=0.1159, cr_loss=0.3526, attn_decoder_loss=0.2475, over 29709.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1175, cr_loss=0.3586, attn_decoder_loss=0.2412, over 5784205.34 frames. ], batch size: 89, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:42:27,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=611500.0, ans=0.0 2024-09-19 06:43:34,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=611660.0, ans=0.125 2024-09-19 06:43:37,002 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.16 vs. limit=12.0 2024-09-19 06:43:42,114 INFO [train.py:1198] (1/2) Epoch 34, batch 3600, loss[loss=0.2377, ctc_loss=0.1223, cr_loss=0.3681, attn_decoder_loss=0.2423, over 29483.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1175, cr_loss=0.3586, attn_decoder_loss=0.2413, over 5793283.39 frames. ], batch size: 77, lr: 3.23e-03, grad_scale: 16.0 2024-09-19 06:43:54,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=611700.0, ans=0.0 2024-09-19 06:44:12,689 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.51 vs. limit=15.0 2024-09-19 06:44:22,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=611780.0, ans=0.0 2024-09-19 06:44:27,792 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.19 vs. limit=22.5 2024-09-19 06:44:39,897 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.59 vs. limit=15.0 2024-09-19 06:44:56,828 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.495e+01 8.640e+01 9.081e+01 9.603e+01 2.325e+02, threshold=1.816e+02, percent-clipped=1.0 2024-09-19 06:44:56,854 INFO [train.py:1198] (1/2) Epoch 34, batch 3650, loss[loss=0.259, ctc_loss=0.1402, cr_loss=0.4264, attn_decoder_loss=0.2627, over 29541.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1172, cr_loss=0.358, attn_decoder_loss=0.2409, over 5794886.38 frames. 
], batch size: 90, lr: 3.23e-03, grad_scale: 16.0 2024-09-19 06:44:57,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=611900.0, ans=0.0 2024-09-19 06:45:02,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=611900.0, ans=0.0 2024-09-19 06:45:24,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=611940.0, ans=0.125 2024-09-19 06:45:28,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=611980.0, ans=0.0 2024-09-19 06:45:54,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=612020.0, ans=0.125 2024-09-19 06:45:57,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=612060.0, ans=0.0 2024-09-19 06:46:11,765 INFO [train.py:1198] (1/2) Epoch 34, batch 3700, loss[loss=0.2476, ctc_loss=0.1178, cr_loss=0.3565, attn_decoder_loss=0.2541, over 29697.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1171, cr_loss=0.3579, attn_decoder_loss=0.2409, over 5804724.69 frames. ], batch size: 84, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 06:46:22,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=612100.0, ans=0.125 2024-09-19 06:46:27,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=612140.0, ans=0.0 2024-09-19 06:46:33,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=612140.0, ans=0.125 2024-09-19 06:46:42,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=612180.0, ans=0.0 2024-09-19 06:46:50,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=612180.0, ans=0.025 2024-09-19 06:46:54,123 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.36 vs. limit=15.0 2024-09-19 06:46:55,934 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.29 vs. limit=15.0 2024-09-19 06:47:01,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=612220.0, ans=0.0 2024-09-19 06:47:14,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=612260.0, ans=0.0 2024-09-19 06:47:17,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=612260.0, ans=0.1 2024-09-19 06:47:26,239 INFO [train.py:1198] (1/2) Epoch 34, batch 3750, loss[loss=0.2103, ctc_loss=0.09817, cr_loss=0.3268, attn_decoder_loss=0.2155, over 29345.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.117, cr_loss=0.3579, attn_decoder_loss=0.2406, over 5808242.47 frames. 
], batch size: 67, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 06:47:27,708 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 8.454e+01 8.933e+01 9.373e+01 1.602e+02, threshold=1.787e+02, percent-clipped=0.0 2024-09-19 06:47:28,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=612300.0, ans=0.1 2024-09-19 06:47:41,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=612340.0, ans=0.125 2024-09-19 06:47:46,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=612340.0, ans=0.1 2024-09-19 06:48:08,292 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:48:12,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=612420.0, ans=0.125 2024-09-19 06:48:19,593 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.41 vs. limit=15.0 2024-09-19 06:48:21,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=612420.0, ans=0.09899494936611666 2024-09-19 06:48:24,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=612420.0, ans=0.025 2024-09-19 06:48:29,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=612460.0, ans=0.125 2024-09-19 06:48:32,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=612460.0, ans=0.125 2024-09-19 06:48:35,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=612460.0, ans=0.0 2024-09-19 06:48:42,285 INFO [train.py:1198] (1/2) Epoch 34, batch 3800, loss[loss=0.2447, ctc_loss=0.118, cr_loss=0.3612, attn_decoder_loss=0.2507, over 29629.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1168, cr_loss=0.3573, attn_decoder_loss=0.2405, over 5798649.47 frames. ], batch size: 86, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 06:48:57,995 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.44 vs. limit=15.0 2024-09-19 06:49:05,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=612540.0, ans=0.1 2024-09-19 06:49:30,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=612620.0, ans=0.125 2024-09-19 06:49:31,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=612620.0, ans=0.025 2024-09-19 06:49:37,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=612620.0, ans=0.0 2024-09-19 06:49:39,801 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.92 vs. 
limit=15.0 2024-09-19 06:49:47,479 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. limit=6.0 2024-09-19 06:49:52,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=612660.0, ans=0.125 2024-09-19 06:49:58,136 INFO [train.py:1198] (1/2) Epoch 34, batch 3850, loss[loss=0.2568, ctc_loss=0.1291, cr_loss=0.3824, attn_decoder_loss=0.2625, over 29287.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1164, cr_loss=0.3568, attn_decoder_loss=0.2404, over 5811878.83 frames. ], batch size: 100, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 06:49:59,606 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.333e+01 8.389e+01 8.951e+01 9.412e+01 1.497e+02, threshold=1.790e+02, percent-clipped=0.0 2024-09-19 06:50:08,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=612700.0, ans=0.125 2024-09-19 06:50:26,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=612780.0, ans=0.1 2024-09-19 06:51:07,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=612860.0, ans=0.05 2024-09-19 06:51:12,887 INFO [train.py:1198] (1/2) Epoch 34, batch 3900, loss[loss=0.2493, ctc_loss=0.1252, cr_loss=0.3761, attn_decoder_loss=0.2547, over 29626.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1167, cr_loss=0.3578, attn_decoder_loss=0.2409, over 5815795.66 frames. ], batch size: 86, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 06:51:24,020 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.83 vs. limit=15.0 2024-09-19 06:51:44,862 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.60 vs. limit=22.5 2024-09-19 06:52:03,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=613020.0, ans=0.125 2024-09-19 06:52:12,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=613060.0, ans=0.0 2024-09-19 06:52:22,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=613060.0, ans=0.025 2024-09-19 06:52:26,853 INFO [train.py:1198] (1/2) Epoch 34, batch 3950, loss[loss=0.246, ctc_loss=0.1254, cr_loss=0.3877, attn_decoder_loss=0.2508, over 29491.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1167, cr_loss=0.3577, attn_decoder_loss=0.241, over 5835244.75 frames. ], batch size: 97, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 06:52:28,322 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 8.556e+01 9.009e+01 9.395e+01 1.816e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-19 06:52:55,266 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:53:42,101 INFO [train.py:1198] (1/2) Epoch 34, batch 4000, loss[loss=0.2202, ctc_loss=0.1029, cr_loss=0.3159, attn_decoder_loss=0.2262, over 29511.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1171, cr_loss=0.3585, attn_decoder_loss=0.2411, over 5812201.77 frames. 
], batch size: 74, lr: 3.23e-03, grad_scale: 16.0 2024-09-19 06:53:53,240 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2024-09-19 06:54:06,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=613340.0, ans=0.1 2024-09-19 06:54:15,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=613380.0, ans=0.125 2024-09-19 06:54:27,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=613420.0, ans=0.0 2024-09-19 06:54:33,321 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.64 vs. limit=6.0 2024-09-19 06:54:52,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=613460.0, ans=0.125 2024-09-19 06:54:57,755 INFO [train.py:1198] (1/2) Epoch 34, batch 4050, loss[loss=0.2591, ctc_loss=0.1428, cr_loss=0.3871, attn_decoder_loss=0.2634, over 20176.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1171, cr_loss=0.3584, attn_decoder_loss=0.2411, over 5796656.19 frames. ], batch size: 209, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 06:55:00,718 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 8.471e+01 9.121e+01 9.639e+01 2.999e+02, threshold=1.824e+02, percent-clipped=1.0 2024-09-19 06:55:14,871 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.24 vs. limit=10.0 2024-09-19 06:55:42,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=613620.0, ans=0.125 2024-09-19 06:55:46,189 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.92 vs. limit=15.0 2024-09-19 06:55:46,806 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:55:50,589 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.84 vs. limit=15.0 2024-09-19 06:55:57,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=613660.0, ans=0.2 2024-09-19 06:56:11,835 INFO [train.py:1198] (1/2) Epoch 34, batch 4100, loss[loss=0.2479, ctc_loss=0.1225, cr_loss=0.3595, attn_decoder_loss=0.2538, over 29523.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1174, cr_loss=0.3588, attn_decoder_loss=0.2413, over 5791668.56 frames. 
], batch size: 90, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 06:56:28,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=613740.0, ans=0.125 2024-09-19 06:56:41,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=613780.0, ans=0.0 2024-09-19 06:56:43,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=613780.0, ans=0.07 2024-09-19 06:56:50,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=613780.0, ans=0.125 2024-09-19 06:56:52,196 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.96 vs. limit=22.5 2024-09-19 06:56:54,861 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:56:56,832 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.46 vs. limit=15.0 2024-09-19 06:57:02,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=613820.0, ans=0.125 2024-09-19 06:57:05,308 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:57:09,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=613860.0, ans=0.2 2024-09-19 06:57:09,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=613860.0, ans=0.125 2024-09-19 06:57:25,789 INFO [train.py:1198] (1/2) Epoch 34, batch 4150, loss[loss=0.2314, ctc_loss=0.1182, cr_loss=0.3706, attn_decoder_loss=0.2358, over 29489.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1172, cr_loss=0.3589, attn_decoder_loss=0.2409, over 5796762.33 frames. ], batch size: 77, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 06:57:26,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=613900.0, ans=0.2 2024-09-19 06:57:28,815 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.811e+01 8.506e+01 8.901e+01 9.635e+01 1.346e+02, threshold=1.780e+02, percent-clipped=0.0 2024-09-19 06:57:45,790 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.98 vs. 
limit=15.0 2024-09-19 06:57:49,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=613940.0, ans=0.0 2024-09-19 06:57:55,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=613980.0, ans=0.0 2024-09-19 06:57:55,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=613980.0, ans=0.125 2024-09-19 06:57:56,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=613980.0, ans=0.0 2024-09-19 06:57:58,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=613980.0, ans=0.2 2024-09-19 06:58:39,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=614100.0, ans=0.0 2024-09-19 06:58:40,889 INFO [train.py:1198] (1/2) Epoch 34, batch 4200, loss[loss=0.2626, ctc_loss=0.1366, cr_loss=0.4162, attn_decoder_loss=0.2674, over 29527.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1173, cr_loss=0.3594, attn_decoder_loss=0.2412, over 5798360.44 frames. ], batch size: 90, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 06:58:49,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.96 vs. limit=15.0 2024-09-19 06:59:10,919 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.57 vs. limit=12.0 2024-09-19 06:59:17,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=614180.0, ans=0.125 2024-09-19 06:59:20,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=614180.0, ans=0.125 2024-09-19 06:59:20,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=614180.0, ans=0.0 2024-09-19 06:59:38,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=614220.0, ans=0.125 2024-09-19 06:59:41,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=614260.0, ans=0.05 2024-09-19 06:59:49,386 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.10 vs. limit=15.0 2024-09-19 06:59:55,724 INFO [train.py:1198] (1/2) Epoch 34, batch 4250, loss[loss=0.2095, ctc_loss=0.0865, cr_loss=0.2876, attn_decoder_loss=0.2167, over 29527.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.117, cr_loss=0.3587, attn_decoder_loss=0.2413, over 5804416.47 frames. ], batch size: 74, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 06:59:58,625 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 8.496e+01 8.853e+01 9.381e+01 2.444e+02, threshold=1.771e+02, percent-clipped=1.0 2024-09-19 07:00:13,881 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.66 vs. 
limit=15.0 2024-09-19 07:00:19,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=614340.0, ans=0.0 2024-09-19 07:00:24,338 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.48 vs. limit=15.0 2024-09-19 07:00:26,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=614380.0, ans=0.1 2024-09-19 07:00:32,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=614380.0, ans=0.125 2024-09-19 07:00:35,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=614380.0, ans=0.0 2024-09-19 07:00:37,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=614380.0, ans=0.125 2024-09-19 07:00:38,949 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:00:46,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=614420.0, ans=0.2 2024-09-19 07:00:46,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=614420.0, ans=0.5 2024-09-19 07:00:49,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=614420.0, ans=0.0 2024-09-19 07:00:53,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=614460.0, ans=0.125 2024-09-19 07:01:09,459 INFO [train.py:1198] (1/2) Epoch 34, batch 4300, loss[loss=0.2471, ctc_loss=0.1177, cr_loss=0.3672, attn_decoder_loss=0.2533, over 29535.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.117, cr_loss=0.3585, attn_decoder_loss=0.2415, over 5794345.87 frames. ], batch size: 87, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 07:01:29,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=614540.0, ans=0.015 2024-09-19 07:01:41,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.90 vs. limit=22.5 2024-09-19 07:01:44,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=614580.0, ans=0.2 2024-09-19 07:02:00,227 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.75 vs. limit=15.0 2024-09-19 07:02:04,540 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.52 vs. limit=22.5 2024-09-19 07:02:16,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=614660.0, ans=0.0 2024-09-19 07:02:20,351 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.37 vs. limit=10.0 2024-09-19 07:02:21,951 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.60 vs. 
limit=22.5 2024-09-19 07:02:22,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=614660.0, ans=0.0 2024-09-19 07:02:25,305 INFO [train.py:1198] (1/2) Epoch 34, batch 4350, loss[loss=0.2412, ctc_loss=0.1203, cr_loss=0.369, attn_decoder_loss=0.2464, over 29489.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1199, cr_loss=0.3642, attn_decoder_loss=0.2448, over 5797183.89 frames. ], batch size: 97, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 07:02:28,307 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 8.836e+01 9.274e+01 9.839e+01 5.976e+02, threshold=1.855e+02, percent-clipped=1.0 2024-09-19 07:03:12,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=614820.0, ans=0.5 2024-09-19 07:03:13,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=614820.0, ans=0.0 2024-09-19 07:03:16,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=614820.0, ans=0.125 2024-09-19 07:03:38,634 INFO [train.py:1198] (1/2) Epoch 34, batch 4400, loss[loss=0.249, ctc_loss=0.1305, cr_loss=0.3834, attn_decoder_loss=0.2536, over 27434.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1212, cr_loss=0.3668, attn_decoder_loss=0.2469, over 5767514.03 frames. ], batch size: 124, lr: 3.23e-03, grad_scale: 16.0 2024-09-19 07:03:41,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=614900.0, ans=0.125 2024-09-19 07:04:16,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=614980.0, ans=0.125 2024-09-19 07:04:29,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=615020.0, ans=0.1 2024-09-19 07:04:37,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=615060.0, ans=0.125 2024-09-19 07:04:37,531 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.76 vs. limit=10.0 2024-09-19 07:04:38,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=615060.0, ans=0.125 2024-09-19 07:04:53,549 INFO [train.py:1198] (1/2) Epoch 34, batch 4450, loss[loss=0.265, ctc_loss=0.1527, cr_loss=0.3974, attn_decoder_loss=0.2686, over 20329.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1249, cr_loss=0.3719, attn_decoder_loss=0.2491, over 5574717.09 frames. 
], batch size: 210, lr: 3.23e-03, grad_scale: 16.0 2024-09-19 07:04:56,493 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.888e+01 9.045e+01 9.501e+01 1.052e+02 3.870e+02, threshold=1.900e+02, percent-clipped=1.0 2024-09-19 07:05:13,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=615140.0, ans=0.0 2024-09-19 07:05:13,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=615140.0, ans=0.125 2024-09-19 07:05:24,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=615180.0, ans=0.07 2024-09-19 07:05:35,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=615180.0, ans=0.2 2024-09-19 07:05:54,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=615260.0, ans=0.2 2024-09-19 07:06:09,428 INFO [train.py:1198] (1/2) Epoch 34, batch 4500, loss[loss=0.2607, ctc_loss=0.1516, cr_loss=0.4022, attn_decoder_loss=0.2638, over 20084.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.128, cr_loss=0.3742, attn_decoder_loss=0.2508, over 5236885.40 frames. ], batch size: 209, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 07:06:32,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=615340.0, ans=0.04949747468305833 2024-09-19 07:06:34,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=615340.0, ans=0.025 2024-09-19 07:06:40,139 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:07:33,731 INFO [train.py:1198] (1/2) Epoch 35, batch 0, loss[loss=0.2151, ctc_loss=0.09534, cr_loss=0.3194, attn_decoder_loss=0.2214, over 29556.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.09534, cr_loss=0.3194, attn_decoder_loss=0.2214, over 29556.00 frames. ], batch size: 73, lr: 3.18e-03, grad_scale: 16.0 2024-09-19 07:07:33,731 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 07:07:52,109 INFO [train.py:1230] (1/2) Epoch 35, validation: loss=0.2125, ctc_loss=0.03615, cr_loss=6.293e-15, attn_decoder_loss=0.232, over 944034.00 frames. 2024-09-19 07:07:52,109 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-19 07:07:53,315 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.78 vs. limit=10.0 2024-09-19 07:08:24,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=615480.0, ans=0.0 2024-09-19 07:08:36,043 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 1.018e+02 1.116e+02 1.176e+02 2.643e+02, threshold=2.232e+02, percent-clipped=1.0 2024-09-19 07:09:09,392 INFO [train.py:1198] (1/2) Epoch 35, batch 50, loss[loss=0.2081, ctc_loss=0.09678, cr_loss=0.3043, attn_decoder_loss=0.2137, over 29430.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.121, cr_loss=0.3672, attn_decoder_loss=0.2431, over 1266867.30 frames. 
], batch size: 70, lr: 3.18e-03, grad_scale: 8.0 2024-09-19 07:09:27,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=615640.0, ans=0.125 2024-09-19 07:09:32,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=615640.0, ans=0.0 2024-09-19 07:09:38,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.min_positive, batch_count=615680.0, ans=0.025 2024-09-19 07:10:15,003 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:10:25,305 INFO [train.py:1198] (1/2) Epoch 35, batch 100, loss[loss=0.2199, ctc_loss=0.1081, cr_loss=0.33, attn_decoder_loss=0.225, over 29542.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1218, cr_loss=0.3684, attn_decoder_loss=0.2444, over 2251631.28 frames. ], batch size: 76, lr: 3.18e-03, grad_scale: 8.0 2024-09-19 07:10:40,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=615840.0, ans=0.025 2024-09-19 07:10:42,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.62 vs. limit=15.0 2024-09-19 07:10:43,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=615840.0, ans=0.2 2024-09-19 07:10:51,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=615840.0, ans=0.125 2024-09-19 07:11:11,189 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 8.556e+01 9.012e+01 9.778e+01 2.155e+02, threshold=1.802e+02, percent-clipped=0.0 2024-09-19 07:11:11,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=615920.0, ans=0.0 2024-09-19 07:11:15,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=615920.0, ans=0.125 2024-09-19 07:11:15,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=615920.0, ans=0.125 2024-09-19 07:11:23,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=615920.0, ans=0.0 2024-09-19 07:11:23,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=615920.0, ans=0.04949747468305833 2024-09-19 07:11:27,085 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.79 vs. limit=10.0 2024-09-19 07:11:42,843 INFO [train.py:1198] (1/2) Epoch 35, batch 150, loss[loss=0.2146, ctc_loss=0.09813, cr_loss=0.312, attn_decoder_loss=0.2206, over 29434.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1183, cr_loss=0.3611, attn_decoder_loss=0.2417, over 3045776.08 frames. 
], batch size: 70, lr: 3.18e-03, grad_scale: 8.0 2024-09-19 07:11:57,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=616000.0, ans=0.125 2024-09-19 07:11:59,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=616040.0, ans=0.1 2024-09-19 07:12:24,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=616080.0, ans=0.1 2024-09-19 07:12:30,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=616120.0, ans=0.125 2024-09-19 07:12:36,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=616120.0, ans=0.125 2024-09-19 07:12:58,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.08 vs. limit=15.0 2024-09-19 07:13:00,664 INFO [train.py:1198] (1/2) Epoch 35, batch 200, loss[loss=0.2462, ctc_loss=0.1278, cr_loss=0.3855, attn_decoder_loss=0.2508, over 27519.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1176, cr_loss=0.36, attn_decoder_loss=0.2412, over 3658757.30 frames. ], batch size: 125, lr: 3.18e-03, grad_scale: 8.0 2024-09-19 07:13:10,967 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2024-09-19 07:13:32,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=616280.0, ans=0.025 2024-09-19 07:13:40,712 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.58 vs. limit=22.5 2024-09-19 07:13:44,290 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 8.322e+01 8.803e+01 9.325e+01 1.291e+02, threshold=1.761e+02, percent-clipped=0.0 2024-09-19 07:13:46,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=616320.0, ans=0.125 2024-09-19 07:14:10,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.53 vs. limit=10.0 2024-09-19 07:14:11,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=616360.0, ans=0.1 2024-09-19 07:14:15,851 INFO [train.py:1198] (1/2) Epoch 35, batch 250, loss[loss=0.2545, ctc_loss=0.1375, cr_loss=0.3921, attn_decoder_loss=0.2588, over 29273.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1169, cr_loss=0.3587, attn_decoder_loss=0.2407, over 4142255.03 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:14:16,681 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.69 vs. 
limit=22.5 2024-09-19 07:14:29,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=616440.0, ans=0.125 2024-09-19 07:14:32,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=616440.0, ans=0.125 2024-09-19 07:14:38,037 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.00 vs. limit=15.0 2024-09-19 07:14:39,583 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.72 vs. limit=22.5 2024-09-19 07:14:44,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=616480.0, ans=0.125 2024-09-19 07:15:22,511 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0 2024-09-19 07:15:22,857 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.79 vs. limit=15.0 2024-09-19 07:15:31,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=616560.0, ans=0.0 2024-09-19 07:15:34,116 INFO [train.py:1198] (1/2) Epoch 35, batch 300, loss[loss=0.2634, ctc_loss=0.1422, cr_loss=0.4052, attn_decoder_loss=0.2679, over 29494.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1164, cr_loss=0.3578, attn_decoder_loss=0.2405, over 4510431.00 frames. ], batch size: 92, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:15:50,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=616640.0, ans=0.5 2024-09-19 07:16:20,345 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.277e+01 8.384e+01 8.991e+01 9.743e+01 6.934e+02, threshold=1.798e+02, percent-clipped=2.0 2024-09-19 07:16:31,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=616720.0, ans=0.1 2024-09-19 07:16:52,556 INFO [train.py:1198] (1/2) Epoch 35, batch 350, loss[loss=0.2112, ctc_loss=0.09738, cr_loss=0.3028, attn_decoder_loss=0.2171, over 29324.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1165, cr_loss=0.358, attn_decoder_loss=0.2408, over 4796270.75 frames. ], batch size: 71, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:17:06,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=616840.0, ans=0.0 2024-09-19 07:17:09,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=616840.0, ans=0.025 2024-09-19 07:17:16,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=616840.0, ans=0.125 2024-09-19 07:17:30,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=616880.0, ans=0.2 2024-09-19 07:17:34,277 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.16 vs. 
limit=6.0 2024-09-19 07:17:45,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=616920.0, ans=0.0 2024-09-19 07:17:45,754 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2024-09-19 07:18:06,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=617000.0, ans=0.125 2024-09-19 07:18:07,763 INFO [train.py:1198] (1/2) Epoch 35, batch 400, loss[loss=0.2406, ctc_loss=0.1237, cr_loss=0.3728, attn_decoder_loss=0.2453, over 29721.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.116, cr_loss=0.357, attn_decoder_loss=0.2404, over 5025970.85 frames. ], batch size: 82, lr: 3.17e-03, grad_scale: 16.0 2024-09-19 07:18:08,765 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.28 vs. limit=22.5 2024-09-19 07:18:14,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=617000.0, ans=0.125 2024-09-19 07:18:31,912 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.96 vs. limit=15.0 2024-09-19 07:18:49,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=617080.0, ans=0.125 2024-09-19 07:18:54,451 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.777e+01 8.637e+01 9.137e+01 9.905e+01 1.373e+02, threshold=1.827e+02, percent-clipped=0.0 2024-09-19 07:18:59,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=617120.0, ans=0.0 2024-09-19 07:19:13,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=617160.0, ans=0.025 2024-09-19 07:19:17,992 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.49 vs. limit=15.0 2024-09-19 07:19:22,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=617160.0, ans=0.07 2024-09-19 07:19:25,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=617200.0, ans=0.125 2024-09-19 07:19:26,438 INFO [train.py:1198] (1/2) Epoch 35, batch 450, loss[loss=0.2463, ctc_loss=0.1201, cr_loss=0.3589, attn_decoder_loss=0.2523, over 29701.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1162, cr_loss=0.3573, attn_decoder_loss=0.2406, over 5189734.36 frames. ], batch size: 83, lr: 3.17e-03, grad_scale: 16.0 2024-09-19 07:19:33,215 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.77 vs. limit=6.0 2024-09-19 07:19:35,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=617200.0, ans=0.0 2024-09-19 07:19:39,018 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.34 vs. 
limit=15.0 2024-09-19 07:19:49,451 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.13 vs. limit=10.0 2024-09-19 07:19:56,864 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.15 vs. limit=15.0 2024-09-19 07:19:57,046 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.13 vs. limit=10.0 2024-09-19 07:20:01,714 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.16 vs. limit=6.0 2024-09-19 07:20:14,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=617320.0, ans=0.0 2024-09-19 07:20:16,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=617320.0, ans=0.1 2024-09-19 07:20:17,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=617320.0, ans=0.0 2024-09-19 07:20:26,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=617320.0, ans=0.025 2024-09-19 07:20:40,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=617360.0, ans=0.125 2024-09-19 07:20:44,376 INFO [train.py:1198] (1/2) Epoch 35, batch 500, loss[loss=0.2538, ctc_loss=0.1234, cr_loss=0.3733, attn_decoder_loss=0.26, over 29465.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1161, cr_loss=0.3568, attn_decoder_loss=0.2403, over 5331872.08 frames. ], batch size: 94, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:20:56,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=617400.0, ans=0.1 2024-09-19 07:20:57,411 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.68 vs. limit=12.0 2024-09-19 07:21:00,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=617440.0, ans=15.0 2024-09-19 07:21:11,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=617440.0, ans=0.125 2024-09-19 07:21:24,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=617480.0, ans=0.125 2024-09-19 07:21:24,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.38 vs. limit=15.0 2024-09-19 07:21:29,906 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.448e+01 8.461e+01 8.901e+01 9.576e+01 2.460e+02, threshold=1.780e+02, percent-clipped=1.0 2024-09-19 07:21:42,656 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.76 vs. limit=15.0 2024-09-19 07:22:00,515 INFO [train.py:1198] (1/2) Epoch 35, batch 550, loss[loss=0.2519, ctc_loss=0.1242, cr_loss=0.3751, attn_decoder_loss=0.2578, over 28841.00 frames. 
], tot_loss[loss=0.2352, ctc_loss=0.1162, cr_loss=0.3571, attn_decoder_loss=0.2405, over 5425156.25 frames. ], batch size: 104, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:22:47,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=617720.0, ans=0.0 2024-09-19 07:22:55,287 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2024-09-19 07:23:00,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=617720.0, ans=0.125 2024-09-19 07:23:07,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=617760.0, ans=0.2 2024-09-19 07:23:13,607 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.72 vs. limit=12.0 2024-09-19 07:23:14,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=617760.0, ans=0.0 2024-09-19 07:23:18,916 INFO [train.py:1198] (1/2) Epoch 35, batch 600, loss[loss=0.2485, ctc_loss=0.131, cr_loss=0.3748, attn_decoder_loss=0.2533, over 29255.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1164, cr_loss=0.3573, attn_decoder_loss=0.2408, over 5512334.70 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:23:25,767 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.66 vs. limit=15.0 2024-09-19 07:23:36,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=617840.0, ans=0.125 2024-09-19 07:23:52,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=617880.0, ans=0.07 2024-09-19 07:24:06,189 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.302e+01 8.466e+01 8.823e+01 9.402e+01 3.791e+02, threshold=1.765e+02, percent-clipped=1.0 2024-09-19 07:24:08,657 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.18 vs. limit=15.0 2024-09-19 07:24:12,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=617920.0, ans=0.0 2024-09-19 07:24:29,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=617960.0, ans=0.0 2024-09-19 07:24:35,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=618000.0, ans=0.1 2024-09-19 07:24:36,200 INFO [train.py:1198] (1/2) Epoch 35, batch 650, loss[loss=0.2331, ctc_loss=0.1108, cr_loss=0.3586, attn_decoder_loss=0.2387, over 29759.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1154, cr_loss=0.3553, attn_decoder_loss=0.2399, over 5588743.29 frames. ], batch size: 81, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:24:44,733 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.90 vs. 
limit=22.5 2024-09-19 07:24:47,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=618000.0, ans=0.2 2024-09-19 07:25:07,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=618080.0, ans=0.125 2024-09-19 07:25:46,375 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:25:52,021 INFO [train.py:1198] (1/2) Epoch 35, batch 700, loss[loss=0.2269, ctc_loss=0.112, cr_loss=0.3519, attn_decoder_loss=0.2318, over 29533.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1163, cr_loss=0.3573, attn_decoder_loss=0.2408, over 5638789.33 frames. ], batch size: 76, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:26:22,656 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.33 vs. limit=22.5 2024-09-19 07:26:33,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=618280.0, ans=0.1 2024-09-19 07:26:34,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=618280.0, ans=0.125 2024-09-19 07:26:37,261 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.181e+01 8.406e+01 8.899e+01 9.421e+01 1.331e+02, threshold=1.780e+02, percent-clipped=0.0 2024-09-19 07:26:40,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=618320.0, ans=0.0 2024-09-19 07:26:45,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=618320.0, ans=0.0 2024-09-19 07:26:51,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=618360.0, ans=0.0 2024-09-19 07:26:53,694 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. limit=6.0 2024-09-19 07:27:10,382 INFO [train.py:1198] (1/2) Epoch 35, batch 750, loss[loss=0.2449, ctc_loss=0.1171, cr_loss=0.3777, attn_decoder_loss=0.2507, over 29721.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1162, cr_loss=0.3573, attn_decoder_loss=0.2406, over 5678206.58 frames. ], batch size: 82, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:27:15,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=618400.0, ans=0.0 2024-09-19 07:27:18,933 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. 
limit=6.0 2024-09-19 07:27:35,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=618440.0, ans=0.125 2024-09-19 07:27:38,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=618440.0, ans=0.0 2024-09-19 07:27:49,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=618480.0, ans=0.2 2024-09-19 07:27:52,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=618480.0, ans=0.1 2024-09-19 07:28:09,348 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:28:16,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=618560.0, ans=0.2 2024-09-19 07:28:28,640 INFO [train.py:1198] (1/2) Epoch 35, batch 800, loss[loss=0.2143, ctc_loss=0.101, cr_loss=0.3195, attn_decoder_loss=0.2198, over 29563.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1159, cr_loss=0.3567, attn_decoder_loss=0.2404, over 5709174.48 frames. ], batch size: 73, lr: 3.17e-03, grad_scale: 16.0 2024-09-19 07:29:00,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=618680.0, ans=0.1 2024-09-19 07:29:00,852 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.60 vs. limit=15.0 2024-09-19 07:29:15,160 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.859e+01 8.586e+01 8.985e+01 9.600e+01 2.003e+02, threshold=1.797e+02, percent-clipped=1.0 2024-09-19 07:29:23,824 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.15 vs. limit=15.0 2024-09-19 07:29:28,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=618760.0, ans=0.0 2024-09-19 07:29:29,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=618760.0, ans=10.0 2024-09-19 07:29:40,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=618760.0, ans=0.125 2024-09-19 07:29:42,406 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:29:43,495 INFO [train.py:1198] (1/2) Epoch 35, batch 850, loss[loss=0.2463, ctc_loss=0.1187, cr_loss=0.354, attn_decoder_loss=0.2526, over 29724.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1158, cr_loss=0.3567, attn_decoder_loss=0.2402, over 5737349.15 frames. 
2024-09-19 07:29:48,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=618800.0, ans=0.125 2024-09-19 07:29:51,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=618800.0, ans=0.0 2024-09-19 07:30:09,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=618840.0, ans=6.0 2024-09-19 07:30:13,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=618880.0, ans=0.125 2024-09-19 07:30:14,697 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.24 vs. limit=15.0 2024-09-19 07:30:30,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=618920.0, ans=0.125 2024-09-19 07:30:50,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=618960.0, ans=0.1 2024-09-19 07:30:53,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=618960.0, ans=0.125 2024-09-19 07:31:01,540 INFO [train.py:1198] (1/2) Epoch 35, batch 900, loss[loss=0.2222, ctc_loss=0.1108, cr_loss=0.3557, attn_decoder_loss=0.2266, over 29629.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.116, cr_loss=0.357, attn_decoder_loss=0.2404, over 5742416.40 frames. ], batch size: 73, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:31:07,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=619000.0, ans=0.05 2024-09-19 07:31:29,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=619040.0, ans=0.125 2024-09-19 07:31:34,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=619080.0, ans=0.1 2024-09-19 07:31:34,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=619080.0, ans=0.125 2024-09-19 07:31:44,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=619080.0, ans=0.125 2024-09-19 07:31:48,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=619120.0, ans=0.2 2024-09-19 07:31:50,718 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.514e+01 9.190e+01 1.005e+02 2.448e+02, threshold=1.838e+02, percent-clipped=2.0 2024-09-19 07:31:54,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=619120.0, ans=0.0 2024-09-19 07:32:12,740 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.80 vs. limit=12.0 2024-09-19 07:32:19,720 INFO [train.py:1198] (1/2) Epoch 35, batch 950, loss[loss=0.2197, ctc_loss=0.09761, cr_loss=0.3115, attn_decoder_loss=0.2263, over 29520.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1162, cr_loss=0.3571, attn_decoder_loss=0.2405, over 5743551.77 frames. ], batch size: 74, lr: 3.17e-03, grad_scale: 8.0
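
In the optim.py:487 warnings, the five grad-norm quartiles read as a min/25%/median/75%/max summary over a recent window of gradient norms, and the threshold tracks Clipping_scale times the logged median: in the warning above, 2.0 x 9.190e+01 = 1.838e+02, and the same relation holds for the other warnings in this section. Below is a minimal sketch of median-based adaptive clipping with those log semantics; the window size and the exact bookkeeping are assumptions, not a copy of optim.py:

```python
from collections import deque

import torch

# Sketch: clip gradients against clipping_scale * median of recent norms.
# "percent-clipped" would then be the share of recent batches whose norm
# exceeded the threshold, which is why it hovers near zero in the log.
class MedianGradClipper:
    def __init__(self, clipping_scale=2.0, window=1000):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent gradient norms

    def clip_(self, params):
        grads = [p.grad.detach().flatten() for p in params if p.grad is not None]
        norm = torch.cat(grads).norm().item()
        self.norms.append(norm)
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if norm > threshold:
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)  # rescale down to the threshold
        return norm
```
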
2024-09-19 07:32:24,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=619200.0, ans=0.2 2024-09-19 07:32:33,741 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:32:35,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=619240.0, ans=0.125 2024-09-19 07:32:43,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=619240.0, ans=0.125 2024-09-19 07:32:50,013 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.45 vs. limit=15.0 2024-09-19 07:32:56,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=619280.0, ans=0.5 2024-09-19 07:33:00,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=619280.0, ans=0.2 2024-09-19 07:33:01,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=619280.0, ans=0.125 2024-09-19 07:33:04,715 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0 2024-09-19 07:33:22,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=619360.0, ans=0.0 2024-09-19 07:33:26,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=619360.0, ans=0.0 2024-09-19 07:33:34,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=619400.0, ans=0.125 2024-09-19 07:33:35,499 INFO [train.py:1198] (1/2) Epoch 35, batch 1000, loss[loss=0.2332, ctc_loss=0.1211, cr_loss=0.3546, attn_decoder_loss=0.2378, over 29486.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.117, cr_loss=0.3582, attn_decoder_loss=0.2414, over 5738215.57 frames. ], batch size: 77, lr: 3.17e-03, grad_scale: 8.0
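
The scaling.py:214 entries record the current value (ans) of hyperparameters scheduled on the global batch_count: dropout probabilities, balancer probabilities, skip rates, bypass scale floors, and even the whitening limits themselves. Below is a minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints in the example are invented, and at 619k batches most of the schedules logged here have long since flattened out at a final constant (ans=0.0, 0.1, 0.125, 0.2, ...):

```python
# Sketch of a batch-count-keyed scheduled hyperparameter. The
# piecewise-linear rule and the example breakpoints are assumptions.
class ScheduledFloat:
    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs

    def value(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:  # linear interpolation inside a segment
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# A dropout_p decaying from 0.3 to a final 0.1 over the first 20k batches
# would print ans=0.1 at batch_count=619240.0, like the dropout_p entries nearby.
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
assert dropout_p.value(619240.0) == 0.1
```
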
2024-09-19 07:33:38,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=619400.0, ans=0.0 2024-09-19 07:33:41,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=619400.0, ans=0.0 2024-09-19 07:33:49,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=619440.0, ans=0.125 2024-09-19 07:33:58,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=619440.0, ans=0.1 2024-09-19 07:34:09,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=619480.0, ans=0.125 2024-09-19 07:34:20,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=619520.0, ans=10.0 2024-09-19 07:34:22,841 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.419e+01 8.400e+01 8.920e+01 9.804e+01 1.524e+02, threshold=1.784e+02, percent-clipped=0.0 2024-09-19 07:34:26,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.49 vs. limit=15.0 2024-09-19 07:34:42,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=619560.0, ans=0.125 2024-09-19 07:34:53,684 INFO [train.py:1198] (1/2) Epoch 35, batch 1050, loss[loss=0.2449, ctc_loss=0.1225, cr_loss=0.3706, attn_decoder_loss=0.2503, over 29696.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1164, cr_loss=0.3569, attn_decoder_loss=0.2406, over 5744757.45 frames. ], batch size: 85, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:34:53,981 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:35:04,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=619600.0, ans=0.125 2024-09-19 07:35:23,732 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:35:45,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=619720.0, ans=0.07 2024-09-19 07:35:50,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=619720.0, ans=0.2 2024-09-19 07:35:58,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=619760.0, ans=0.0 2024-09-19 07:36:07,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=619760.0, ans=0.05 2024-09-19 07:36:11,766 INFO [train.py:1198] (1/2) Epoch 35, batch 1100, loss[loss=0.2312, ctc_loss=0.115, cr_loss=0.3482, attn_decoder_loss=0.2364, over 29439.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1163, cr_loss=0.3568, attn_decoder_loss=0.2405, over 5757465.56 frames. ], batch size: 78, lr: 3.17e-03, grad_scale: 8.0
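
The scaling.py:1024 Whitening entries compare a measured whitening metric against a limit, and the limit itself can be scheduled (see the whitening_limit entries elsewhere in this log). A metric with the logged behavior can be built as the ratio below: it equals 1.0 exactly when a group's feature covariance is a multiple of the identity, and grows as the covariance eigenvalues spread out, so a line like metric=6.49 vs. limit=15.0 says those activations are comfortably inside their limit. This is an illustration of the idea, not a line-for-line copy of icefall's scaling.py; a training-time penalty would act on the amount by which the metric exceeds the limit.

```python
import torch

# Sketch of a whitening metric: >= 1.0, with equality iff each group's
# covariance is proportional to the identity (i.e. the features are "white").
def whitening_metric(x, num_groups):
    num_channels = x.shape[-1]
    d = num_channels // num_groups                         # channels per group
    x = x.reshape(-1, num_groups, d).transpose(0, 1)       # (groups, frames, d)
    cov = torch.matmul(x.transpose(1, 2), x) / x.shape[1]  # (groups, d, d)
    mean_var = cov.diagonal(dim1=1, dim2=2).mean()         # mean eigenvalue
    mean_sq = (cov ** 2).sum(dim=(1, 2)).mean() / d        # mean squared eigenvalue
    return mean_sq / (mean_var ** 2 + 1e-20)

# White noise sits near the floor of 1.0, well under limits like 6.0 or 22.5.
x = torch.randn(10000, 256)
assert whitening_metric(x, num_groups=1).item() < 1.2
```
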
2024-09-19 07:36:27,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=619840.0, ans=0.125 2024-09-19 07:36:33,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=619840.0, ans=0.0 2024-09-19 07:36:36,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=619840.0, ans=0.2 2024-09-19 07:36:47,999 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=12.0 2024-09-19 07:36:56,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=619920.0, ans=0.125 2024-09-19 07:36:58,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=619920.0, ans=0.2 2024-09-19 07:36:59,138 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.311e+01 8.442e+01 8.888e+01 9.490e+01 5.357e+02, threshold=1.778e+02, percent-clipped=1.0 2024-09-19 07:37:28,429 INFO [train.py:1198] (1/2) Epoch 35, batch 1150, loss[loss=0.2357, ctc_loss=0.1132, cr_loss=0.3413, attn_decoder_loss=0.2417, over 29472.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1165, cr_loss=0.3574, attn_decoder_loss=0.2405, over 5754123.82 frames. ], batch size: 78, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:37:42,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=620040.0, ans=0.125 2024-09-19 07:37:46,023 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.74 vs. limit=15.0 2024-09-19 07:37:51,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=620040.0, ans=0.05 2024-09-19 07:38:46,890 INFO [train.py:1198] (1/2) Epoch 35, batch 1200, loss[loss=0.2347, ctc_loss=0.1118, cr_loss=0.3294, attn_decoder_loss=0.2411, over 29669.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1167, cr_loss=0.3578, attn_decoder_loss=0.2412, over 5745658.13 frames.
], batch size: 85, lr: 3.17e-03, grad_scale: 16.0 2024-09-19 07:38:57,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=620200.0, ans=0.125 2024-09-19 07:39:08,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=620240.0, ans=0.125 2024-09-19 07:39:12,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=620240.0, ans=0.0 2024-09-19 07:39:35,881 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.035e+01 8.532e+01 9.165e+01 9.750e+01 1.443e+02, threshold=1.833e+02, percent-clipped=0.0 2024-09-19 07:39:39,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=620320.0, ans=0.125 2024-09-19 07:39:47,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=620320.0, ans=22.5 2024-09-19 07:39:58,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=620360.0, ans=0.125 2024-09-19 07:40:01,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=620360.0, ans=0.125 2024-09-19 07:40:03,849 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.05 vs. limit=15.0 2024-09-19 07:40:04,437 INFO [train.py:1198] (1/2) Epoch 35, batch 1250, loss[loss=0.2492, ctc_loss=0.1251, cr_loss=0.3851, attn_decoder_loss=0.2544, over 29541.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1169, cr_loss=0.3588, attn_decoder_loss=0.2415, over 5773341.53 frames. ], batch size: 92, lr: 3.16e-03, grad_scale: 16.0 2024-09-19 07:40:04,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=620400.0, ans=0.0 2024-09-19 07:40:26,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=620440.0, ans=0.1 2024-09-19 07:40:36,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=620480.0, ans=0.125 2024-09-19 07:40:50,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=620520.0, ans=0.125 2024-09-19 07:40:57,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=620520.0, ans=0.1 2024-09-19 07:41:05,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=620560.0, ans=0.05 2024-09-19 07:41:20,150 INFO [train.py:1198] (1/2) Epoch 35, batch 1300, loss[loss=0.2423, ctc_loss=0.1146, cr_loss=0.3504, attn_decoder_loss=0.2487, over 28317.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1163, cr_loss=0.3578, attn_decoder_loss=0.2408, over 5777949.52 frames. 
], batch size: 111, lr: 3.16e-03, grad_scale: 16.0 2024-09-19 07:41:26,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=620600.0, ans=0.125 2024-09-19 07:41:26,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=620600.0, ans=0.125 2024-09-19 07:41:35,769 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:41:37,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=620640.0, ans=0.025 2024-09-19 07:41:41,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=620640.0, ans=0.125 2024-09-19 07:41:47,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=620640.0, ans=0.125 2024-09-19 07:41:47,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=620640.0, ans=0.125 2024-09-19 07:42:08,927 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 8.419e+01 8.887e+01 9.525e+01 1.443e+02, threshold=1.777e+02, percent-clipped=0.0 2024-09-19 07:42:16,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=620720.0, ans=0.2 2024-09-19 07:42:36,591 INFO [train.py:1198] (1/2) Epoch 35, batch 1350, loss[loss=0.2413, ctc_loss=0.1212, cr_loss=0.3742, attn_decoder_loss=0.2463, over 29735.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.116, cr_loss=0.3573, attn_decoder_loss=0.2404, over 5794957.72 frames. ], batch size: 81, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 07:42:36,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=620800.0, ans=0.125 2024-09-19 07:42:41,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=620800.0, ans=0.125 2024-09-19 07:42:44,444 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:43:07,429 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.55 vs. limit=22.5 2024-09-19 07:43:09,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=620880.0, ans=0.0 2024-09-19 07:43:09,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=620880.0, ans=0.1 2024-09-19 07:43:14,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=620880.0, ans=0.125 2024-09-19 07:43:21,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=620880.0, ans=0.0 2024-09-19 07:43:39,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=620960.0, ans=0.125 2024-09-19 07:43:56,462 INFO [train.py:1198] (1/2) Epoch 35, batch 1400, loss[loss=0.2078, ctc_loss=0.09964, cr_loss=0.3178, attn_decoder_loss=0.2127, over 29570.00 frames. 
], tot_loss[loss=0.2349, ctc_loss=0.1161, cr_loss=0.3576, attn_decoder_loss=0.2401, over 5807186.84 frames. ], batch size: 69, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 07:44:04,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=621000.0, ans=0.125 2024-09-19 07:44:44,723 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.569e+01 8.443e+01 9.009e+01 9.628e+01 2.334e+02, threshold=1.802e+02, percent-clipped=0.0 2024-09-19 07:44:49,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=621120.0, ans=0.125 2024-09-19 07:44:58,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=621160.0, ans=0.0 2024-09-19 07:45:09,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=621160.0, ans=0.05 2024-09-19 07:45:11,967 INFO [train.py:1198] (1/2) Epoch 35, batch 1450, loss[loss=0.248, ctc_loss=0.1182, cr_loss=0.3568, attn_decoder_loss=0.2545, over 29423.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1162, cr_loss=0.3577, attn_decoder_loss=0.2405, over 5803639.55 frames. ], batch size: 94, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 07:45:36,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=621240.0, ans=0.125 2024-09-19 07:45:59,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=621320.0, ans=0.95 2024-09-19 07:46:09,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=621320.0, ans=0.0 2024-09-19 07:46:27,759 INFO [train.py:1198] (1/2) Epoch 35, batch 1500, loss[loss=0.2498, ctc_loss=0.1174, cr_loss=0.376, attn_decoder_loss=0.2562, over 29618.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1161, cr_loss=0.3573, attn_decoder_loss=0.2408, over 5805131.34 frames. 
], batch size: 86, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 07:46:38,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=621400.0, ans=0.1 2024-09-19 07:47:01,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=621480.0, ans=0.125 2024-09-19 07:47:04,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=621480.0, ans=6.0 2024-09-19 07:47:07,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=621480.0, ans=0.1 2024-09-19 07:47:20,907 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.386e+01 8.458e+01 9.148e+01 9.758e+01 1.676e+02, threshold=1.830e+02, percent-clipped=1.0 2024-09-19 07:47:27,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=621520.0, ans=0.125 2024-09-19 07:47:34,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=621560.0, ans=0.125 2024-09-19 07:47:44,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=621560.0, ans=10.0 2024-09-19 07:47:48,562 INFO [train.py:1198] (1/2) Epoch 35, batch 1550, loss[loss=0.2636, ctc_loss=0.1559, cr_loss=0.4459, attn_decoder_loss=0.2657, over 29530.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1167, cr_loss=0.3581, attn_decoder_loss=0.2408, over 5780716.56 frames. ], batch size: 90, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 07:48:17,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=621680.0, ans=0.125 2024-09-19 07:48:31,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=621680.0, ans=0.0 2024-09-19 07:48:35,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=621720.0, ans=0.2 2024-09-19 07:48:38,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=621720.0, ans=0.125 2024-09-19 07:49:04,124 INFO [train.py:1198] (1/2) Epoch 35, batch 1600, loss[loss=0.2373, ctc_loss=0.111, cr_loss=0.3535, attn_decoder_loss=0.2435, over 29682.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.117, cr_loss=0.3585, attn_decoder_loss=0.2408, over 5765048.70 frames. ], batch size: 85, lr: 3.16e-03, grad_scale: 16.0 2024-09-19 07:49:30,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=621840.0, ans=0.0 2024-09-19 07:49:34,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=621880.0, ans=0.125 2024-09-19 07:49:52,917 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.463e+01 9.203e+01 9.882e+01 2.471e+02, threshold=1.841e+02, percent-clipped=1.0 2024-09-19 07:50:15,419 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.09 vs. 
limit=12.0 2024-09-19 07:50:20,050 INFO [train.py:1198] (1/2) Epoch 35, batch 1650, loss[loss=0.2482, ctc_loss=0.1214, cr_loss=0.3605, attn_decoder_loss=0.2542, over 29715.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1163, cr_loss=0.3572, attn_decoder_loss=0.2403, over 5761454.98 frames. ], batch size: 89, lr: 3.16e-03, grad_scale: 16.0 2024-09-19 07:50:20,922 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.30 vs. limit=15.0 2024-09-19 07:50:59,686 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.23 vs. limit=15.0 2024-09-19 07:51:01,476 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.80 vs. limit=15.0 2024-09-19 07:51:08,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=622120.0, ans=0.125 2024-09-19 07:51:31,344 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.82 vs. limit=15.0 2024-09-19 07:51:39,323 INFO [train.py:1198] (1/2) Epoch 35, batch 1700, loss[loss=0.2024, ctc_loss=0.09548, cr_loss=0.3052, attn_decoder_loss=0.2075, over 29577.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.116, cr_loss=0.3566, attn_decoder_loss=0.2402, over 5780986.35 frames. ], batch size: 69, lr: 3.16e-03, grad_scale: 16.0 2024-09-19 07:51:47,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=622200.0, ans=0.2 2024-09-19 07:51:48,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=622200.0, ans=0.125 2024-09-19 07:51:54,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=622240.0, ans=0.1 2024-09-19 07:52:02,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=622240.0, ans=0.125 2024-09-19 07:52:07,816 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.82 vs. limit=10.0 2024-09-19 07:52:15,168 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.05 vs. limit=10.0 2024-09-19 07:52:27,941 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.208e+01 8.456e+01 8.908e+01 9.428e+01 1.294e+02, threshold=1.782e+02, percent-clipped=0.0 2024-09-19 07:52:48,499 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.13 vs. limit=12.0 2024-09-19 07:52:55,731 INFO [train.py:1198] (1/2) Epoch 35, batch 1750, loss[loss=0.2138, ctc_loss=0.105, cr_loss=0.3461, attn_decoder_loss=0.2182, over 29345.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1158, cr_loss=0.3561, attn_decoder_loss=0.2399, over 5788041.35 frames. 
], batch size: 67, lr: 3.16e-03, grad_scale: 16.0 2024-09-19 07:53:02,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=622400.0, ans=0.1 2024-09-19 07:53:08,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=622400.0, ans=0.125 2024-09-19 07:53:26,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=622480.0, ans=0.0 2024-09-19 07:53:46,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=622520.0, ans=10.0 2024-09-19 07:53:47,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=622520.0, ans=0.1 2024-09-19 07:54:02,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=622560.0, ans=0.125 2024-09-19 07:54:04,598 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.41 vs. limit=15.0 2024-09-19 07:54:11,180 INFO [train.py:1198] (1/2) Epoch 35, batch 1800, loss[loss=0.253, ctc_loss=0.1351, cr_loss=0.3945, attn_decoder_loss=0.2573, over 29680.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1158, cr_loss=0.3563, attn_decoder_loss=0.2401, over 5791206.22 frames. ], batch size: 83, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 07:54:14,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=622600.0, ans=0.0 2024-09-19 07:55:05,588 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.428e+01 8.311e+01 8.892e+01 9.552e+01 1.638e+02, threshold=1.778e+02, percent-clipped=0.0 2024-09-19 07:55:08,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=622720.0, ans=0.0 2024-09-19 07:55:10,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=622720.0, ans=0.1 2024-09-19 07:55:11,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=622720.0, ans=0.025 2024-09-19 07:55:14,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=622760.0, ans=0.125 2024-09-19 07:55:31,273 INFO [train.py:1198] (1/2) Epoch 35, batch 1850, loss[loss=0.2481, ctc_loss=0.1252, cr_loss=0.3822, attn_decoder_loss=0.2533, over 29644.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1156, cr_loss=0.3559, attn_decoder_loss=0.2399, over 5795477.24 frames. 
], batch size: 86, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 07:55:52,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=622840.0, ans=0.05 2024-09-19 07:56:15,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=622920.0, ans=0.1 2024-09-19 07:56:18,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=622920.0, ans=0.125 2024-09-19 07:56:18,971 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.49 vs. limit=15.0 2024-09-19 07:56:19,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=622920.0, ans=0.2 2024-09-19 07:56:30,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=622960.0, ans=0.125 2024-09-19 07:56:33,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=622960.0, ans=0.125 2024-09-19 07:56:46,597 INFO [train.py:1198] (1/2) Epoch 35, batch 1900, loss[loss=0.2414, ctc_loss=0.1252, cr_loss=0.3745, attn_decoder_loss=0.246, over 29697.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1164, cr_loss=0.3572, attn_decoder_loss=0.2408, over 5803015.37 frames. ], batch size: 89, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 07:56:54,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=623000.0, ans=0.0 2024-09-19 07:57:08,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=623040.0, ans=0.125 2024-09-19 07:57:14,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=623040.0, ans=0.2 2024-09-19 07:57:36,455 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 8.574e+01 9.165e+01 9.579e+01 2.044e+02, threshold=1.833e+02, percent-clipped=2.0 2024-09-19 07:57:39,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=623120.0, ans=0.0 2024-09-19 07:57:44,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=623120.0, ans=0.125 2024-09-19 07:57:57,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=623160.0, ans=0.125 2024-09-19 07:57:59,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=623160.0, ans=0.0 2024-09-19 07:58:02,542 INFO [train.py:1198] (1/2) Epoch 35, batch 1950, loss[loss=0.2269, ctc_loss=0.1103, cr_loss=0.3395, attn_decoder_loss=0.2323, over 29459.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1173, cr_loss=0.3593, attn_decoder_loss=0.242, over 5818065.90 frames. ], batch size: 78, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 07:58:35,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=623280.0, ans=0.2 2024-09-19 07:58:43,886 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.98 vs. 
limit=15.0 2024-09-19 07:59:07,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=623360.0, ans=0.125 2024-09-19 07:59:20,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=623400.0, ans=0.125 2024-09-19 07:59:22,128 INFO [train.py:1198] (1/2) Epoch 35, batch 2000, loss[loss=0.2094, ctc_loss=0.1044, cr_loss=0.3507, attn_decoder_loss=0.2133, over 29352.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1175, cr_loss=0.3603, attn_decoder_loss=0.2422, over 5798060.81 frames. ], batch size: 67, lr: 3.16e-03, grad_scale: 16.0 2024-09-19 07:59:26,132 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.22 vs. limit=15.0 2024-09-19 07:59:54,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=623480.0, ans=0.0 2024-09-19 08:00:09,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=623520.0, ans=0.0 2024-09-19 08:00:13,152 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=15.0 2024-09-19 08:00:13,529 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.517e+01 9.042e+01 9.652e+01 2.863e+02, threshold=1.808e+02, percent-clipped=1.0 2024-09-19 08:00:33,334 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:00:37,513 INFO [train.py:1198] (1/2) Epoch 35, batch 2050, loss[loss=0.2113, ctc_loss=0.1014, cr_loss=0.3261, attn_decoder_loss=0.2163, over 29458.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.117, cr_loss=0.3585, attn_decoder_loss=0.2414, over 5789228.31 frames. ], batch size: 70, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 08:01:04,396 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.37 vs. limit=10.0 2024-09-19 08:01:10,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=623680.0, ans=0.125 2024-09-19 08:01:15,030 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.96 vs. limit=15.0 2024-09-19 08:01:17,714 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2024-09-19 08:01:47,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=623760.0, ans=0.125 2024-09-19 08:01:50,703 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:01:53,588 INFO [train.py:1198] (1/2) Epoch 35, batch 2100, loss[loss=0.2303, ctc_loss=0.1098, cr_loss=0.3484, attn_decoder_loss=0.2359, over 29732.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1168, cr_loss=0.3581, attn_decoder_loss=0.2411, over 5800736.21 frames. 
], batch size: 81, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 08:01:55,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=623800.0, ans=0.025 2024-09-19 08:02:43,774 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:02:49,244 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 8.509e+01 9.008e+01 9.603e+01 1.299e+02, threshold=1.802e+02, percent-clipped=0.0 2024-09-19 08:03:20,895 INFO [train.py:1198] (1/2) Epoch 35, batch 2150, loss[loss=0.2306, ctc_loss=0.1172, cr_loss=0.3548, attn_decoder_loss=0.2354, over 29435.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.116, cr_loss=0.3568, attn_decoder_loss=0.2403, over 5815795.14 frames. ], batch size: 78, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 08:03:22,904 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:03:25,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=624000.0, ans=0.1 2024-09-19 08:03:37,940 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:03:42,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=624040.0, ans=0.125 2024-09-19 08:03:42,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=624040.0, ans=0.5 2024-09-19 08:03:53,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=624080.0, ans=0.1 2024-09-19 08:04:13,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=624120.0, ans=0.1 2024-09-19 08:04:32,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=624160.0, ans=0.125 2024-09-19 08:04:36,500 INFO [train.py:1198] (1/2) Epoch 35, batch 2200, loss[loss=0.2588, ctc_loss=0.1266, cr_loss=0.3864, attn_decoder_loss=0.2649, over 29636.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1163, cr_loss=0.3569, attn_decoder_loss=0.2405, over 5811490.01 frames. ], batch size: 86, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 08:04:41,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=624200.0, ans=0.0 2024-09-19 08:04:51,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=624240.0, ans=0.0 2024-09-19 08:05:05,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=624280.0, ans=0.0 2024-09-19 08:05:14,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=624280.0, ans=0.0 2024-09-19 08:05:23,986 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.85 vs. 
limit=22.5 2024-09-19 08:05:27,602 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.240e+01 8.597e+01 9.109e+01 9.743e+01 2.251e+02, threshold=1.822e+02, percent-clipped=1.0 2024-09-19 08:05:28,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=624320.0, ans=0.125 2024-09-19 08:05:30,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=624320.0, ans=0.1 2024-09-19 08:05:32,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=624320.0, ans=0.125 2024-09-19 08:05:37,765 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.75 vs. limit=15.0 2024-09-19 08:05:47,564 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:05:51,839 INFO [train.py:1198] (1/2) Epoch 35, batch 2250, loss[loss=0.2464, ctc_loss=0.1271, cr_loss=0.382, attn_decoder_loss=0.2511, over 29708.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1164, cr_loss=0.3568, attn_decoder_loss=0.2406, over 5810561.73 frames. ], batch size: 82, lr: 3.15e-03, grad_scale: 8.0 2024-09-19 08:06:02,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=624400.0, ans=0.0 2024-09-19 08:06:02,732 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:06:09,466 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.14 vs. limit=15.0 2024-09-19 08:06:21,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=624440.0, ans=0.0 2024-09-19 08:06:39,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=624520.0, ans=0.125 2024-09-19 08:06:44,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=624520.0, ans=0.07 2024-09-19 08:06:49,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=624520.0, ans=0.125 2024-09-19 08:06:53,552 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:07:10,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=624600.0, ans=0.125 2024-09-19 08:07:11,531 INFO [train.py:1198] (1/2) Epoch 35, batch 2300, loss[loss=0.1965, ctc_loss=0.08313, cr_loss=0.2772, attn_decoder_loss=0.2029, over 29293.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1155, cr_loss=0.3551, attn_decoder_loss=0.2396, over 5798754.98 frames. 
], batch size: 71, lr: 3.15e-03, grad_scale: 8.0 2024-09-19 08:07:23,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=624600.0, ans=0.07 2024-09-19 08:07:25,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=624640.0, ans=0.025 2024-09-19 08:07:45,635 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.68 vs. limit=15.0 2024-09-19 08:07:47,380 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.95 vs. limit=15.0 2024-09-19 08:08:02,739 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.224e+01 8.578e+01 9.084e+01 9.791e+01 1.309e+02, threshold=1.817e+02, percent-clipped=0.0 2024-09-19 08:08:03,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=624720.0, ans=0.0 2024-09-19 08:08:27,457 INFO [train.py:1198] (1/2) Epoch 35, batch 2350, loss[loss=0.2643, ctc_loss=0.142, cr_loss=0.4126, attn_decoder_loss=0.2687, over 29689.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1159, cr_loss=0.3561, attn_decoder_loss=0.2398, over 5804272.75 frames. ], batch size: 83, lr: 3.15e-03, grad_scale: 8.0 2024-09-19 08:08:40,117 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.72 vs. limit=10.0 2024-09-19 08:09:02,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=624880.0, ans=0.0 2024-09-19 08:09:14,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=624920.0, ans=0.1 2024-09-19 08:09:21,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=624920.0, ans=0.125 2024-09-19 08:09:21,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=624920.0, ans=0.125 2024-09-19 08:09:40,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=624960.0, ans=0.0 2024-09-19 08:09:42,847 INFO [train.py:1198] (1/2) Epoch 35, batch 2400, loss[loss=0.2262, ctc_loss=0.1135, cr_loss=0.351, attn_decoder_loss=0.2309, over 29546.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1165, cr_loss=0.3573, attn_decoder_loss=0.2404, over 5808529.11 frames. ], batch size: 76, lr: 3.15e-03, grad_scale: 16.0 2024-09-19 08:09:44,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=625000.0, ans=0.0 2024-09-19 08:09:47,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=625000.0, ans=0.2 2024-09-19 08:09:49,907 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. 
limit=6.0 2024-09-19 08:09:58,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=625040.0, ans=0.125 2024-09-19 08:10:00,216 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.79 vs. limit=10.0 2024-09-19 08:10:15,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=625080.0, ans=0.0 2024-09-19 08:10:23,795 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.25 vs. limit=22.5 2024-09-19 08:10:35,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=625120.0, ans=0.0 2024-09-19 08:10:38,550 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.738e+01 8.636e+01 9.212e+01 9.895e+01 1.857e+02, threshold=1.842e+02, percent-clipped=1.0 2024-09-19 08:10:56,228 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.11 vs. limit=6.0 2024-09-19 08:11:02,873 INFO [train.py:1198] (1/2) Epoch 35, batch 2450, loss[loss=0.2394, ctc_loss=0.1175, cr_loss=0.3579, attn_decoder_loss=0.245, over 29722.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1173, cr_loss=0.3589, attn_decoder_loss=0.2415, over 5783918.45 frames. ], batch size: 82, lr: 3.15e-03, grad_scale: 16.0 2024-09-19 08:11:18,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=625240.0, ans=0.125 2024-09-19 08:11:18,684 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.09 vs. limit=22.5 2024-09-19 08:11:33,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=625280.0, ans=0.05 2024-09-19 08:11:33,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=625280.0, ans=0.0 2024-09-19 08:11:48,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=625320.0, ans=0.0 2024-09-19 08:12:00,272 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.09 vs. limit=10.0 2024-09-19 08:12:01,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=625320.0, ans=0.125 2024-09-19 08:12:18,951 INFO [train.py:1198] (1/2) Epoch 35, batch 2500, loss[loss=0.2383, ctc_loss=0.1181, cr_loss=0.3599, attn_decoder_loss=0.2436, over 29635.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1174, cr_loss=0.3598, attn_decoder_loss=0.2416, over 5794904.87 frames. ], batch size: 86, lr: 3.15e-03, grad_scale: 16.0 2024-09-19 08:12:41,476 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.39 vs. 
limit=10.0 2024-09-19 08:13:10,580 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.524e+01 8.980e+01 9.425e+01 1.614e+02, threshold=1.796e+02, percent-clipped=0.0 2024-09-19 08:13:24,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=625560.0, ans=0.2 2024-09-19 08:13:26,667 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.51 vs. limit=12.0 2024-09-19 08:13:29,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=625560.0, ans=0.2 2024-09-19 08:13:34,334 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.37 vs. limit=6.0 2024-09-19 08:13:35,396 INFO [train.py:1198] (1/2) Epoch 35, batch 2550, loss[loss=0.211, ctc_loss=0.101, cr_loss=0.3399, attn_decoder_loss=0.2157, over 29313.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1172, cr_loss=0.3595, attn_decoder_loss=0.2415, over 5798242.18 frames. ], batch size: 67, lr: 3.15e-03, grad_scale: 16.0 2024-09-19 08:13:43,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=625600.0, ans=0.125 2024-09-19 08:13:54,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=625640.0, ans=0.0 2024-09-19 08:13:55,172 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=15.0 2024-09-19 08:14:20,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=625680.0, ans=0.125 2024-09-19 08:14:20,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=625680.0, ans=0.0 2024-09-19 08:14:23,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=625720.0, ans=0.125 2024-09-19 08:14:33,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=625720.0, ans=0.125 2024-09-19 08:14:55,516 INFO [train.py:1198] (1/2) Epoch 35, batch 2600, loss[loss=0.2323, ctc_loss=0.1151, cr_loss=0.3489, attn_decoder_loss=0.2376, over 29466.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1174, cr_loss=0.36, attn_decoder_loss=0.2418, over 5794818.30 frames. ], batch size: 78, lr: 3.15e-03, grad_scale: 16.0 2024-09-19 08:15:16,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=625840.0, ans=0.0 2024-09-19 08:15:18,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=625840.0, ans=0.125 2024-09-19 08:15:20,363 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.19 vs. 
limit=15.0 2024-09-19 08:15:27,392 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:15:30,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=625880.0, ans=0.1 2024-09-19 08:15:31,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=625880.0, ans=0.025 2024-09-19 08:15:45,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=625920.0, ans=0.125 2024-09-19 08:15:46,693 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 8.594e+01 9.058e+01 9.611e+01 1.555e+02, threshold=1.812e+02, percent-clipped=0.0 2024-09-19 08:15:57,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=625960.0, ans=0.125 2024-09-19 08:15:57,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=625960.0, ans=0.1 2024-09-19 08:16:10,541 INFO [train.py:1198] (1/2) Epoch 35, batch 2650, loss[loss=0.2522, ctc_loss=0.1257, cr_loss=0.3699, attn_decoder_loss=0.2581, over 29331.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1174, cr_loss=0.3602, attn_decoder_loss=0.242, over 5801969.27 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 8.0 2024-09-19 08:16:10,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=626000.0, ans=0.0 2024-09-19 08:16:12,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=626000.0, ans=0.0 2024-09-19 08:16:34,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=626040.0, ans=0.125 2024-09-19 08:16:56,212 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.52 vs. limit=22.5 2024-09-19 08:17:00,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=626120.0, ans=0.125 2024-09-19 08:17:14,415 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.39 vs. limit=22.5 2024-09-19 08:17:25,667 INFO [train.py:1198] (1/2) Epoch 35, batch 2700, loss[loss=0.2521, ctc_loss=0.126, cr_loss=0.3799, attn_decoder_loss=0.2577, over 29523.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1179, cr_loss=0.3609, attn_decoder_loss=0.2425, over 5797851.12 frames. ], batch size: 87, lr: 3.15e-03, grad_scale: 8.0 2024-09-19 08:17:26,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=626200.0, ans=0.0 2024-09-19 08:17:35,644 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.78 vs. 
limit=10.0 2024-09-19 08:17:46,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=626240.0, ans=0.125 2024-09-19 08:17:46,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=626240.0, ans=0.025 2024-09-19 08:17:46,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=626240.0, ans=0.0 2024-09-19 08:17:58,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=626280.0, ans=0.2 2024-09-19 08:17:59,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=626280.0, ans=0.125 2024-09-19 08:18:20,758 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.428e+01 9.037e+01 9.618e+01 3.244e+02, threshold=1.807e+02, percent-clipped=1.0 2024-09-19 08:18:34,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=626360.0, ans=0.2 2024-09-19 08:18:46,431 INFO [train.py:1198] (1/2) Epoch 35, batch 2750, loss[loss=0.2247, ctc_loss=0.1088, cr_loss=0.3406, attn_decoder_loss=0.23, over 29520.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1171, cr_loss=0.3588, attn_decoder_loss=0.2415, over 5797527.68 frames. ], batch size: 75, lr: 3.15e-03, grad_scale: 8.0 2024-09-19 08:18:49,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=626400.0, ans=0.07 2024-09-19 08:18:51,443 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:18:51,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=626400.0, ans=0.05 2024-09-19 08:18:54,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=626400.0, ans=0.125 2024-09-19 08:19:15,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=626480.0, ans=0.0 2024-09-19 08:19:25,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=626480.0, ans=0.04949747468305833 2024-09-19 08:19:35,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=626520.0, ans=0.125 2024-09-19 08:20:02,181 INFO [train.py:1198] (1/2) Epoch 35, batch 2800, loss[loss=0.2529, ctc_loss=0.1432, cr_loss=0.3907, attn_decoder_loss=0.2564, over 19864.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.117, cr_loss=0.3586, attn_decoder_loss=0.2413, over 5777963.21 frames. ], batch size: 209, lr: 3.15e-03, grad_scale: 16.0 2024-09-19 08:20:52,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=626720.0, ans=0.0 2024-09-19 08:20:54,997 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.727e+01 8.599e+01 9.222e+01 9.663e+01 2.009e+02, threshold=1.844e+02, percent-clipped=1.0 2024-09-19 08:21:17,437 INFO [train.py:1198] (1/2) Epoch 35, batch 2850, loss[loss=0.2261, ctc_loss=0.1106, cr_loss=0.345, attn_decoder_loss=0.2312, over 29499.00 frames. 
], tot_loss[loss=0.2363, ctc_loss=0.1175, cr_loss=0.3591, attn_decoder_loss=0.2415, over 5763607.33 frames. ], batch size: 77, lr: 3.15e-03, grad_scale: 8.0 2024-09-19 08:21:17,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=626800.0, ans=0.2 2024-09-19 08:21:38,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=626840.0, ans=0.09899494936611666 2024-09-19 08:22:00,150 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.62 vs. limit=15.0 2024-09-19 08:22:03,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=626880.0, ans=22.5 2024-09-19 08:22:25,305 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=22.5 2024-09-19 08:22:28,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=626960.0, ans=0.1 2024-09-19 08:22:37,754 INFO [train.py:1198] (1/2) Epoch 35, batch 2900, loss[loss=0.2405, ctc_loss=0.1229, cr_loss=0.3753, attn_decoder_loss=0.2452, over 29419.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.118, cr_loss=0.3609, attn_decoder_loss=0.2424, over 5788619.68 frames. ], batch size: 79, lr: 3.15e-03, grad_scale: 8.0 2024-09-19 08:23:02,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=627040.0, ans=0.2 2024-09-19 08:23:27,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=627120.0, ans=0.0 2024-09-19 08:23:32,019 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.291e+01 8.643e+01 9.038e+01 9.732e+01 2.249e+02, threshold=1.808e+02, percent-clipped=2.0 2024-09-19 08:23:43,349 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.86 vs. limit=15.0 2024-09-19 08:23:45,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=627160.0, ans=0.125 2024-09-19 08:23:50,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=627160.0, ans=0.0 2024-09-19 08:23:50,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=627160.0, ans=0.125 2024-09-19 08:23:53,515 INFO [train.py:1198] (1/2) Epoch 35, batch 2950, loss[loss=0.2182, ctc_loss=0.1048, cr_loss=0.3349, attn_decoder_loss=0.2233, over 29515.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1173, cr_loss=0.3593, attn_decoder_loss=0.2412, over 5783539.03 frames. 
], batch size: 75, lr: 3.15e-03, grad_scale: 8.0 2024-09-19 08:24:25,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=627280.0, ans=0.125 2024-09-19 08:24:33,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=627280.0, ans=0.125 2024-09-19 08:25:09,411 INFO [train.py:1198] (1/2) Epoch 35, batch 3000, loss[loss=0.2413, ctc_loss=0.1235, cr_loss=0.3749, attn_decoder_loss=0.246, over 29739.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1171, cr_loss=0.3591, attn_decoder_loss=0.2411, over 5782780.28 frames. ], batch size: 81, lr: 3.15e-03, grad_scale: 8.0 2024-09-19 08:25:09,412 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 08:25:28,762 INFO [train.py:1230] (1/2) Epoch 35, validation: loss=0.2119, ctc_loss=0.03685, cr_loss=6.108e-15, attn_decoder_loss=0.2313, over 944034.00 frames. 2024-09-19 08:25:28,762 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-19 08:25:38,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=627400.0, ans=0.125 2024-09-19 08:25:46,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=627440.0, ans=0.1 2024-09-19 08:26:01,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=627480.0, ans=0.125 2024-09-19 08:26:25,876 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 8.691e+01 9.210e+01 9.887e+01 4.457e+02, threshold=1.842e+02, percent-clipped=1.0 2024-09-19 08:26:36,940 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:26:47,038 INFO [train.py:1198] (1/2) Epoch 35, batch 3050, loss[loss=0.2302, ctc_loss=0.116, cr_loss=0.3756, attn_decoder_loss=0.2345, over 29537.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1172, cr_loss=0.3597, attn_decoder_loss=0.2416, over 5776273.88 frames. ], batch size: 76, lr: 3.15e-03, grad_scale: 8.0 2024-09-19 08:26:55,441 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.32 vs. limit=15.0 2024-09-19 08:27:07,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=627640.0, ans=0.1 2024-09-19 08:27:15,190 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.71 vs. limit=22.5 2024-09-19 08:27:15,308 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.33 vs. 
limit=15.0 2024-09-19 08:27:16,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=627680.0, ans=0.0 2024-09-19 08:27:23,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=627680.0, ans=0.125 2024-09-19 08:27:49,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=627760.0, ans=0.125 2024-09-19 08:28:02,355 INFO [train.py:1198] (1/2) Epoch 35, batch 3100, loss[loss=0.2418, ctc_loss=0.117, cr_loss=0.3511, attn_decoder_loss=0.2478, over 29261.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.117, cr_loss=0.3595, attn_decoder_loss=0.2413, over 5776268.04 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 8.0 2024-09-19 08:28:16,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=627840.0, ans=0.95 2024-09-19 08:28:45,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=627880.0, ans=0.125 2024-09-19 08:28:57,339 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.618e+01 9.080e+01 9.751e+01 2.675e+02, threshold=1.816e+02, percent-clipped=2.0 2024-09-19 08:28:57,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=627920.0, ans=0.125 2024-09-19 08:29:03,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=627960.0, ans=0.125 2024-09-19 08:29:16,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=627960.0, ans=0.125 2024-09-19 08:29:21,433 INFO [train.py:1198] (1/2) Epoch 35, batch 3150, loss[loss=0.2575, ctc_loss=0.1381, cr_loss=0.4043, attn_decoder_loss=0.2618, over 28741.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1171, cr_loss=0.3596, attn_decoder_loss=0.2413, over 5783403.49 frames. ], batch size: 104, lr: 3.15e-03, grad_scale: 8.0 2024-09-19 08:29:23,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=628000.0, ans=0.125 2024-09-19 08:29:48,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=628040.0, ans=0.125 2024-09-19 08:30:01,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=628080.0, ans=0.0 2024-09-19 08:30:18,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=628120.0, ans=0.125 2024-09-19 08:30:36,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=628160.0, ans=0.2 2024-09-19 08:30:38,989 INFO [train.py:1198] (1/2) Epoch 35, batch 3200, loss[loss=0.2255, ctc_loss=0.1017, cr_loss=0.3313, attn_decoder_loss=0.2318, over 29774.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1169, cr_loss=0.359, attn_decoder_loss=0.2411, over 5794326.20 frames. 
], batch size: 80, lr: 3.15e-03, grad_scale: 16.0 2024-09-19 08:30:50,341 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2024-09-19 08:30:51,451 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:30:55,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=628240.0, ans=0.0 2024-09-19 08:31:34,894 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.308e+01 8.608e+01 9.276e+01 9.756e+01 1.910e+02, threshold=1.855e+02, percent-clipped=1.0 2024-09-19 08:31:50,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=628360.0, ans=0.0 2024-09-19 08:31:54,892 INFO [train.py:1198] (1/2) Epoch 35, batch 3250, loss[loss=0.2453, ctc_loss=0.1271, cr_loss=0.3836, attn_decoder_loss=0.2499, over 29700.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1168, cr_loss=0.3588, attn_decoder_loss=0.2413, over 5800968.31 frames. ], batch size: 84, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:32:04,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=628400.0, ans=0.1 2024-09-19 08:32:07,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=628400.0, ans=0.0 2024-09-19 08:32:10,291 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:32:57,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=628560.0, ans=0.2 2024-09-19 08:33:04,804 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.60 vs. limit=10.0 2024-09-19 08:33:12,770 INFO [train.py:1198] (1/2) Epoch 35, batch 3300, loss[loss=0.2451, ctc_loss=0.1255, cr_loss=0.3623, attn_decoder_loss=0.2503, over 28431.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.116, cr_loss=0.3569, attn_decoder_loss=0.2401, over 5797504.56 frames. ], batch size: 112, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:33:26,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=628640.0, ans=0.125 2024-09-19 08:33:31,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=628640.0, ans=0.04949747468305833 2024-09-19 08:34:10,345 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 8.715e+01 9.290e+01 9.754e+01 2.928e+02, threshold=1.858e+02, percent-clipped=2.0 2024-09-19 08:34:13,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=628760.0, ans=0.09899494936611666 2024-09-19 08:34:13,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=628760.0, ans=0.2 2024-09-19 08:34:30,126 INFO [train.py:1198] (1/2) Epoch 35, batch 3350, loss[loss=0.2506, ctc_loss=0.1282, cr_loss=0.3672, attn_decoder_loss=0.256, over 28812.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.117, cr_loss=0.3588, attn_decoder_loss=0.241, over 5774378.97 frames. 
], batch size: 104, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:34:53,935 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.53 vs. limit=12.0 2024-09-19 08:34:56,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=628840.0, ans=0.125 2024-09-19 08:35:11,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=628880.0, ans=0.2 2024-09-19 08:35:28,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=628920.0, ans=0.04949747468305833 2024-09-19 08:35:28,931 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.48 vs. limit=12.0 2024-09-19 08:35:33,175 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.57 vs. limit=15.0 2024-09-19 08:35:34,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=628960.0, ans=0.125 2024-09-19 08:35:43,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=628960.0, ans=0.2 2024-09-19 08:35:46,089 INFO [train.py:1198] (1/2) Epoch 35, batch 3400, loss[loss=0.2062, ctc_loss=0.1008, cr_loss=0.3295, attn_decoder_loss=0.2106, over 29325.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1173, cr_loss=0.3593, attn_decoder_loss=0.241, over 5767431.26 frames. ], batch size: 67, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:35:46,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=629000.0, ans=0.2 2024-09-19 08:35:51,797 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.98 vs. limit=8.0 2024-09-19 08:35:52,523 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:35:56,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=629000.0, ans=0.125 2024-09-19 08:36:05,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=629040.0, ans=0.2 2024-09-19 08:36:30,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=629120.0, ans=0.0 2024-09-19 08:36:44,225 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.920e+01 8.517e+01 9.055e+01 9.651e+01 2.142e+02, threshold=1.811e+02, percent-clipped=1.0 2024-09-19 08:36:45,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=629120.0, ans=0.125 2024-09-19 08:36:49,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.19 vs. limit=15.0 2024-09-19 08:36:55,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.81 vs. 
limit=15.0 2024-09-19 08:37:03,842 INFO [train.py:1198] (1/2) Epoch 35, batch 3450, loss[loss=0.2406, ctc_loss=0.1151, cr_loss=0.344, attn_decoder_loss=0.2469, over 28299.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.117, cr_loss=0.3589, attn_decoder_loss=0.2413, over 5775150.62 frames. ], batch size: 111, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:37:10,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=629200.0, ans=0.125 2024-09-19 08:37:10,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=629200.0, ans=0.125 2024-09-19 08:37:11,048 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.76 vs. limit=15.0 2024-09-19 08:37:48,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=629320.0, ans=0.0 2024-09-19 08:38:08,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=629360.0, ans=0.0 2024-09-19 08:38:13,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=629360.0, ans=0.125 2024-09-19 08:38:21,939 INFO [train.py:1198] (1/2) Epoch 35, batch 3500, loss[loss=0.2118, ctc_loss=0.09745, cr_loss=0.3107, attn_decoder_loss=0.2176, over 29290.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.117, cr_loss=0.3589, attn_decoder_loss=0.241, over 5776448.94 frames. ], batch size: 71, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:38:56,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=629480.0, ans=0.025 2024-09-19 08:38:59,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=629480.0, ans=0.07 2024-09-19 08:39:04,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=629480.0, ans=0.0 2024-09-19 08:39:11,915 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.19 vs. limit=15.0 2024-09-19 08:39:17,037 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.529e+01 8.957e+01 9.484e+01 1.276e+02, threshold=1.791e+02, percent-clipped=0.0 2024-09-19 08:39:32,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=629560.0, ans=10.0 2024-09-19 08:39:36,788 INFO [train.py:1198] (1/2) Epoch 35, batch 3550, loss[loss=0.2408, ctc_loss=0.1107, cr_loss=0.3493, attn_decoder_loss=0.2475, over 29715.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1169, cr_loss=0.359, attn_decoder_loss=0.2411, over 5782315.37 frames. ], batch size: 89, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:39:37,812 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.81 vs. limit=10.0 2024-09-19 08:39:51,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=629640.0, ans=0.0 2024-09-19 08:40:50,972 INFO [train.py:1198] (1/2) Epoch 35, batch 3600, loss[loss=0.2345, ctc_loss=0.1125, cr_loss=0.3567, attn_decoder_loss=0.2402, over 29478.00 frames. 
], tot_loss[loss=0.2359, ctc_loss=0.117, cr_loss=0.3592, attn_decoder_loss=0.2411, over 5791274.26 frames. ], batch size: 77, lr: 3.14e-03, grad_scale: 16.0 2024-09-19 08:40:54,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=629800.0, ans=0.125 2024-09-19 08:41:10,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=629840.0, ans=0.1 2024-09-19 08:41:12,313 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:41:16,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=629840.0, ans=0.2 2024-09-19 08:41:16,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=629840.0, ans=0.0 2024-09-19 08:41:36,240 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.10 vs. limit=15.0 2024-09-19 08:41:47,405 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 8.545e+01 9.030e+01 9.736e+01 4.485e+02, threshold=1.806e+02, percent-clipped=2.0 2024-09-19 08:41:49,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=629960.0, ans=0.125 2024-09-19 08:41:54,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=629960.0, ans=0.125 2024-09-19 08:41:58,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=629960.0, ans=0.2 2024-09-19 08:42:01,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=629960.0, ans=0.125 2024-09-19 08:42:07,119 INFO [train.py:1198] (1/2) Epoch 35, batch 3650, loss[loss=0.2552, ctc_loss=0.1335, cr_loss=0.3979, attn_decoder_loss=0.2599, over 29497.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1163, cr_loss=0.3575, attn_decoder_loss=0.2404, over 5794515.25 frames. ], batch size: 90, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:42:22,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=630040.0, ans=0.025 2024-09-19 08:42:29,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=630040.0, ans=0.125 2024-09-19 08:42:40,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=630080.0, ans=0.125 2024-09-19 08:43:03,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=630120.0, ans=0.2 2024-09-19 08:43:16,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=630160.0, ans=0.0 2024-09-19 08:43:21,875 INFO [train.py:1198] (1/2) Epoch 35, batch 3700, loss[loss=0.2494, ctc_loss=0.1279, cr_loss=0.3847, attn_decoder_loss=0.2543, over 29700.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1161, cr_loss=0.3574, attn_decoder_loss=0.2404, over 5804084.17 frames. 
], batch size: 84, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:43:22,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=630200.0, ans=0.1 2024-09-19 08:43:50,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=630280.0, ans=0.1 2024-09-19 08:44:02,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=630280.0, ans=0.125 2024-09-19 08:44:05,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=630280.0, ans=0.1 2024-09-19 08:44:08,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=630320.0, ans=0.0 2024-09-19 08:44:08,830 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.85 vs. limit=15.0 2024-09-19 08:44:11,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=630320.0, ans=0.125 2024-09-19 08:44:15,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=630320.0, ans=0.125 2024-09-19 08:44:19,842 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 8.511e+01 9.010e+01 9.557e+01 1.443e+02, threshold=1.802e+02, percent-clipped=0.0 2024-09-19 08:44:20,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=630320.0, ans=0.0 2024-09-19 08:44:20,548 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.55 vs. limit=15.0 2024-09-19 08:44:25,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.24 vs. limit=10.0 2024-09-19 08:44:36,822 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2024-09-19 08:44:38,050 INFO [train.py:1198] (1/2) Epoch 35, batch 3750, loss[loss=0.2035, ctc_loss=0.09671, cr_loss=0.3113, attn_decoder_loss=0.2084, over 29340.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.116, cr_loss=0.3571, attn_decoder_loss=0.2402, over 5807587.06 frames. 
], batch size: 67, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:45:00,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=630440.0, ans=0.125 2024-09-19 08:45:03,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=630440.0, ans=0.0 2024-09-19 08:45:13,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=630480.0, ans=0.125 2024-09-19 08:45:23,102 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:45:24,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=630520.0, ans=0.5 2024-09-19 08:45:47,516 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.73 vs. limit=22.5 2024-09-19 08:45:52,361 INFO [train.py:1198] (1/2) Epoch 35, batch 3800, loss[loss=0.2322, ctc_loss=0.106, cr_loss=0.3431, attn_decoder_loss=0.2386, over 29653.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1165, cr_loss=0.3579, attn_decoder_loss=0.2401, over 5798620.58 frames. ], batch size: 86, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:45:58,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=630600.0, ans=0.125 2024-09-19 08:46:05,047 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.82 vs. limit=10.0 2024-09-19 08:46:05,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=630640.0, ans=0.1 2024-09-19 08:46:11,262 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.91 vs. limit=15.0 2024-09-19 08:46:20,099 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.53 vs. limit=15.0 2024-09-19 08:46:22,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=630680.0, ans=0.125 2024-09-19 08:46:48,666 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 8.417e+01 9.020e+01 9.508e+01 1.354e+02, threshold=1.804e+02, percent-clipped=0.0 2024-09-19 08:46:57,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=630760.0, ans=0.05 2024-09-19 08:46:59,783 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=12.0 2024-09-19 08:47:06,573 INFO [train.py:1198] (1/2) Epoch 35, batch 3850, loss[loss=0.2479, ctc_loss=0.1189, cr_loss=0.3707, attn_decoder_loss=0.254, over 29222.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1165, cr_loss=0.3582, attn_decoder_loss=0.2402, over 5812839.33 frames. 
], batch size: 100, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:47:11,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=630800.0, ans=0.1 2024-09-19 08:47:26,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=630840.0, ans=0.025 2024-09-19 08:47:32,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=630840.0, ans=0.125 2024-09-19 08:47:33,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=630840.0, ans=0.1 2024-09-19 08:47:33,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=630840.0, ans=0.0 2024-09-19 08:47:47,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=630880.0, ans=0.2 2024-09-19 08:47:48,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=630880.0, ans=0.125 2024-09-19 08:47:50,230 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:47:57,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=630920.0, ans=0.0 2024-09-19 08:48:00,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=630920.0, ans=0.2 2024-09-19 08:48:17,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=630960.0, ans=15.0 2024-09-19 08:48:18,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=630960.0, ans=0.04949747468305833 2024-09-19 08:48:21,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=631000.0, ans=0.125 2024-09-19 08:48:22,473 INFO [train.py:1198] (1/2) Epoch 35, batch 3900, loss[loss=0.2393, ctc_loss=0.1134, cr_loss=0.3619, attn_decoder_loss=0.2452, over 29618.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1167, cr_loss=0.3589, attn_decoder_loss=0.2407, over 5816856.53 frames. ], batch size: 86, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:48:48,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=631040.0, ans=0.125 2024-09-19 08:48:55,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=631080.0, ans=0.05 2024-09-19 08:48:57,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=631080.0, ans=0.125 2024-09-19 08:49:08,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=631120.0, ans=0.125 2024-09-19 08:49:15,774 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.02 vs. 
limit=10.0 2024-09-19 08:49:17,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=631120.0, ans=0.1 2024-09-19 08:49:18,877 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.482e+01 8.961e+01 9.353e+01 1.224e+02, threshold=1.792e+02, percent-clipped=0.0 2024-09-19 08:49:19,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=631120.0, ans=0.125 2024-09-19 08:49:30,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=631160.0, ans=0.125 2024-09-19 08:49:38,529 INFO [train.py:1198] (1/2) Epoch 35, batch 3950, loss[loss=0.2472, ctc_loss=0.1254, cr_loss=0.3702, attn_decoder_loss=0.2525, over 29454.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1165, cr_loss=0.3583, attn_decoder_loss=0.2409, over 5836104.35 frames. ], batch size: 97, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:49:40,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=631200.0, ans=0.125 2024-09-19 08:50:02,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=631240.0, ans=0.125 2024-09-19 08:50:02,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=631240.0, ans=0.0 2024-09-19 08:50:18,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=631280.0, ans=0.125 2024-09-19 08:50:36,915 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-09-19 08:50:45,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=631360.0, ans=0.07 2024-09-19 08:50:45,369 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:50:52,168 INFO [train.py:1198] (1/2) Epoch 35, batch 4000, loss[loss=0.2179, ctc_loss=0.1011, cr_loss=0.337, attn_decoder_loss=0.2234, over 29548.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1168, cr_loss=0.3589, attn_decoder_loss=0.2409, over 5814083.82 frames. ], batch size: 74, lr: 3.14e-03, grad_scale: 16.0 2024-09-19 08:50:58,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=631400.0, ans=0.2 2024-09-19 08:51:25,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=631480.0, ans=0.0 2024-09-19 08:51:38,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=631520.0, ans=0.125 2024-09-19 08:51:50,542 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.935e+01 8.609e+01 9.049e+01 9.611e+01 2.994e+02, threshold=1.810e+02, percent-clipped=1.0 2024-09-19 08:52:06,768 INFO [train.py:1198] (1/2) Epoch 35, batch 4050, loss[loss=0.2563, ctc_loss=0.1465, cr_loss=0.3966, attn_decoder_loss=0.2597, over 20498.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1163, cr_loss=0.3578, attn_decoder_loss=0.2405, over 5798511.82 frames. 
], batch size: 209, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:52:40,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=631680.0, ans=0.0 2024-09-19 08:52:54,887 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:53:03,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=631720.0, ans=0.125 2024-09-19 08:53:08,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=631760.0, ans=0.125 2024-09-19 08:53:21,185 INFO [train.py:1198] (1/2) Epoch 35, batch 4100, loss[loss=0.2525, ctc_loss=0.1324, cr_loss=0.393, attn_decoder_loss=0.2571, over 29510.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1168, cr_loss=0.3586, attn_decoder_loss=0.2408, over 5793533.21 frames. ], batch size: 90, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:53:28,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=631800.0, ans=0.125 2024-09-19 08:54:00,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=631880.0, ans=0.125 2024-09-19 08:54:02,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=631880.0, ans=0.0 2024-09-19 08:54:03,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=631920.0, ans=0.5 2024-09-19 08:54:13,111 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.87 vs. limit=15.0 2024-09-19 08:54:19,449 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.556e+01 9.136e+01 9.776e+01 2.394e+02, threshold=1.827e+02, percent-clipped=3.0 2024-09-19 08:54:19,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=631960.0, ans=0.05 2024-09-19 08:54:21,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=631960.0, ans=0.1 2024-09-19 08:54:26,100 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.89 vs. limit=22.5 2024-09-19 08:54:34,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=632000.0, ans=0.2 2024-09-19 08:54:36,190 INFO [train.py:1198] (1/2) Epoch 35, batch 4150, loss[loss=0.2327, ctc_loss=0.1186, cr_loss=0.3704, attn_decoder_loss=0.2371, over 29516.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1166, cr_loss=0.3577, attn_decoder_loss=0.2405, over 5800535.74 frames. 
], batch size: 77, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:54:42,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=632000.0, ans=0.1 2024-09-19 08:55:08,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=632080.0, ans=0.125 2024-09-19 08:55:11,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=632080.0, ans=0.0 2024-09-19 08:55:26,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=632120.0, ans=0.2 2024-09-19 08:55:30,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=632120.0, ans=0.025 2024-09-19 08:55:49,859 INFO [train.py:1198] (1/2) Epoch 35, batch 4200, loss[loss=0.2477, ctc_loss=0.1394, cr_loss=0.398, attn_decoder_loss=0.2508, over 29513.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1166, cr_loss=0.3579, attn_decoder_loss=0.2408, over 5802238.38 frames. ], batch size: 90, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:56:12,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=632240.0, ans=0.125 2024-09-19 08:56:36,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=632320.0, ans=0.1 2024-09-19 08:56:39,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=632320.0, ans=0.025 2024-09-19 08:56:42,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=632320.0, ans=0.125 2024-09-19 08:56:48,205 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.470e+01 8.972e+01 9.495e+01 2.308e+02, threshold=1.794e+02, percent-clipped=1.0 2024-09-19 08:56:51,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=632360.0, ans=0.0 2024-09-19 08:56:51,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=632360.0, ans=0.0 2024-09-19 08:57:04,334 INFO [train.py:1198] (1/2) Epoch 35, batch 4250, loss[loss=0.2181, ctc_loss=0.101, cr_loss=0.3227, attn_decoder_loss=0.224, over 29512.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1162, cr_loss=0.357, attn_decoder_loss=0.2407, over 5806676.66 frames. ], batch size: 74, lr: 3.13e-03, grad_scale: 8.0 2024-09-19 08:57:14,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=632400.0, ans=0.5 2024-09-19 08:57:16,907 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.07 vs. limit=6.0 2024-09-19 08:57:19,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=632440.0, ans=0.0 2024-09-19 08:57:25,637 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.54 vs. 
limit=22.5 2024-09-19 08:58:13,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=632560.0, ans=0.125 2024-09-19 08:58:17,922 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:58:17,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=632600.0, ans=0.125 2024-09-19 08:58:19,080 INFO [train.py:1198] (1/2) Epoch 35, batch 4300, loss[loss=0.2388, ctc_loss=0.1114, cr_loss=0.3514, attn_decoder_loss=0.2452, over 29503.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1159, cr_loss=0.3567, attn_decoder_loss=0.2407, over 5795939.72 frames. ], batch size: 87, lr: 3.13e-03, grad_scale: 8.0 2024-09-19 08:58:20,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=632600.0, ans=0.0 2024-09-19 08:58:52,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=632680.0, ans=10.0 2024-09-19 08:59:01,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=632680.0, ans=0.125 2024-09-19 08:59:02,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=632720.0, ans=0.0 2024-09-19 08:59:13,509 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.09 vs. limit=15.0 2024-09-19 08:59:17,003 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.393e+01 8.841e+01 9.230e+01 9.936e+01 2.115e+02, threshold=1.846e+02, percent-clipped=2.0 2024-09-19 08:59:17,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=632760.0, ans=0.1 2024-09-19 08:59:34,556 INFO [train.py:1198] (1/2) Epoch 35, batch 4350, loss[loss=0.2423, ctc_loss=0.1216, cr_loss=0.3655, attn_decoder_loss=0.2475, over 29465.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1186, cr_loss=0.362, attn_decoder_loss=0.244, over 5798759.88 frames. ], batch size: 97, lr: 3.13e-03, grad_scale: 8.0 2024-09-19 08:59:40,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=632800.0, ans=0.125 2024-09-19 08:59:45,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=632800.0, ans=0.1 2024-09-19 08:59:53,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=632840.0, ans=0.025 2024-09-19 09:00:17,660 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2024-09-19 09:00:27,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.31 vs. limit=22.5 2024-09-19 09:00:47,707 INFO [train.py:1198] (1/2) Epoch 35, batch 4400, loss[loss=0.2465, ctc_loss=0.1242, cr_loss=0.3801, attn_decoder_loss=0.2517, over 27331.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1201, cr_loss=0.3651, attn_decoder_loss=0.2464, over 5767745.18 frames. 
], batch size: 124, lr: 3.13e-03, grad_scale: 16.0 2024-09-19 09:00:48,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=633000.0, ans=0.125 2024-09-19 09:00:53,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=633000.0, ans=0.025 2024-09-19 09:00:56,551 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.43 vs. limit=10.0 2024-09-19 09:01:03,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=633040.0, ans=0.1 2024-09-19 09:01:10,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=633040.0, ans=0.1 2024-09-19 09:01:45,575 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.031e+01 8.931e+01 9.450e+01 9.933e+01 1.920e+02, threshold=1.890e+02, percent-clipped=1.0 2024-09-19 09:02:02,863 INFO [train.py:1198] (1/2) Epoch 35, batch 4450, loss[loss=0.2521, ctc_loss=0.1409, cr_loss=0.3679, attn_decoder_loss=0.2563, over 20280.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1235, cr_loss=0.3699, attn_decoder_loss=0.2485, over 5577439.41 frames. ], batch size: 209, lr: 3.13e-03, grad_scale: 16.0 2024-09-19 09:02:18,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=633240.0, ans=0.025 2024-09-19 09:02:46,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=633320.0, ans=0.1 2024-09-19 09:02:56,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.66 vs. limit=10.0 2024-09-19 09:03:15,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=633360.0, ans=0.2 2024-09-19 09:03:17,988 INFO [train.py:1198] (1/2) Epoch 35, batch 4500, loss[loss=0.2556, ctc_loss=0.1406, cr_loss=0.3761, attn_decoder_loss=0.26, over 20381.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.1269, cr_loss=0.3724, attn_decoder_loss=0.2505, over 5235461.09 frames. ], batch size: 209, lr: 3.13e-03, grad_scale: 8.0 2024-09-19 09:03:19,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=633400.0, ans=0.125 2024-09-19 09:03:31,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=633440.0, ans=0.125 2024-09-19 09:04:27,601 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:04:41,123 INFO [train.py:1198] (1/2) Epoch 36, batch 0, loss[loss=0.2228, ctc_loss=0.1012, cr_loss=0.3353, attn_decoder_loss=0.2288, over 29594.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1012, cr_loss=0.3353, attn_decoder_loss=0.2288, over 29594.00 frames. 
], batch size: 73, lr: 3.09e-03, grad_scale: 16.0 2024-09-19 09:04:41,123 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 09:04:49,391 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2378, 3.8471, 4.1154, 3.7352], device='cuda:1') 2024-09-19 09:04:56,230 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1307, 4.3156, 4.5002, 4.7744], device='cuda:1') 2024-09-19 09:04:59,473 INFO [train.py:1230] (1/2) Epoch 36, validation: loss=0.2129, ctc_loss=0.03662, cr_loss=5.743e-15, attn_decoder_loss=0.2325, over 944034.00 frames. 2024-09-19 09:04:59,473 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-19 09:05:02,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=633500.0, ans=0.0 2024-09-19 09:05:08,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=633500.0, ans=0.0 2024-09-19 09:05:08,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=633500.0, ans=0.2 2024-09-19 09:05:11,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=633500.0, ans=0.2 2024-09-19 09:05:17,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=633540.0, ans=0.1 2024-09-19 09:05:18,211 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.68 vs. limit=15.0 2024-09-19 09:05:22,046 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.883e+01 1.073e+02 1.144e+02 1.210e+02 8.768e+02, threshold=2.289e+02, percent-clipped=4.0 2024-09-19 09:05:28,704 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.46 vs. limit=22.5 2024-09-19 09:05:59,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=633660.0, ans=0.1 2024-09-19 09:06:00,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=633660.0, ans=0.125 2024-09-19 09:06:06,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=633660.0, ans=0.1 2024-09-19 09:06:06,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=633660.0, ans=0.05 2024-09-19 09:06:08,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.90 vs. limit=12.0 2024-09-19 09:06:12,187 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.92 vs. limit=6.0 2024-09-19 09:06:15,569 INFO [train.py:1198] (1/2) Epoch 36, batch 50, loss[loss=0.2122, ctc_loss=0.1022, cr_loss=0.3304, attn_decoder_loss=0.2171, over 29454.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1193, cr_loss=0.3647, attn_decoder_loss=0.2424, over 1267863.35 frames. 
], batch size: 70, lr: 3.09e-03, grad_scale: 16.0 2024-09-19 09:06:49,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=633780.0, ans=0.125 2024-09-19 09:06:52,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=633780.0, ans=0.0 2024-09-19 09:06:53,722 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.18 vs. limit=15.0 2024-09-19 09:07:22,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=633860.0, ans=0.0 2024-09-19 09:07:32,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=633860.0, ans=0.125 2024-09-19 09:07:35,511 INFO [train.py:1198] (1/2) Epoch 36, batch 100, loss[loss=0.2192, ctc_loss=0.1102, cr_loss=0.346, attn_decoder_loss=0.2236, over 29547.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1191, cr_loss=0.3634, attn_decoder_loss=0.2435, over 2253480.06 frames. ], batch size: 76, lr: 3.09e-03, grad_scale: 16.0 2024-09-19 09:07:47,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=633900.0, ans=0.1 2024-09-19 09:07:47,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=633900.0, ans=0.0 2024-09-19 09:07:57,926 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.147e+01 8.630e+01 9.046e+01 9.825e+01 1.723e+02, threshold=1.809e+02, percent-clipped=0.0 2024-09-19 09:08:18,133 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.27 vs. limit=10.0 2024-09-19 09:08:19,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=634020.0, ans=0.125 2024-09-19 09:08:42,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=634060.0, ans=0.125 2024-09-19 09:08:50,183 INFO [train.py:1198] (1/2) Epoch 36, batch 150, loss[loss=0.2141, ctc_loss=0.1044, cr_loss=0.3443, attn_decoder_loss=0.2186, over 29448.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1176, cr_loss=0.3604, attn_decoder_loss=0.2418, over 3048518.25 frames. ], batch size: 70, lr: 3.09e-03, grad_scale: 16.0 2024-09-19 09:08:52,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=634100.0, ans=0.125 2024-09-19 09:09:05,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=634140.0, ans=0.0 2024-09-19 09:09:22,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=634180.0, ans=0.1 2024-09-19 09:10:04,912 INFO [train.py:1198] (1/2) Epoch 36, batch 200, loss[loss=0.2543, ctc_loss=0.1304, cr_loss=0.3941, attn_decoder_loss=0.2593, over 27623.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1167, cr_loss=0.3583, attn_decoder_loss=0.2407, over 3661735.45 frames. 
], batch size: 125, lr: 3.09e-03, grad_scale: 16.0 2024-09-19 09:10:08,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=634300.0, ans=0.125 2024-09-19 09:10:26,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=634340.0, ans=0.1 2024-09-19 09:10:29,662 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.434e+01 8.423e+01 8.790e+01 9.226e+01 1.100e+02, threshold=1.758e+02, percent-clipped=0.0 2024-09-19 09:10:36,721 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.48 vs. limit=15.0 2024-09-19 09:11:10,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=634460.0, ans=0.1 2024-09-19 09:11:23,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=634460.0, ans=0.0 2024-09-19 09:11:25,661 INFO [train.py:1198] (1/2) Epoch 36, batch 250, loss[loss=0.2492, ctc_loss=0.1317, cr_loss=0.3926, attn_decoder_loss=0.2535, over 29266.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1172, cr_loss=0.3597, attn_decoder_loss=0.2411, over 4142316.01 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 16.0 2024-09-19 09:11:49,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=634540.0, ans=0.0 2024-09-19 09:11:52,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=634540.0, ans=0.125 2024-09-19 09:12:06,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=634580.0, ans=0.1 2024-09-19 09:12:08,559 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.49 vs. limit=15.0 2024-09-19 09:12:40,877 INFO [train.py:1198] (1/2) Epoch 36, batch 300, loss[loss=0.2437, ctc_loss=0.1179, cr_loss=0.3748, attn_decoder_loss=0.2494, over 29530.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1167, cr_loss=0.359, attn_decoder_loss=0.2404, over 4510055.94 frames. 
2024-09-19 09:12:51,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=634700.0, ans=10.0
2024-09-19 09:12:54,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=634740.0, ans=0.0
2024-09-19 09:12:59,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=634740.0, ans=10.0
2024-09-19 09:13:04,718 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 8.698e+01 9.076e+01 9.667e+01 1.639e+02, threshold=1.815e+02, percent-clipped=0.0
2024-09-19 09:13:12,543 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 09:13:12,702 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 09:13:12,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=634780.0, ans=0.1
2024-09-19 09:13:21,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=634780.0, ans=0.0
2024-09-19 09:13:50,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=634860.0, ans=0.125
2024-09-19 09:13:56,338 INFO [train.py:1198] (1/2) Epoch 36, batch 350, loss[loss=0.2, ctc_loss=0.09005, cr_loss=0.2937, attn_decoder_loss=0.2057, over 29327.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1172, cr_loss=0.3601, attn_decoder_loss=0.2412, over 4795753.29 frames. ], batch size: 71, lr: 3.08e-03, grad_scale: 8.0
2024-09-19 09:14:06,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=634900.0, ans=0.125
2024-09-19 09:14:17,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=634940.0, ans=0.1
2024-09-19 09:14:21,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=634940.0, ans=0.1
2024-09-19 09:14:26,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=634940.0, ans=0.125
2024-09-19 09:14:38,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=634980.0, ans=0.5
2024-09-19 09:15:04,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=635060.0, ans=0.125
2024-09-19 09:15:13,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=635060.0, ans=0.125
2024-09-19 09:15:16,546 INFO [train.py:1198] (1/2) Epoch 36, batch 400, loss[loss=0.2415, ctc_loss=0.1109, cr_loss=0.3499, attn_decoder_loss=0.2482, over 29723.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1171, cr_loss=0.3604, attn_decoder_loss=0.2411, over 5025991.75 frames. ], batch size: 82, lr: 3.08e-03, grad_scale: 16.0
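[editor's note] The recurring scaling.py:214 records ("ScheduledFloat: name=..., batch_count=..., ans=...") read as a float hyperparameter whose value is looked up from (batch_count, value) breakpoints; by batch_count ≈ 634000 most have settled at their final values (constant ans). A minimal piecewise-linear sketch under that reading; the class below is illustrative, not icefall's scaling.py:

```python
# Piecewise-linear schedule keyed on batch_count, as the log records suggest.
import bisect

class ScheduledFloatSketch:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# A prob-like schedule that reached its final value long before
# batch_count=634900, hence the constant ans=0.125 seen above:
prob = ScheduledFloatSketch((0.0, 0.3), (8000.0, 0.125))
print(prob.value(634900.0))  # -> 0.125
```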
2024-09-19 09:15:18,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=635100.0, ans=0.125
2024-09-19 09:15:23,005 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 09:15:40,918 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.564e+01 9.194e+01 9.781e+01 3.536e+02, threshold=1.839e+02, percent-clipped=4.0
2024-09-19 09:15:51,931 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 09:16:19,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=635260.0, ans=0.0
2024-09-19 09:16:21,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=635260.0, ans=0.2
2024-09-19 09:16:25,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=635260.0, ans=0.125
2024-09-19 09:16:33,031 INFO [train.py:1198] (1/2) Epoch 36, batch 450, loss[loss=0.2465, ctc_loss=0.1172, cr_loss=0.3597, attn_decoder_loss=0.2528, over 29700.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1167, cr_loss=0.3591, attn_decoder_loss=0.2411, over 5187723.64 frames. ], batch size: 83, lr: 3.08e-03, grad_scale: 16.0
2024-09-19 09:17:15,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=635380.0, ans=0.05
2024-09-19 09:17:20,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=635420.0, ans=0.125
2024-09-19 09:17:34,859 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.73 vs. limit=15.0
2024-09-19 09:17:48,989 INFO [train.py:1198] (1/2) Epoch 36, batch 500, loss[loss=0.252, ctc_loss=0.1379, cr_loss=0.4027, attn_decoder_loss=0.2558, over 29412.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1163, cr_loss=0.3577, attn_decoder_loss=0.2401, over 5330905.18 frames. ], batch size: 94, lr: 3.08e-03, grad_scale: 16.0
2024-09-19 09:17:50,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=635500.0, ans=0.07
2024-09-19 09:17:50,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=635500.0, ans=0.125
2024-09-19 09:17:54,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.25 vs. limit=15.0
2024-09-19 09:17:59,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.62 vs. limit=15.0
2024-09-19 09:18:13,071 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 8.310e+01 8.819e+01 9.519e+01 1.597e+02, threshold=1.764e+02, percent-clipped=0.0
2024-09-19 09:18:20,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=635580.0, ans=0.125
2024-09-19 09:18:23,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=635580.0, ans=0.125
2024-09-19 09:18:24,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=635580.0, ans=0.0
2024-09-19 09:18:40,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=635620.0, ans=0.0
2024-09-19 09:18:52,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=635660.0, ans=0.125
2024-09-19 09:18:54,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=635660.0, ans=0.125
2024-09-19 09:19:01,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0
2024-09-19 09:19:05,891 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=15.0
2024-09-19 09:19:08,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=635700.0, ans=0.1
2024-09-19 09:19:09,304 INFO [train.py:1198] (1/2) Epoch 36, batch 550, loss[loss=0.2549, ctc_loss=0.1349, cr_loss=0.396, attn_decoder_loss=0.2594, over 28755.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1165, cr_loss=0.3582, attn_decoder_loss=0.2402, over 5424239.34 frames. ], batch size: 104, lr: 3.08e-03, grad_scale: 16.0
2024-09-19 09:19:20,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=635700.0, ans=0.125
2024-09-19 09:19:36,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=635740.0, ans=0.0
2024-09-19 09:20:01,823 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.11 vs. limit=15.0
2024-09-19 09:20:24,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.48 vs. limit=22.5
2024-09-19 09:20:25,473 INFO [train.py:1198] (1/2) Epoch 36, batch 600, loss[loss=0.2425, ctc_loss=0.1181, cr_loss=0.3738, attn_decoder_loss=0.248, over 29249.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1167, cr_loss=0.3593, attn_decoder_loss=0.2407, over 5510754.33 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 8.0
2024-09-19 09:20:27,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=635900.0, ans=0.0
2024-09-19 09:20:36,713 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.53 vs. limit=15.0
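[editor's note] The scaling.py:1024 records ("Whitening: name=..., num_groups=..., num_channels=..., metric=M vs. limit=L") track how "white" a module's activations are per channel group. One plausible metric, assumed here purely for illustration (the log does not show icefall's exact formula), is the dispersion of the covariance eigenvalues: it equals 1.0 for isotropic features and grows as variance concentrates in few directions, which matches the logged values sitting well below their limits:

```python
# Hedged sketch of a whitening diagnostic; assumption, not icefall's formula.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels are split into groups
    metrics = []
    for g in x.chunk(num_groups, dim=1):
        g = g - g.mean(dim=0, keepdim=True)
        cov = (g.T @ g) / g.shape[0]
        eigs = torch.linalg.eigvalsh(cov).clamp(min=0.0)
        # E[lambda^2] / E[lambda]^2 is 1.0 iff all eigenvalues are equal
        metrics.append((eigs.pow(2).mean() / eigs.mean().pow(2)).item())
    return max(metrics)  # report the worst group

x = torch.randn(1000, 512)  # near-white input -> metric close to 1.0
print(f"metric={whitening_metric(x):.2f} vs. limit=15.0")
```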
2024-09-19 09:20:37,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=635900.0, ans=0.07
2024-09-19 09:20:41,203 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.13 vs. limit=10.0
2024-09-19 09:20:45,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=635940.0, ans=0.1
2024-09-19 09:20:49,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=635940.0, ans=0.025
2024-09-19 09:20:50,745 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.519e+01 9.044e+01 9.582e+01 1.949e+02, threshold=1.809e+02, percent-clipped=1.0
2024-09-19 09:21:08,525 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.25 vs. limit=15.0
2024-09-19 09:21:12,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=636020.0, ans=0.2
2024-09-19 09:21:14,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=636020.0, ans=0.125
2024-09-19 09:21:18,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=636020.0, ans=0.025
2024-09-19 09:21:33,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=636060.0, ans=0.2
2024-09-19 09:21:40,525 INFO [train.py:1198] (1/2) Epoch 36, batch 650, loss[loss=0.2288, ctc_loss=0.1025, cr_loss=0.3264, attn_decoder_loss=0.2356, over 29758.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1157, cr_loss=0.357, attn_decoder_loss=0.2398, over 5587358.68 frames. ], batch size: 81, lr: 3.08e-03, grad_scale: 8.0
2024-09-19 09:21:47,417 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.62 vs. limit=15.0
2024-09-19 09:21:49,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=636100.0, ans=0.125
2024-09-19 09:21:51,870 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.97 vs. limit=10.0
2024-09-19 09:22:02,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=636140.0, ans=0.0
2024-09-19 09:22:06,608 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 09:22:10,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=636180.0, ans=0.1
2024-09-19 09:22:25,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=636180.0, ans=0.125
2024-09-19 09:22:45,449 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0
2024-09-19 09:22:55,422 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.21 vs. limit=22.5
2024-09-19 09:23:00,742 INFO [train.py:1198] (1/2) Epoch 36, batch 700, loss[loss=0.2332, ctc_loss=0.1117, cr_loss=0.3615, attn_decoder_loss=0.2387, over 29544.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1162, cr_loss=0.3582, attn_decoder_loss=0.2404, over 5637703.30 frames. ], batch size: 76, lr: 3.08e-03, grad_scale: 8.0
2024-09-19 09:23:05,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=636300.0, ans=0.1
2024-09-19 09:23:20,979 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.17 vs. limit=10.0
2024-09-19 09:23:26,423 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.289e+01 8.520e+01 8.919e+01 9.430e+01 1.206e+02, threshold=1.784e+02, percent-clipped=0.0
2024-09-19 09:23:31,898 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.53 vs. limit=22.5
2024-09-19 09:23:45,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=636420.0, ans=0.125
2024-09-19 09:23:47,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=636420.0, ans=0.0
2024-09-19 09:24:00,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=636460.0, ans=0.125
2024-09-19 09:24:03,884 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.95 vs. limit=15.0
2024-09-19 09:24:16,393 INFO [train.py:1198] (1/2) Epoch 36, batch 750, loss[loss=0.2436, ctc_loss=0.117, cr_loss=0.3682, attn_decoder_loss=0.2495, over 29700.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.116, cr_loss=0.3575, attn_decoder_loss=0.24, over 5676424.56 frames. ], batch size: 82, lr: 3.08e-03, grad_scale: 8.0
2024-09-19 09:24:33,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=636540.0, ans=0.1
2024-09-19 09:24:36,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=636540.0, ans=0.025
2024-09-19 09:24:42,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=636540.0, ans=0.0
2024-09-19 09:24:50,534 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.46 vs. limit=15.0
2024-09-19 09:25:12,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=636620.0, ans=0.025
2024-09-19 09:25:29,678 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.57 vs. limit=15.0
2024-09-19 09:25:31,852 INFO [train.py:1198] (1/2) Epoch 36, batch 800, loss[loss=0.2111, ctc_loss=0.1034, cr_loss=0.3393, attn_decoder_loss=0.2155, over 29622.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1161, cr_loss=0.358, attn_decoder_loss=0.2401, over 5708062.76 frames. ], batch size: 73, lr: 3.08e-03, grad_scale: 16.0
2024-09-19 09:25:38,799 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.41 vs. limit=15.0
2024-09-19 09:25:56,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=636740.0, ans=0.125
2024-09-19 09:25:57,540 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 8.447e+01 8.844e+01 9.388e+01 5.453e+02, threshold=1.769e+02, percent-clipped=1.0
2024-09-19 09:26:25,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=636820.0, ans=0.2
2024-09-19 09:26:38,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=636860.0, ans=0.0
2024-09-19 09:26:41,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=636860.0, ans=0.125
2024-09-19 09:26:52,610 INFO [train.py:1198] (1/2) Epoch 36, batch 850, loss[loss=0.2459, ctc_loss=0.1247, cr_loss=0.3976, attn_decoder_loss=0.2505, over 29711.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1158, cr_loss=0.3576, attn_decoder_loss=0.2398, over 5737016.40 frames. ], batch size: 89, lr: 3.08e-03, grad_scale: 16.0
2024-09-19 09:27:01,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=636900.0, ans=0.125
2024-09-19 09:27:17,487 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.28 vs. limit=12.0
2024-09-19 09:27:35,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=636980.0, ans=0.125
2024-09-19 09:27:46,465 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.79 vs. limit=10.0
2024-09-19 09:27:48,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=637020.0, ans=0.125
2024-09-19 09:28:02,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=637060.0, ans=0.125
2024-09-19 09:28:02,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=637060.0, ans=0.0
2024-09-19 09:28:08,137 INFO [train.py:1198] (1/2) Epoch 36, batch 900, loss[loss=0.2144, ctc_loss=0.1018, cr_loss=0.3259, attn_decoder_loss=0.2196, over 29604.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1157, cr_loss=0.357, attn_decoder_loss=0.2402, over 5741795.34 frames. ], batch size: 73, lr: 3.08e-03, grad_scale: 16.0
2024-09-19 09:28:18,328 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.40 vs. limit=22.5
2024-09-19 09:28:34,501 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.83 vs. limit=10.0
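[editor's note] In the optim.py:487 WARNINGs, the five numbers after "grad-norm quartiles" read as (min, 25%, median, 75%, max) of recent gradient norms, and the threshold tracks Clipping_scale times the median: on this stretch, 2.0 × 8.844e+01 ≈ 1.769e+02, exactly the logged threshold. A minimal sketch of that adaptive-clipping bookkeeping, with hypothetical names (not icefall's optim.py):

```python
# Adaptive grad-norm clipping: threshold = clipping_scale * median(recent norms).
from collections import deque
import torch

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent global grad norms
        self.num_clipped = 0
        self.num_steps = 0

    def clip_(self, params) -> float:
        grads = [p.grad.detach().flatten() for p in params if p.grad is not None]
        norm = torch.cat(grads).norm().item()
        self.norms.append(norm)
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        self.num_steps += 1
        if norm > threshold:  # scale grads down to the threshold
            self.num_clipped += 1
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        return 100.0 * self.num_clipped / self.num_steps  # "percent-clipped"

p = torch.nn.Parameter(torch.randn(10))
p.grad = torch.randn(10)
print(f"percent-clipped={GradNormClipper().clip_([p]):.1f}")
```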
2024-09-19 09:28:35,110 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 8.598e+01 8.959e+01 9.567e+01 2.745e+02, threshold=1.792e+02, percent-clipped=2.0
2024-09-19 09:28:44,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=637180.0, ans=0.09899494936611666
2024-09-19 09:28:55,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=637220.0, ans=0.125
2024-09-19 09:28:56,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=637220.0, ans=0.125
2024-09-19 09:28:58,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=637220.0, ans=0.125
2024-09-19 09:29:00,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=637220.0, ans=0.125
2024-09-19 09:29:04,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=637220.0, ans=0.2
2024-09-19 09:29:23,684 INFO [train.py:1198] (1/2) Epoch 36, batch 950, loss[loss=0.2235, ctc_loss=0.1029, cr_loss=0.3257, attn_decoder_loss=0.2297, over 29488.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1155, cr_loss=0.3565, attn_decoder_loss=0.2403, over 5744737.44 frames. ], batch size: 74, lr: 3.08e-03, grad_scale: 8.0
2024-09-19 09:29:41,519 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.00 vs. limit=15.0
2024-09-19 09:29:42,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=637340.0, ans=0.1
2024-09-19 09:29:50,423 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.96 vs. limit=10.0
2024-09-19 09:30:13,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=637420.0, ans=0.125
2024-09-19 09:30:28,085 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=7.39 vs. limit=12.0
2024-09-19 09:30:36,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=637460.0, ans=0.125
2024-09-19 09:30:43,636 INFO [train.py:1198] (1/2) Epoch 36, batch 1000, loss[loss=0.236, ctc_loss=0.1219, cr_loss=0.3722, attn_decoder_loss=0.2404, over 29539.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1165, cr_loss=0.3582, attn_decoder_loss=0.241, over 5737221.78 frames. ], batch size: 77, lr: 3.08e-03, grad_scale: 8.0
2024-09-19 09:30:43,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=637500.0, ans=0.125
2024-09-19 09:31:03,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=637540.0, ans=0.125
2024-09-19 09:31:11,034 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.640e+01 8.580e+01 9.134e+01 9.845e+01 2.020e+02, threshold=1.827e+02, percent-clipped=1.0
2024-09-19 09:31:50,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=637660.0, ans=0.125
2024-09-19 09:31:54,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=637660.0, ans=0.0
2024-09-19 09:31:59,734 INFO [train.py:1198] (1/2) Epoch 36, batch 1050, loss[loss=0.2337, ctc_loss=0.1117, cr_loss=0.3559, attn_decoder_loss=0.2393, over 29670.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1159, cr_loss=0.3573, attn_decoder_loss=0.2401, over 5746955.02 frames. ], batch size: 85, lr: 3.08e-03, grad_scale: 8.0
2024-09-19 09:32:32,813 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0
2024-09-19 09:32:59,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=637860.0, ans=0.125
2024-09-19 09:33:07,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=637860.0, ans=0.0
2024-09-19 09:33:15,958 INFO [train.py:1198] (1/2) Epoch 36, batch 1100, loss[loss=0.2318, ctc_loss=0.1154, cr_loss=0.3314, attn_decoder_loss=0.2373, over 29460.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1154, cr_loss=0.3559, attn_decoder_loss=0.2397, over 5758668.12 frames. ], batch size: 78, lr: 3.08e-03, grad_scale: 8.0
2024-09-19 09:33:40,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=637940.0, ans=0.2
2024-09-19 09:33:43,101 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.551e+01 8.368e+01 8.851e+01 9.380e+01 2.140e+02, threshold=1.770e+02, percent-clipped=1.0
2024-09-19 09:33:48,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=637980.0, ans=0.125
2024-09-19 09:33:55,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=637980.0, ans=0.125
2024-09-19 09:34:06,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=638020.0, ans=10.0
2024-09-19 09:34:35,865 INFO [train.py:1198] (1/2) Epoch 36, batch 1150, loss[loss=0.234, ctc_loss=0.1132, cr_loss=0.3429, attn_decoder_loss=0.2398, over 29459.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1159, cr_loss=0.3568, attn_decoder_loss=0.2401, over 5757345.14 frames. ], batch size: 78, lr: 3.08e-03, grad_scale: 8.0
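[editor's note] The grad_scale field in the train.py:1198 records flips between 16.0 and 8.0 over this stretch, which is the signature of dynamic loss scaling under fp16 autocast: the scaler halves its scale when a step overflows and gradually grows it back. A generic PyTorch AMP step showing where such a value comes from (illustrative; the training script's own loop is not shown in this log):

```python
# Dynamic loss scaling with torch.cuda.amp; requires a CUDA device.
import torch

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=3.08e-3)
scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

def train_step(features, targets):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = torch.nn.functional.cross_entropy(model(features), targets)
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # skips the update if grads overflowed
    scaler.update()                # halve scale on overflow, else slowly grow
    return scaler.get_scale()      # the kind of value logged as "grad_scale"
```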
2024-09-19 09:34:37,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=638100.0, ans=0.125
2024-09-19 09:34:42,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=638100.0, ans=0.125
2024-09-19 09:34:45,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=638100.0, ans=0.2
2024-09-19 09:34:46,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=638100.0, ans=0.2
2024-09-19 09:34:51,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=638140.0, ans=0.1
2024-09-19 09:34:54,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=5.39 vs. limit=15.0
2024-09-19 09:35:00,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=638140.0, ans=0.125
2024-09-19 09:35:00,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=638140.0, ans=0.1
2024-09-19 09:35:10,523 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.42 vs. limit=15.0
2024-09-19 09:35:11,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=638180.0, ans=0.1
2024-09-19 09:35:18,254 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.77 vs. limit=12.0
2024-09-19 09:35:38,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=638260.0, ans=0.125
2024-09-19 09:35:46,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=638260.0, ans=0.0
2024-09-19 09:35:51,885 INFO [train.py:1198] (1/2) Epoch 36, batch 1200, loss[loss=0.2515, ctc_loss=0.1244, cr_loss=0.3686, attn_decoder_loss=0.2574, over 29696.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1165, cr_loss=0.3575, attn_decoder_loss=0.241, over 5749290.30 frames. ], batch size: 85, lr: 3.08e-03, grad_scale: 16.0
2024-09-19 09:36:10,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=638340.0, ans=0.125
2024-09-19 09:36:19,076 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.768e+01 8.630e+01 9.163e+01 9.879e+01 2.531e+02, threshold=1.833e+02, percent-clipped=3.0
2024-09-19 09:36:25,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=638380.0, ans=0.0
2024-09-19 09:36:32,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=638380.0, ans=0.1
2024-09-19 09:36:40,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.45 vs. limit=15.0
2024-09-19 09:37:08,426 INFO [train.py:1198] (1/2) Epoch 36, batch 1250, loss[loss=0.2472, ctc_loss=0.1302, cr_loss=0.3954, attn_decoder_loss=0.2514, over 29510.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.117, cr_loss=0.3589, attn_decoder_loss=0.2416, over 5776197.72 frames. ], batch size: 92, lr: 3.08e-03, grad_scale: 16.0
2024-09-19 09:37:11,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=638500.0, ans=0.0
2024-09-19 09:37:37,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=638580.0, ans=0.125
2024-09-19 09:37:50,687 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.47 vs. limit=5.0
2024-09-19 09:37:55,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=638620.0, ans=0.125
2024-09-19 09:38:10,952 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=4.99 vs. limit=12.0
2024-09-19 09:38:14,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=638660.0, ans=0.125
2024-09-19 09:38:26,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=638660.0, ans=0.125
2024-09-19 09:38:29,053 INFO [train.py:1198] (1/2) Epoch 36, batch 1300, loss[loss=0.2455, ctc_loss=0.12, cr_loss=0.3574, attn_decoder_loss=0.2515, over 28172.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1166, cr_loss=0.3583, attn_decoder_loss=0.241, over 5779619.06 frames. ], batch size: 111, lr: 3.07e-03, grad_scale: 16.0
2024-09-19 09:38:35,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=638700.0, ans=0.95
2024-09-19 09:38:52,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=638740.0, ans=0.2
2024-09-19 09:38:56,431 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 8.178e+01 8.817e+01 9.661e+01 1.409e+02, threshold=1.763e+02, percent-clipped=0.0
2024-09-19 09:39:07,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=638780.0, ans=0.025
2024-09-19 09:39:17,117 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.18 vs. limit=22.5
2024-09-19 09:39:30,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=638860.0, ans=0.1
2024-09-19 09:39:31,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=638860.0, ans=0.025
2024-09-19 09:39:45,480 INFO [train.py:1198] (1/2) Epoch 36, batch 1350, loss[loss=0.2369, ctc_loss=0.1087, cr_loss=0.3499, attn_decoder_loss=0.2434, over 29750.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1161, cr_loss=0.3572, attn_decoder_loss=0.2407, over 5795258.95 frames. ], batch size: 81, lr: 3.07e-03, grad_scale: 16.0
2024-09-19 09:39:50,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=638900.0, ans=0.1
2024-09-19 09:39:53,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=638900.0, ans=0.125
2024-09-19 09:40:14,647 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.22 vs. limit=15.0
2024-09-19 09:40:17,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=638980.0, ans=0.125
2024-09-19 09:40:29,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=639020.0, ans=0.125
2024-09-19 09:40:29,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=639020.0, ans=0.125
2024-09-19 09:40:32,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=639020.0, ans=0.0
2024-09-19 09:40:33,021 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.24 vs. limit=15.0
2024-09-19 09:40:35,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=639020.0, ans=0.0
2024-09-19 09:40:46,827 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.74 vs. limit=15.0
2024-09-19 09:40:53,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=639060.0, ans=0.1
2024-09-19 09:41:00,602 INFO [train.py:1198] (1/2) Epoch 36, batch 1400, loss[loss=0.2064, ctc_loss=0.09397, cr_loss=0.3096, attn_decoder_loss=0.2121, over 29585.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.116, cr_loss=0.3573, attn_decoder_loss=0.2403, over 5806735.86 frames. ], batch size: 69, lr: 3.07e-03, grad_scale: 16.0
2024-09-19 09:41:15,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=639140.0, ans=0.125
2024-09-19 09:41:18,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=639140.0, ans=0.1
2024-09-19 09:41:27,784 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.347e+01 8.417e+01 9.024e+01 9.500e+01 1.848e+02, threshold=1.805e+02, percent-clipped=1.0
2024-09-19 09:41:35,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=639180.0, ans=0.125
2024-09-19 09:42:02,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=639260.0, ans=0.025
2024-09-19 09:42:18,900 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.33 vs. limit=12.0
2024-09-19 09:42:20,993 INFO [train.py:1198] (1/2) Epoch 36, batch 1450, loss[loss=0.2461, ctc_loss=0.1249, cr_loss=0.3582, attn_decoder_loss=0.2516, over 29457.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1161, cr_loss=0.357, attn_decoder_loss=0.2407, over 5804129.38 frames. ], batch size: 94, lr: 3.07e-03, grad_scale: 16.0
2024-09-19 09:42:37,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=639340.0, ans=0.125
2024-09-19 09:42:46,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=639340.0, ans=0.0
2024-09-19 09:42:47,642 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.24 vs. limit=12.0
2024-09-19 09:43:21,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=639460.0, ans=0.2
2024-09-19 09:43:24,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=639460.0, ans=0.0
2024-09-19 09:43:36,445 INFO [train.py:1198] (1/2) Epoch 36, batch 1500, loss[loss=0.2364, ctc_loss=0.1125, cr_loss=0.3574, attn_decoder_loss=0.2422, over 29645.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1161, cr_loss=0.357, attn_decoder_loss=0.2409, over 5805335.60 frames. ], batch size: 86, lr: 3.07e-03, grad_scale: 16.0
2024-09-19 09:43:57,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=639540.0, ans=0.0
2024-09-19 09:44:03,553 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.409e+01 8.635e+01 9.112e+01 9.549e+01 2.206e+02, threshold=1.822e+02, percent-clipped=1.0
2024-09-19 09:44:11,955 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.05 vs. limit=15.0
2024-09-19 09:44:16,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=639580.0, ans=0.0
2024-09-19 09:44:26,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=639620.0, ans=0.2
2024-09-19 09:44:32,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=639620.0, ans=0.125
2024-09-19 09:44:34,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=639620.0, ans=0.125
2024-09-19 09:44:40,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=639660.0, ans=0.125
2024-09-19 09:44:44,240 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.52 vs. limit=15.0
2024-09-19 09:44:52,262 INFO [train.py:1198] (1/2) Epoch 36, batch 1550, loss[loss=0.2593, ctc_loss=0.1366, cr_loss=0.4165, attn_decoder_loss=0.2637, over 29524.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1166, cr_loss=0.3584, attn_decoder_loss=0.2412, over 5781225.00 frames. ], batch size: 90, lr: 3.07e-03, grad_scale: 8.0
2024-09-19 09:44:54,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=639700.0, ans=0.035
2024-09-19 09:44:54,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=639700.0, ans=0.125
2024-09-19 09:45:01,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=639700.0, ans=0.125
2024-09-19 09:45:03,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=639700.0, ans=0.0
2024-09-19 09:45:13,731 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 09:45:46,657 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.14 vs. limit=22.5
2024-09-19 09:45:50,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=639820.0, ans=0.025
2024-09-19 09:46:00,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=639860.0, ans=0.04949747468305833
2024-09-19 09:46:11,768 INFO [train.py:1198] (1/2) Epoch 36, batch 1600, loss[loss=0.2441, ctc_loss=0.1206, cr_loss=0.3732, attn_decoder_loss=0.2495, over 29658.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1164, cr_loss=0.3579, attn_decoder_loss=0.241, over 5764159.46 frames. ], batch size: 85, lr: 3.07e-03, grad_scale: 16.0
2024-09-19 09:46:18,489 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.48 vs. limit=15.0
2024-09-19 09:46:42,000 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.502e+01 8.623e+01 9.307e+01 9.759e+01 1.491e+02, threshold=1.861e+02, percent-clipped=0.0
2024-09-19 09:46:43,154 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=15.0
2024-09-19 09:46:59,629 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.31 vs. limit=15.0
2024-09-19 09:47:01,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=639980.0, ans=0.125
2024-09-19 09:47:11,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=640020.0, ans=0.0
2024-09-19 09:47:34,966 INFO [train.py:1198] (1/2) Epoch 36, batch 1650, loss[loss=0.247, ctc_loss=0.1214, cr_loss=0.3694, attn_decoder_loss=0.2528, over 29735.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1164, cr_loss=0.3578, attn_decoder_loss=0.2409, over 5758348.96 frames. ], batch size: 89, lr: 3.07e-03, grad_scale: 8.0
2024-09-19 09:47:41,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=640100.0, ans=0.125
2024-09-19 09:47:42,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=640100.0, ans=0.0
2024-09-19 09:47:42,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=640100.0, ans=0.125
2024-09-19 09:47:54,748 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 09:47:59,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=640140.0, ans=0.0
2024-09-19 09:48:16,097 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 09:48:22,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=640220.0, ans=0.0
2024-09-19 09:48:44,474 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 09:48:50,206 INFO [train.py:1198] (1/2) Epoch 36, batch 1700, loss[loss=0.211, ctc_loss=0.09741, cr_loss=0.3116, attn_decoder_loss=0.2167, over 29575.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1163, cr_loss=0.3576, attn_decoder_loss=0.2407, over 5780386.92 frames. ], batch size: 69, lr: 3.07e-03, grad_scale: 8.0
2024-09-19 09:48:59,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=640300.0, ans=0.125
2024-09-19 09:49:01,906 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.90 vs. limit=15.0
2024-09-19 09:49:13,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=640340.0, ans=0.95
2024-09-19 09:49:20,274 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.541e+01 8.508e+01 8.971e+01 9.480e+01 1.290e+02, threshold=1.794e+02, percent-clipped=0.0
2024-09-19 09:49:42,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=640420.0, ans=0.125
2024-09-19 09:49:53,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=640460.0, ans=0.2
2024-09-19 09:50:03,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=640460.0, ans=0.125
2024-09-19 09:50:03,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=640460.0, ans=0.125
2024-09-19 09:50:10,193 INFO [train.py:1198] (1/2) Epoch 36, batch 1750, loss[loss=0.2025, ctc_loss=0.09049, cr_loss=0.3031, attn_decoder_loss=0.2082, over 29342.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.116, cr_loss=0.357, attn_decoder_loss=0.2406, over 5787776.84 frames. ], batch size: 67, lr: 3.07e-03, grad_scale: 8.0
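[editor's note] The scaling.py:1120 records ("WithLoss: name=..., loss-sum=0.000e+00") suggest a wrapper that attaches an auxiliary penalty to a submodule's output and periodically logs the accumulated sum; the zero sums here would mean the penalty is currently inactive. The following is purely a sketch under that assumption, with hypothetical names; the log does not show the actual mechanism:

```python
# Generic auxiliary-loss wrapper that accumulates and reports a "loss-sum".
import torch

class WithLossSketch(torch.nn.Module):
    def __init__(self, module: torch.nn.Module, name: str):
        super().__init__()
        self.module = module
        self.name = name
        self.loss_sum = 0.0  # accumulated auxiliary loss since the last report

    def forward(self, x):
        y = self.module(x)
        aux = torch.zeros((), device=x.device)  # placeholder penalty term
        self.loss_sum += float(aux.detach())
        return y + 0.0 * aux  # keep the penalty attached to the graph

wrapped = WithLossSketch(torch.nn.Linear(8, 8), "self_attn_weights")
wrapped(torch.randn(4, 8))
print(f"WithLoss: name={wrapped.name}, loss-sum={wrapped.loss_sum:.3e}")
```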
2024-09-19 09:50:11,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=640500.0, ans=0.1
2024-09-19 09:50:11,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=640500.0, ans=0.05
2024-09-19 09:50:20,429 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.65 vs. limit=22.5
2024-09-19 09:50:26,351 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.76 vs. limit=22.5
2024-09-19 09:50:44,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=640580.0, ans=0.025
2024-09-19 09:51:03,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=640620.0, ans=0.0
2024-09-19 09:51:03,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=640620.0, ans=0.0
2024-09-19 09:51:11,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=640660.0, ans=0.0
2024-09-19 09:51:13,428 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0
2024-09-19 09:51:26,090 INFO [train.py:1198] (1/2) Epoch 36, batch 1800, loss[loss=0.2527, ctc_loss=0.1287, cr_loss=0.3888, attn_decoder_loss=0.2578, over 29677.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1163, cr_loss=0.3581, attn_decoder_loss=0.2409, over 5790785.20 frames. ], batch size: 83, lr: 3.07e-03, grad_scale: 8.0
2024-09-19 09:51:45,105 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.55 vs. limit=6.0
2024-09-19 09:51:47,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=640740.0, ans=0.125
2024-09-19 09:51:49,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=640740.0, ans=0.125
2024-09-19 09:51:56,469 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.495e+01 8.489e+01 9.081e+01 9.519e+01 1.920e+02, threshold=1.816e+02, percent-clipped=1.0
2024-09-19 09:51:56,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=640780.0, ans=0.125
2024-09-19 09:52:02,005 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.13 vs. limit=15.0
2024-09-19 09:52:05,673 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.81 vs. limit=10.0
2024-09-19 09:52:18,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=640820.0, ans=0.125
2024-09-19 09:52:19,161 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.48 vs. limit=22.5
2024-09-19 09:52:20,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=640820.0, ans=0.125
2024-09-19 09:52:42,685 INFO [train.py:1198] (1/2) Epoch 36, batch 1850, loss[loss=0.2469, ctc_loss=0.1201, cr_loss=0.3763, attn_decoder_loss=0.2527, over 29623.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1159, cr_loss=0.3572, attn_decoder_loss=0.2406, over 5795070.81 frames. ], batch size: 86, lr: 3.07e-03, grad_scale: 8.0
2024-09-19 09:52:48,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=640900.0, ans=0.1
2024-09-19 09:53:14,255 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.86 vs. limit=5.0
2024-09-19 09:53:19,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=640980.0, ans=0.125
2024-09-19 09:53:22,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=640980.0, ans=0.2
2024-09-19 09:53:42,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=641020.0, ans=0.95
2024-09-19 09:53:44,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=641060.0, ans=0.125
2024-09-19 09:53:51,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=641060.0, ans=0.125
2024-09-19 09:53:53,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=641060.0, ans=0.125
2024-09-19 09:53:53,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=641060.0, ans=0.0
2024-09-19 09:54:00,521 INFO [train.py:1198] (1/2) Epoch 36, batch 1900, loss[loss=0.2494, ctc_loss=0.1181, cr_loss=0.3793, attn_decoder_loss=0.2556, over 29696.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1156, cr_loss=0.3566, attn_decoder_loss=0.2407, over 5802953.49 frames. ], batch size: 89, lr: 3.07e-03, grad_scale: 8.0
2024-09-19 09:54:13,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=641100.0, ans=0.125
2024-09-19 09:54:15,608 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.73 vs. limit=15.0
2024-09-19 09:54:30,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=641140.0, ans=0.2
2024-09-19 09:54:31,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=641180.0, ans=0.125
2024-09-19 09:54:33,086 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.812e+01 8.610e+01 8.955e+01 9.499e+01 1.383e+02, threshold=1.791e+02, percent-clipped=0.0
2024-09-19 09:54:44,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=641180.0, ans=0.07
2024-09-19 09:54:48,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=641220.0, ans=0.025
2024-09-19 09:54:51,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=641220.0, ans=0.125
2024-09-19 09:55:18,518 INFO [train.py:1198] (1/2) Epoch 36, batch 1950, loss[loss=0.2448, ctc_loss=0.124, cr_loss=0.3826, attn_decoder_loss=0.2497, over 29429.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1159, cr_loss=0.3575, attn_decoder_loss=0.2415, over 5818225.56 frames. ], batch size: 78, lr: 3.07e-03, grad_scale: 8.0
2024-09-19 09:55:43,183 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=15.0
2024-09-19 09:55:57,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=641380.0, ans=0.125
2024-09-19 09:56:12,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=641420.0, ans=0.1
2024-09-19 09:56:26,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=641460.0, ans=0.125
2024-09-19 09:56:29,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=641460.0, ans=0.2
2024-09-19 09:56:33,539 INFO [train.py:1198] (1/2) Epoch 36, batch 2000, loss[loss=0.2223, ctc_loss=0.108, cr_loss=0.3446, attn_decoder_loss=0.2274, over 29358.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1166, cr_loss=0.3592, attn_decoder_loss=0.242, over 5795870.77 frames. ], batch size: 67, lr: 3.07e-03, grad_scale: 16.0
2024-09-19 09:56:36,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=641500.0, ans=0.2
2024-09-19 09:56:46,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=641500.0, ans=0.1
2024-09-19 09:57:04,113 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.346e+01 8.610e+01 8.991e+01 9.571e+01 3.322e+02, threshold=1.798e+02, percent-clipped=1.0
2024-09-19 09:57:06,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=641580.0, ans=0.025
2024-09-19 09:57:38,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=641660.0, ans=0.0
2024-09-19 09:57:40,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=641660.0, ans=0.125
2024-09-19 09:57:48,071 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-19 09:57:52,157 INFO [train.py:1198] (1/2) Epoch 36, batch 2050, loss[loss=0.2124, ctc_loss=0.09843, cr_loss=0.3315, attn_decoder_loss=0.2177, over 29441.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1162, cr_loss=0.3579, attn_decoder_loss=0.241, over 5786809.95 frames. ], batch size: 70, lr: 3.07e-03, grad_scale: 16.0
2024-09-19 09:58:02,628 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.18 vs. limit=12.0
2024-09-19 09:58:11,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=641740.0, ans=0.1
2024-09-19 09:58:29,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=641780.0, ans=0.0
2024-09-19 09:59:09,451 INFO [train.py:1198] (1/2) Epoch 36, batch 2100, loss[loss=0.2286, ctc_loss=0.111, cr_loss=0.3406, attn_decoder_loss=0.2341, over 29781.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.116, cr_loss=0.3577, attn_decoder_loss=0.2407, over 5798977.81 frames. ], batch size: 81, lr: 3.07e-03, grad_scale: 16.0
2024-09-19 09:59:15,110 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.64 vs. limit=15.0
2024-09-19 09:59:20,258 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 09:59:21,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=641900.0, ans=0.0
2024-09-19 09:59:39,341 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.566e+01 8.432e+01 8.828e+01 9.578e+01 1.169e+02, threshold=1.766e+02, percent-clipped=0.0
2024-09-19 10:00:01,258 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=12.0
limit=12.0 2024-09-19 10:00:12,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=642060.0, ans=0.125 2024-09-19 10:00:17,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=642060.0, ans=0.2 2024-09-19 10:00:24,374 INFO [train.py:1198] (1/2) Epoch 36, batch 2150, loss[loss=0.229, ctc_loss=0.1158, cr_loss=0.3607, attn_decoder_loss=0.2336, over 29448.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1152, cr_loss=0.3557, attn_decoder_loss=0.2398, over 5814500.80 frames. ], batch size: 78, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 10:00:47,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=642140.0, ans=0.125 2024-09-19 10:00:55,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=642180.0, ans=0.125 2024-09-19 10:01:14,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=642220.0, ans=0.1 2024-09-19 10:01:25,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=642260.0, ans=0.025 2024-09-19 10:01:30,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=642260.0, ans=0.0 2024-09-19 10:01:42,330 INFO [train.py:1198] (1/2) Epoch 36, batch 2200, loss[loss=0.2494, ctc_loss=0.1286, cr_loss=0.3976, attn_decoder_loss=0.254, over 29625.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1155, cr_loss=0.3562, attn_decoder_loss=0.2401, over 5810580.53 frames. ], batch size: 86, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 10:01:44,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=642300.0, ans=0.125 2024-09-19 10:01:50,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=642300.0, ans=10.0 2024-09-19 10:01:57,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=642340.0, ans=0.125 2024-09-19 10:02:07,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=642340.0, ans=0.1 2024-09-19 10:02:14,558 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 8.688e+01 9.104e+01 9.664e+01 2.107e+02, threshold=1.821e+02, percent-clipped=1.0 2024-09-19 10:02:23,476 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.04 vs. 
limit=6.0 2024-09-19 10:02:33,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=642420.0, ans=0.125 2024-09-19 10:02:34,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=642420.0, ans=0.125 2024-09-19 10:02:45,659 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 10:02:50,374 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.64 vs. limit=15.0 2024-09-19 10:02:54,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=642460.0, ans=0.125 2024-09-19 10:03:00,207 INFO [train.py:1198] (1/2) Epoch 36, batch 2250, loss[loss=0.2423, ctc_loss=0.1188, cr_loss=0.3652, attn_decoder_loss=0.2479, over 29721.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1148, cr_loss=0.3553, attn_decoder_loss=0.2399, over 5809735.09 frames. ], batch size: 82, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 10:03:14,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=642540.0, ans=0.025 2024-09-19 10:03:16,392 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.25 vs. limit=15.0 2024-09-19 10:03:20,899 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.72 vs. limit=15.0 2024-09-19 10:03:21,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=642540.0, ans=0.1 2024-09-19 10:04:15,306 INFO [train.py:1198] (1/2) Epoch 36, batch 2300, loss[loss=0.2063, ctc_loss=0.09486, cr_loss=0.3102, attn_decoder_loss=0.2118, over 29323.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1146, cr_loss=0.3544, attn_decoder_loss=0.2391, over 5797742.25 frames. ], batch size: 71, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 10:04:26,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=642700.0, ans=0.035 2024-09-19 10:04:46,894 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.606e+01 8.402e+01 9.082e+01 9.464e+01 1.800e+02, threshold=1.816e+02, percent-clipped=0.0 2024-09-19 10:05:33,288 INFO [train.py:1198] (1/2) Epoch 36, batch 2350, loss[loss=0.2385, ctc_loss=0.1167, cr_loss=0.3602, attn_decoder_loss=0.244, over 29695.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1148, cr_loss=0.3556, attn_decoder_loss=0.2394, over 5803615.58 frames. 
], batch size: 83, lr: 3.06e-03, grad_scale: 8.0 2024-09-19 10:05:35,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=642900.0, ans=0.125 2024-09-19 10:05:42,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=642900.0, ans=0.125 2024-09-19 10:06:22,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=643020.0, ans=0.125 2024-09-19 10:06:34,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=643060.0, ans=0.2 2024-09-19 10:06:38,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=643060.0, ans=0.0 2024-09-19 10:06:46,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=643060.0, ans=0.2 2024-09-19 10:06:50,474 INFO [train.py:1198] (1/2) Epoch 36, batch 2400, loss[loss=0.2267, ctc_loss=0.1118, cr_loss=0.3489, attn_decoder_loss=0.2317, over 29547.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1157, cr_loss=0.3576, attn_decoder_loss=0.24, over 5807181.07 frames. ], batch size: 76, lr: 3.06e-03, grad_scale: 16.0 2024-09-19 10:06:51,254 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2024-09-19 10:06:57,579 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.28 vs. limit=22.5 2024-09-19 10:07:10,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=643140.0, ans=0.0 2024-09-19 10:07:15,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=643140.0, ans=0.125 2024-09-19 10:07:22,299 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.644e+01 9.234e+01 9.836e+01 2.155e+02, threshold=1.847e+02, percent-clipped=1.0 2024-09-19 10:07:24,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=643180.0, ans=0.125 2024-09-19 10:07:25,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=643180.0, ans=0.1 2024-09-19 10:07:26,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=643180.0, ans=15.0 2024-09-19 10:07:30,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=643180.0, ans=0.2 2024-09-19 10:07:30,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=643180.0, ans=0.125 2024-09-19 10:07:35,242 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 10:07:36,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=643220.0, ans=0.0 2024-09-19 10:07:40,426 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, 
num_channels=512, metric=7.59 vs. limit=15.0 2024-09-19 10:07:50,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=643260.0, ans=0.2 2024-09-19 10:08:01,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=643260.0, ans=0.125 2024-09-19 10:08:07,033 INFO [train.py:1198] (1/2) Epoch 36, batch 2450, loss[loss=0.2394, ctc_loss=0.1198, cr_loss=0.3678, attn_decoder_loss=0.2445, over 29686.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1163, cr_loss=0.3583, attn_decoder_loss=0.241, over 5782390.73 frames. ], batch size: 82, lr: 3.06e-03, grad_scale: 16.0 2024-09-19 10:08:07,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=643300.0, ans=0.2 2024-09-19 10:08:08,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=643300.0, ans=0.0 2024-09-19 10:08:22,442 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 10:08:39,969 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.77 vs. limit=5.0 2024-09-19 10:08:46,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=643380.0, ans=0.0 2024-09-19 10:09:00,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=643420.0, ans=0.125 2024-09-19 10:09:05,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=643420.0, ans=0.0 2024-09-19 10:09:06,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=643420.0, ans=0.125 2024-09-19 10:09:22,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.96 vs. limit=15.0 2024-09-19 10:09:22,481 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.05 vs. limit=22.5 2024-09-19 10:09:24,637 INFO [train.py:1198] (1/2) Epoch 36, batch 2500, loss[loss=0.2432, ctc_loss=0.1063, cr_loss=0.3243, attn_decoder_loss=0.2512, over 29621.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1161, cr_loss=0.358, attn_decoder_loss=0.2407, over 5793148.31 frames. 
], batch size: 86, lr: 3.06e-03, grad_scale: 16.0 2024-09-19 10:09:26,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=643500.0, ans=0.1 2024-09-19 10:09:29,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=643500.0, ans=0.125 2024-09-19 10:09:50,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=643540.0, ans=0.125 2024-09-19 10:09:58,826 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.618e+01 8.994e+01 9.637e+01 2.222e+02, threshold=1.799e+02, percent-clipped=1.0 2024-09-19 10:10:05,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=643580.0, ans=0.125 2024-09-19 10:10:17,327 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 10:10:24,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=643620.0, ans=0.0 2024-09-19 10:10:42,727 INFO [train.py:1198] (1/2) Epoch 36, batch 2550, loss[loss=0.2075, ctc_loss=0.0944, cr_loss=0.3296, attn_decoder_loss=0.2127, over 29341.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.116, cr_loss=0.3575, attn_decoder_loss=0.2407, over 5796165.40 frames. ], batch size: 67, lr: 3.06e-03, grad_scale: 16.0 2024-09-19 10:10:43,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=643700.0, ans=0.1 2024-09-19 10:10:47,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=643700.0, ans=0.125 2024-09-19 10:10:48,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=643700.0, ans=0.0 2024-09-19 10:10:59,935 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.89 vs. limit=15.0 2024-09-19 10:11:16,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.95 vs. limit=15.0 2024-09-19 10:11:24,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=643780.0, ans=0.125 2024-09-19 10:11:58,150 INFO [train.py:1198] (1/2) Epoch 36, batch 2600, loss[loss=0.2313, ctc_loss=0.1044, cr_loss=0.3425, attn_decoder_loss=0.2378, over 29439.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1164, cr_loss=0.3585, attn_decoder_loss=0.2411, over 5794061.82 frames. ], batch size: 78, lr: 3.06e-03, grad_scale: 16.0 2024-09-19 10:12:03,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=643900.0, ans=0.125 2024-09-19 10:12:24,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=643940.0, ans=0.125 2024-09-19 10:12:31,417 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.485e+01 8.615e+01 9.147e+01 9.711e+01 1.347e+02, threshold=1.829e+02, percent-clipped=0.0 2024-09-19 10:12:35,106 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.60 vs. 
limit=12.0 2024-09-19 10:12:53,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=644020.0, ans=0.0 2024-09-19 10:12:58,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=644020.0, ans=0.0 2024-09-19 10:13:06,212 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.29 vs. limit=15.0 2024-09-19 10:13:08,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=644060.0, ans=0.0 2024-09-19 10:13:16,049 INFO [train.py:1198] (1/2) Epoch 36, batch 2650, loss[loss=0.2534, ctc_loss=0.1312, cr_loss=0.3937, attn_decoder_loss=0.2582, over 29282.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1163, cr_loss=0.3584, attn_decoder_loss=0.2413, over 5801736.41 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 8.0 2024-09-19 10:13:22,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=644100.0, ans=0.125 2024-09-19 10:13:22,981 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.46 vs. limit=15.0 2024-09-19 10:13:23,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=644100.0, ans=0.125 2024-09-19 10:13:28,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=644100.0, ans=0.125 2024-09-19 10:13:54,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=644180.0, ans=0.125 2024-09-19 10:14:06,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=644220.0, ans=0.0 2024-09-19 10:14:06,720 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.22 vs. limit=22.5 2024-09-19 10:14:07,188 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.10 vs. limit=15.0 2024-09-19 10:14:12,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=644220.0, ans=0.05 2024-09-19 10:14:21,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=644260.0, ans=0.125 2024-09-19 10:14:33,360 INFO [train.py:1198] (1/2) Epoch 36, batch 2700, loss[loss=0.2326, ctc_loss=0.1048, cr_loss=0.3231, attn_decoder_loss=0.2397, over 29519.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1161, cr_loss=0.3582, attn_decoder_loss=0.2414, over 5796020.75 frames. 
], batch size: 87, lr: 3.06e-03, grad_scale: 8.0 2024-09-19 10:14:49,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=644340.0, ans=0.125 2024-09-19 10:15:06,280 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.558e+01 9.078e+01 9.683e+01 1.491e+02, threshold=1.816e+02, percent-clipped=0.0 2024-09-19 10:15:22,756 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.95 vs. limit=15.0 2024-09-19 10:15:29,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=644420.0, ans=0.0 2024-09-19 10:15:30,206 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2024-09-19 10:15:33,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=644460.0, ans=0.0 2024-09-19 10:15:38,898 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.62 vs. limit=15.0 2024-09-19 10:15:48,818 INFO [train.py:1198] (1/2) Epoch 36, batch 2750, loss[loss=0.2349, ctc_loss=0.1233, cr_loss=0.3877, attn_decoder_loss=0.2387, over 29506.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1157, cr_loss=0.357, attn_decoder_loss=0.2404, over 5793674.58 frames. ], batch size: 75, lr: 3.06e-03, grad_scale: 8.0 2024-09-19 10:16:04,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=644540.0, ans=10.0 2024-09-19 10:16:53,622 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.43 vs. limit=15.0 2024-09-19 10:17:06,642 INFO [train.py:1198] (1/2) Epoch 36, batch 2800, loss[loss=0.255, ctc_loss=0.1485, cr_loss=0.4033, attn_decoder_loss=0.2579, over 20326.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.116, cr_loss=0.3571, attn_decoder_loss=0.2406, over 5774572.62 frames. ], batch size: 210, lr: 3.06e-03, grad_scale: 16.0 2024-09-19 10:17:18,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=644700.0, ans=0.0 2024-09-19 10:17:28,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=644740.0, ans=0.125 2024-09-19 10:17:38,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=644780.0, ans=0.0 2024-09-19 10:17:43,407 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.665e+01 8.452e+01 9.019e+01 9.554e+01 2.850e+02, threshold=1.804e+02, percent-clipped=2.0 2024-09-19 10:17:47,867 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.35 vs. 
limit=15.0 2024-09-19 10:18:08,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=644860.0, ans=0.1 2024-09-19 10:18:11,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=644860.0, ans=0.125 2024-09-19 10:18:13,298 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.59 vs. limit=22.5 2024-09-19 10:18:22,256 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.23 vs. limit=15.0 2024-09-19 10:18:24,434 INFO [train.py:1198] (1/2) Epoch 36, batch 2850, loss[loss=0.2279, ctc_loss=0.1106, cr_loss=0.3346, attn_decoder_loss=0.2334, over 29485.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1162, cr_loss=0.3567, attn_decoder_loss=0.2408, over 5760315.83 frames. ], batch size: 77, lr: 3.06e-03, grad_scale: 8.0 2024-09-19 10:18:29,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=644900.0, ans=0.125 2024-09-19 10:18:30,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=644900.0, ans=0.0 2024-09-19 10:18:35,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=644900.0, ans=0.0 2024-09-19 10:18:38,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=644940.0, ans=0.0 2024-09-19 10:19:14,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=645020.0, ans=0.2 2024-09-19 10:19:14,997 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.45 vs. limit=15.0 2024-09-19 10:19:23,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=645060.0, ans=0.1 2024-09-19 10:19:23,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=645060.0, ans=0.2 2024-09-19 10:19:32,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=645060.0, ans=0.95 2024-09-19 10:19:32,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=645060.0, ans=0.125 2024-09-19 10:19:39,851 INFO [train.py:1198] (1/2) Epoch 36, batch 2900, loss[loss=0.238, ctc_loss=0.1227, cr_loss=0.3722, attn_decoder_loss=0.2425, over 29431.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1168, cr_loss=0.3583, attn_decoder_loss=0.2419, over 5786444.66 frames. 
], batch size: 79, lr: 3.06e-03, grad_scale: 8.0 2024-09-19 10:19:59,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=645140.0, ans=0.0 2024-09-19 10:20:14,869 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.477e+01 8.534e+01 8.969e+01 9.435e+01 1.794e+02, threshold=1.794e+02, percent-clipped=0.0 2024-09-19 10:20:24,984 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 10:20:41,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=645260.0, ans=0.1 2024-09-19 10:20:57,804 INFO [train.py:1198] (1/2) Epoch 36, batch 2950, loss[loss=0.2218, ctc_loss=0.1053, cr_loss=0.3454, attn_decoder_loss=0.2271, over 29509.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1156, cr_loss=0.356, attn_decoder_loss=0.2405, over 5782576.26 frames. ], batch size: 75, lr: 3.06e-03, grad_scale: 8.0 2024-09-19 10:21:04,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=645300.0, ans=0.2 2024-09-19 10:21:06,275 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.73 vs. limit=12.0 2024-09-19 10:21:11,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=645340.0, ans=0.1 2024-09-19 10:21:12,565 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.20 vs. limit=15.0 2024-09-19 10:22:15,282 INFO [train.py:1198] (1/2) Epoch 36, batch 3000, loss[loss=0.2379, ctc_loss=0.1099, cr_loss=0.3549, attn_decoder_loss=0.2442, over 29752.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1157, cr_loss=0.3562, attn_decoder_loss=0.2403, over 5783198.54 frames. ], batch size: 81, lr: 3.06e-03, grad_scale: 8.0 2024-09-19 10:22:15,282 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 10:22:33,133 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.4.self_attn_weights, attn_weights_entropy = tensor([3.7443, 3.2396, 2.7017, 3.4968, 3.0432, 2.3886, 2.7009, 3.0437], device='cuda:1') 2024-09-19 10:22:33,842 INFO [train.py:1230] (1/2) Epoch 36, validation: loss=0.212, ctc_loss=0.03671, cr_loss=5.93e-15, attn_decoder_loss=0.2315, over 944034.00 frames. 
2024-09-19 10:22:33,842 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB
2024-09-19 10:22:38,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=645500.0, ans=0.0
2024-09-19 10:22:40,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=645500.0, ans=0.09899494936611666
2024-09-19 10:22:43,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=645500.0, ans=0.2
2024-09-19 10:22:54,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=645540.0, ans=15.0
2024-09-19 10:23:07,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=645580.0, ans=0.125
2024-09-19 10:23:08,441 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 8.660e+01 9.002e+01 9.609e+01 4.841e+02, threshold=1.800e+02, percent-clipped=1.0
2024-09-19 10:23:15,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=645580.0, ans=0.025
2024-09-19 10:23:24,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=645620.0, ans=0.0
2024-09-19 10:23:28,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=645620.0, ans=0.125
2024-09-19 10:23:48,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=645700.0, ans=0.0
2024-09-19 10:23:49,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.59 vs. limit=15.0
2024-09-19 10:23:50,006 INFO [train.py:1198] (1/2) Epoch 36, batch 3050, loss[loss=0.2208, ctc_loss=0.1023, cr_loss=0.3352, attn_decoder_loss=0.2265, over 29543.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1164, cr_loss=0.3578, attn_decoder_loss=0.2413, over 5777031.19 frames. ], batch size: 76, lr: 3.06e-03, grad_scale: 8.0
2024-09-19 10:23:52,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=15.0
2024-09-19 10:24:01,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=645700.0, ans=0.025
2024-09-19 10:24:10,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=645740.0, ans=0.125
2024-09-19 10:24:24,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=645780.0, ans=0.0
2024-09-19 10:24:26,165 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.16 vs. limit=15.0
2024-09-19 10:24:37,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=645820.0, ans=0.125
2024-09-19 10:24:38,614 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.46 vs. limit=22.5
2024-09-19 10:24:52,331 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.96 vs. limit=22.5
2024-09-19 10:24:59,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=645860.0, ans=0.125
2024-09-19 10:25:07,755 INFO [train.py:1198] (1/2) Epoch 36, batch 3100, loss[loss=0.2616, ctc_loss=0.1431, cr_loss=0.4093, attn_decoder_loss=0.2657, over 29262.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1164, cr_loss=0.3574, attn_decoder_loss=0.241, over 5777979.18 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 8.0
2024-09-19 10:25:24,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=645940.0, ans=0.125
2024-09-19 10:25:33,248 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.90 vs. limit=15.0
2024-09-19 10:25:34,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=645940.0, ans=0.125
2024-09-19 10:25:40,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=645980.0, ans=0.0
2024-09-19 10:25:44,450 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.450e+01 9.039e+01 9.711e+01 1.761e+02, threshold=1.808e+02, percent-clipped=0.0
2024-09-19 10:25:45,317 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.97 vs. limit=15.0
2024-09-19 10:26:16,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=646060.0, ans=0.125
2024-09-19 10:26:17,111 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.10 vs. limit=12.0
2024-09-19 10:26:25,345 INFO [train.py:1198] (1/2) Epoch 36, batch 3150, loss[loss=0.2513, ctc_loss=0.1237, cr_loss=0.3657, attn_decoder_loss=0.2574, over 28793.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1164, cr_loss=0.3575, attn_decoder_loss=0.2411, over 5783674.00 frames. ], batch size: 104, lr: 3.06e-03, grad_scale: 8.0
2024-09-19 10:26:54,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=646180.0, ans=0.125
2024-09-19 10:27:07,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=646180.0, ans=0.125
2024-09-19 10:27:22,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=646220.0, ans=0.5
2024-09-19 10:27:36,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=646260.0, ans=0.125
2024-09-19 10:27:39,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=646300.0, ans=0.1
2024-09-19 10:27:40,721 INFO [train.py:1198] (1/2) Epoch 36, batch 3200, loss[loss=0.2345, ctc_loss=0.1157, cr_loss=0.3382, attn_decoder_loss=0.2402, over 29401.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1161, cr_loss=0.3566, attn_decoder_loss=0.2406, over 5793608.00 frames. ], batch size: 79, lr: 3.06e-03, grad_scale: 16.0
2024-09-19 10:27:55,365 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0
2024-09-19 10:27:59,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=646340.0, ans=0.125
2024-09-19 10:28:06,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=646340.0, ans=0.0
2024-09-19 10:28:09,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=646340.0, ans=0.1
2024-09-19 10:28:17,946 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.484e+01 8.478e+01 9.056e+01 9.805e+01 1.899e+02, threshold=1.811e+02, percent-clipped=1.0
2024-09-19 10:28:29,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=646420.0, ans=0.125
2024-09-19 10:28:29,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=646420.0, ans=0.125
2024-09-19 10:28:45,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=646460.0, ans=0.0
2024-09-19 10:28:59,139 INFO [train.py:1198] (1/2) Epoch 36, batch 3250, loss[loss=0.2422, ctc_loss=0.1285, cr_loss=0.3912, attn_decoder_loss=0.2461, over 29697.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1162, cr_loss=0.3572, attn_decoder_loss=0.2409, over 5801179.52 frames. ], batch size: 84, lr: 3.06e-03, grad_scale: 16.0
2024-09-19 10:29:00,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=646500.0, ans=0.125
2024-09-19 10:29:27,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=646540.0, ans=0.5
2024-09-19 10:29:29,220 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.97 vs. limit=22.5
2024-09-19 10:29:37,010 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.89 vs. limit=15.0
2024-09-19 10:29:46,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=646620.0, ans=0.125
2024-09-19 10:30:00,968 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0
2024-09-19 10:30:15,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=646700.0, ans=0.0
2024-09-19 10:30:16,478 INFO [train.py:1198] (1/2) Epoch 36, batch 3300, loss[loss=0.2392, ctc_loss=0.1094, cr_loss=0.3285, attn_decoder_loss=0.2463, over 28393.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1155, cr_loss=0.3556, attn_decoder_loss=0.2399, over 5798327.26 frames. ], batch size: 111, lr: 3.06e-03, grad_scale: 16.0
2024-09-19 10:30:22,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=646700.0, ans=0.0
2024-09-19 10:30:25,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=646700.0, ans=0.125
2024-09-19 10:30:52,747 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.457e+01 8.565e+01 9.043e+01 9.746e+01 1.474e+02, threshold=1.809e+02, percent-clipped=0.0
2024-09-19 10:30:56,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=646780.0, ans=0.07
2024-09-19 10:30:56,653 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.48 vs. limit=15.0
2024-09-19 10:31:14,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=646820.0, ans=0.04949747468305833
2024-09-19 10:31:15,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=646860.0, ans=0.1
2024-09-19 10:31:18,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=646860.0, ans=0.2
2024-09-19 10:31:30,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=646900.0, ans=0.0
2024-09-19 10:31:31,685 INFO [train.py:1198] (1/2) Epoch 36, batch 3350, loss[loss=0.2395, ctc_loss=0.1079, cr_loss=0.3551, attn_decoder_loss=0.2462, over 28891.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.116, cr_loss=0.3561, attn_decoder_loss=0.2404, over 5775610.03 frames. ], batch size: 104, lr: 3.06e-03, grad_scale: 8.0
2024-09-19 10:31:36,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=646900.0, ans=0.0
2024-09-19 10:31:44,986 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.99 vs. limit=15.0
2024-09-19 10:31:52,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=646940.0, ans=0.1
2024-09-19 10:32:20,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=647020.0, ans=0.125
2024-09-19 10:32:36,516 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.91 vs. limit=15.0
2024-09-19 10:32:37,913 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.98 vs. limit=22.5
2024-09-19 10:32:47,064 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.78 vs. limit=10.0
2024-09-19 10:32:49,153 INFO [train.py:1198] (1/2) Epoch 36, batch 3400, loss[loss=0.2023, ctc_loss=0.09535, cr_loss=0.3231, attn_decoder_loss=0.207, over 29351.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1161, cr_loss=0.3563, attn_decoder_loss=0.2404, over 5767635.33 frames. ], batch size: 67, lr: 3.05e-03, grad_scale: 8.0
2024-09-19 10:33:23,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=647180.0, ans=0.0
2024-09-19 10:33:27,779 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 8.557e+01 9.096e+01 9.972e+01 2.860e+02, threshold=1.819e+02, percent-clipped=2.0
2024-09-19 10:33:34,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=647180.0, ans=0.0
2024-09-19 10:33:39,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=647220.0, ans=0.5
2024-09-19 10:34:07,340 INFO [train.py:1198] (1/2) Epoch 36, batch 3450, loss[loss=0.2461, ctc_loss=0.1161, cr_loss=0.3584, attn_decoder_loss=0.2526, over 28204.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1162, cr_loss=0.3572, attn_decoder_loss=0.2408, over 5775207.79 frames. ], batch size: 111, lr: 3.05e-03, grad_scale: 8.0
2024-09-19 10:34:51,711 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.84 vs. limit=22.5
2024-09-19 10:34:54,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=647420.0, ans=0.0
2024-09-19 10:35:00,930 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.54 vs. limit=15.0
2024-09-19 10:35:06,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=647460.0, ans=0.125
2024-09-19 10:35:12,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=647460.0, ans=0.125
2024-09-19 10:35:22,385 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.21 vs. limit=15.0
2024-09-19 10:35:22,926 INFO [train.py:1198] (1/2) Epoch 36, batch 3500, loss[loss=0.2065, ctc_loss=0.08676, cr_loss=0.28, attn_decoder_loss=0.2135, over 29345.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1157, cr_loss=0.356, attn_decoder_loss=0.2403, over 5778032.53 frames. ], batch size: 71, lr: 3.05e-03, grad_scale: 8.0
2024-09-19 10:35:56,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=647580.0, ans=0.125
2024-09-19 10:36:00,831 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.443e+01 8.960e+01 9.445e+01 1.390e+02, threshold=1.792e+02, percent-clipped=0.0
2024-09-19 10:36:07,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=647580.0, ans=0.025
2024-09-19 10:36:08,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=647620.0, ans=0.125
2024-09-19 10:36:33,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=647660.0, ans=0.125
2024-09-19 10:36:36,850 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 10:36:39,502 INFO [train.py:1198] (1/2) Epoch 36, batch 3550, loss[loss=0.2459, ctc_loss=0.1139, cr_loss=0.3542, attn_decoder_loss=0.2527, over 29685.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1157, cr_loss=0.3561, attn_decoder_loss=0.2405, over 5783970.10 frames. ], batch size: 89, lr: 3.05e-03, grad_scale: 8.0
2024-09-19 10:36:58,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=647740.0, ans=0.125
2024-09-19 10:37:03,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=647740.0, ans=0.125
2024-09-19 10:37:12,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=647780.0, ans=0.1
2024-09-19 10:37:21,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=647780.0, ans=0.1
2024-09-19 10:37:43,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=647860.0, ans=0.1
2024-09-19 10:37:53,727 INFO [train.py:1198] (1/2) Epoch 36, batch 3600, loss[loss=0.2265, ctc_loss=0.1067, cr_loss=0.3512, attn_decoder_loss=0.232, over 29492.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1158, cr_loss=0.3565, attn_decoder_loss=0.2407, over 5792863.55 frames. ], batch size: 77, lr: 3.05e-03, grad_scale: 16.0
2024-09-19 10:37:55,673 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 10:38:04,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=647900.0, ans=0.125
2024-09-19 10:38:10,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=647940.0, ans=0.1
2024-09-19 10:38:31,860 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.487e+01 8.382e+01 8.950e+01 9.458e+01 2.043e+02, threshold=1.790e+02, percent-clipped=2.0
2024-09-19 10:38:35,632 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 10:38:51,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=648020.0, ans=0.5
2024-09-19 10:38:56,751 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.63 vs. limit=15.0
2024-09-19 10:38:59,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=648060.0, ans=0.0
2024-09-19 10:39:10,853 INFO [train.py:1198] (1/2) Epoch 36, batch 3650, loss[loss=0.2512, ctc_loss=0.1267, cr_loss=0.3919, attn_decoder_loss=0.2563, over 29492.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1153, cr_loss=0.3558, attn_decoder_loss=0.2399, over 5794728.01 frames. ], batch size: 90, lr: 3.05e-03, grad_scale: 16.0
2024-09-19 10:39:29,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=648140.0, ans=0.0
2024-09-19 10:39:58,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=648220.0, ans=0.035
2024-09-19 10:40:18,818 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0
2024-09-19 10:40:18,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=6.00 vs. limit=15.0
2024-09-19 10:40:22,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=648260.0, ans=0.2
2024-09-19 10:40:25,596 INFO [train.py:1198] (1/2) Epoch 36, batch 3700, loss[loss=0.2479, ctc_loss=0.118, cr_loss=0.3593, attn_decoder_loss=0.2543, over 29725.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1152, cr_loss=0.3555, attn_decoder_loss=0.2401, over 5803836.08 frames. ], batch size: 84, lr: 3.05e-03, grad_scale: 16.0
2024-09-19 10:40:26,252 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.26 vs. limit=15.0
2024-09-19 10:40:49,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=648340.0, ans=0.1
2024-09-19 10:40:52,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=648340.0, ans=0.1
2024-09-19 10:40:55,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=648380.0, ans=0.2
2024-09-19 10:41:01,469 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.727e+01 8.595e+01 9.070e+01 9.562e+01 1.267e+02, threshold=1.814e+02, percent-clipped=0.0
2024-09-19 10:41:10,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=648420.0, ans=0.125
2024-09-19 10:41:13,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=648420.0, ans=0.125
2024-09-19 10:41:19,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=648420.0, ans=0.2
2024-09-19 10:41:30,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=648460.0, ans=0.0
2024-09-19 10:41:39,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=648500.0, ans=0.125
2024-09-19 10:41:40,460 INFO [train.py:1198] (1/2) Epoch 36, batch 3750, loss[loss=0.211, ctc_loss=0.09514, cr_loss=0.3044, attn_decoder_loss=0.2171, over 29349.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.115, cr_loss=0.355, attn_decoder_loss=0.2399, over 5807834.32 frames. ], batch size: 67, lr: 3.05e-03, grad_scale: 16.0
2024-09-19 10:41:42,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=648500.0, ans=0.0
2024-09-19 10:41:52,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=648500.0, ans=0.1
2024-09-19 10:41:57,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=648540.0, ans=0.125
2024-09-19 10:42:13,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=648580.0, ans=0.125
2024-09-19 10:42:15,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=648580.0, ans=0.125
2024-09-19 10:42:35,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=648620.0, ans=0.1
2024-09-19 10:42:35,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=648620.0, ans=0.125
2024-09-19 10:42:44,077 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.47 vs. limit=15.0
2024-09-19 10:42:56,340 INFO [train.py:1198] (1/2) Epoch 36, batch 3800, loss[loss=0.2325, ctc_loss=0.1018, cr_loss=0.3287, attn_decoder_loss=0.2398, over 29622.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1149, cr_loss=0.3547, attn_decoder_loss=0.2398, over 5798550.24 frames. ], batch size: 86, lr: 3.05e-03, grad_scale: 8.0
2024-09-19 10:42:59,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=648700.0, ans=0.125
2024-09-19 10:43:26,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=648780.0, ans=0.0
2024-09-19 10:43:34,111 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.510e+01 8.360e+01 8.859e+01 9.442e+01 1.706e+02, threshold=1.772e+02, percent-clipped=0.0
2024-09-19 10:43:46,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=648820.0, ans=0.1
2024-09-19 10:43:48,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=648820.0, ans=0.125
2024-09-19 10:43:58,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=648860.0, ans=0.04949747468305833
2024-09-19 10:44:00,294 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.70 vs. limit=15.0
2024-09-19 10:44:01,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=648860.0, ans=0.1
2024-09-19 10:44:01,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=648860.0, ans=0.125
2024-09-19 10:44:06,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=648860.0, ans=0.1
2024-09-19 10:44:11,224 INFO [train.py:1198] (1/2) Epoch 36, batch 3850, loss[loss=0.2498, ctc_loss=0.1248, cr_loss=0.3775, attn_decoder_loss=0.2553, over 29244.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1149, cr_loss=0.3553, attn_decoder_loss=0.2397, over 5812883.31 frames. ], batch size: 100, lr: 3.05e-03, grad_scale: 8.0
2024-09-19 10:44:33,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=648940.0, ans=0.025
2024-09-19 10:44:53,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=648980.0, ans=0.0
2024-09-19 10:45:06,484 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 10:45:08,249 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0
2024-09-19 10:45:26,941 INFO [train.py:1198] (1/2) Epoch 36, batch 3900, loss[loss=0.2492, ctc_loss=0.1175, cr_loss=0.3774, attn_decoder_loss=0.2554, over 29625.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1153, cr_loss=0.3563, attn_decoder_loss=0.2404, over 5817748.57 frames. ], batch size: 86, lr: 3.05e-03, grad_scale: 8.0
2024-09-19 10:45:33,069 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 10:45:36,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=649100.0, ans=0.5
2024-09-19 10:45:41,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=649140.0, ans=0.1
2024-09-19 10:45:46,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=649140.0, ans=0.125
2024-09-19 10:45:46,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=649140.0, ans=0.125
2024-09-19 10:46:03,828 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.759e+01 8.587e+01 8.995e+01 9.649e+01 1.195e+02, threshold=1.799e+02, percent-clipped=0.0
2024-09-19 10:46:06,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=649180.0, ans=0.0
2024-09-19 10:46:21,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=649220.0, ans=0.0
2024-09-19 10:46:27,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=649260.0, ans=0.1
2024-09-19 10:46:32,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=649260.0, ans=0.09899494936611666
2024-09-19 10:46:40,851 INFO [train.py:1198] (1/2) Epoch 36, batch 3950, loss[loss=0.252, ctc_loss=0.1281, cr_loss=0.4011, attn_decoder_loss=0.2568, over 29499.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1153, cr_loss=0.3568, attn_decoder_loss=0.2405, over 5837045.18 frames. ], batch size: 97, lr: 3.05e-03, grad_scale: 8.0
2024-09-19 10:46:41,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=649300.0, ans=0.125
2024-09-19 10:47:03,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=649340.0, ans=0.1
2024-09-19 10:47:10,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=649380.0, ans=0.125
2024-09-19 10:47:18,634 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.33 vs. limit=10.0
2024-09-19 10:47:21,628 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.25 vs. limit=15.0
2024-09-19 10:47:47,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=649460.0, ans=0.125
2024-09-19 10:47:51,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=649460.0, ans=0.125
2024-09-19 10:47:56,081 INFO [train.py:1198] (1/2) Epoch 36, batch 4000, loss[loss=0.2166, ctc_loss=0.09488, cr_loss=0.3188, attn_decoder_loss=0.223, over 29521.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1156, cr_loss=0.3573, attn_decoder_loss=0.2405, over 5814855.73 frames. ], batch size: 74, lr: 3.05e-03, grad_scale: 16.0
2024-09-19 10:47:56,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.23 vs. limit=22.5
2024-09-19 10:47:57,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=649500.0, ans=0.1
2024-09-19 10:48:05,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=649500.0, ans=0.0
2024-09-19 10:48:33,452 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.599e+01 9.136e+01 9.707e+01 2.354e+02, threshold=1.827e+02, percent-clipped=2.0
2024-09-19 10:48:58,514 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=12.0
2024-09-19 10:49:06,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=649660.0, ans=0.035
2024-09-19 10:49:10,625 INFO [train.py:1198] (1/2) Epoch 36, batch 4050, loss[loss=0.2587, ctc_loss=0.1473, cr_loss=0.3877, attn_decoder_loss=0.2625, over 20619.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1155, cr_loss=0.3563, attn_decoder_loss=0.2403, over 5798085.84 frames. ], batch size: 210, lr: 3.05e-03, grad_scale: 16.0
2024-09-19 10:49:20,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.68 vs. limit=22.5
2024-09-19 10:49:31,740 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.14 vs. limit=15.0
2024-09-19 10:49:51,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=649780.0, ans=0.125
2024-09-19 10:49:53,668 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.91 vs. limit=22.5
2024-09-19 10:50:10,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=649860.0, ans=0.125
2024-09-19 10:50:11,758 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.35 vs. limit=15.0
2024-09-19 10:50:19,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=649860.0, ans=0.0
2024-09-19 10:50:21,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=649860.0, ans=0.125
2024-09-19 10:50:25,522 INFO [train.py:1198] (1/2) Epoch 36, batch 4100, loss[loss=0.2464, ctc_loss=0.1315, cr_loss=0.3905, attn_decoder_loss=0.2505, over 29479.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1162, cr_loss=0.3575, attn_decoder_loss=0.2409, over 5793010.71 frames. ], batch size: 90, lr: 3.05e-03, grad_scale: 16.0
2024-09-19 10:50:26,335 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.25 vs. limit=22.5
2024-09-19 10:50:35,086 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.02 vs. limit=6.0
2024-09-19 10:50:40,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=649940.0, ans=0.1
2024-09-19 10:50:40,328 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 10:50:41,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=649940.0, ans=0.125
2024-09-19 10:50:51,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=649940.0, ans=0.1
2024-09-19 10:50:59,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=649980.0, ans=0.025
2024-09-19 10:51:01,957 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.541e+01 9.319e+01 9.811e+01 6.662e+02, threshold=1.864e+02, percent-clipped=1.0
2024-09-19 10:51:14,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.18 vs. limit=10.0
2024-09-19 10:51:21,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=650020.0, ans=0.0
2024-09-19 10:51:31,953 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.28 vs. limit=12.0
2024-09-19 10:51:39,870 INFO [train.py:1198] (1/2) Epoch 36, batch 4150, loss[loss=0.2385, ctc_loss=0.1229, cr_loss=0.3785, attn_decoder_loss=0.2429, over 29504.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1162, cr_loss=0.3576, attn_decoder_loss=0.2407, over 5799042.50 frames. ], batch size: 77, lr: 3.05e-03, grad_scale: 16.0
2024-09-19 10:51:44,690 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 10:51:52,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=650100.0, ans=0.125
2024-09-19 10:51:52,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=650100.0, ans=6.0
2024-09-19 10:51:59,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=650140.0, ans=0.125
2024-09-19 10:51:59,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=650140.0, ans=0.2
2024-09-19 10:52:00,192 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.99 vs. limit=22.5
2024-09-19 10:52:14,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=650180.0, ans=0.0
2024-09-19 10:52:23,950 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.12 vs.
limit=22.5 2024-09-19 10:52:27,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=650220.0, ans=0.02 2024-09-19 10:52:51,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=650260.0, ans=0.025 2024-09-19 10:52:53,759 INFO [train.py:1198] (1/2) Epoch 36, batch 4200, loss[loss=0.2622, ctc_loss=0.1413, cr_loss=0.4081, attn_decoder_loss=0.2665, over 29491.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1162, cr_loss=0.3578, attn_decoder_loss=0.2407, over 5800174.91 frames. ], batch size: 90, lr: 3.05e-03, grad_scale: 16.0 2024-09-19 10:53:01,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=650300.0, ans=0.125 2024-09-19 10:53:01,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=650300.0, ans=0.125 2024-09-19 10:53:01,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=650300.0, ans=0.1 2024-09-19 10:53:22,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=650380.0, ans=0.125 2024-09-19 10:53:31,781 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 8.662e+01 9.257e+01 9.687e+01 2.927e+02, threshold=1.851e+02, percent-clipped=1.0 2024-09-19 10:54:01,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=650460.0, ans=0.0 2024-09-19 10:54:04,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=650460.0, ans=0.1 2024-09-19 10:54:04,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=650460.0, ans=0.1 2024-09-19 10:54:05,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=650460.0, ans=0.025 2024-09-19 10:54:08,386 INFO [train.py:1198] (1/2) Epoch 36, batch 4250, loss[loss=0.2206, ctc_loss=0.1022, cr_loss=0.3299, attn_decoder_loss=0.2264, over 29509.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1158, cr_loss=0.3565, attn_decoder_loss=0.2407, over 5805853.95 frames. ], batch size: 74, lr: 3.05e-03, grad_scale: 16.0 2024-09-19 10:54:26,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=650540.0, ans=0.0 2024-09-19 10:54:36,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=650580.0, ans=0.125 2024-09-19 10:54:52,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=650620.0, ans=0.2 2024-09-19 10:55:09,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=650660.0, ans=0.0 2024-09-19 10:55:22,678 INFO [train.py:1198] (1/2) Epoch 36, batch 4300, loss[loss=0.2545, ctc_loss=0.1305, cr_loss=0.3909, attn_decoder_loss=0.2596, over 29539.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1157, cr_loss=0.3561, attn_decoder_loss=0.241, over 5795129.25 frames. 
], batch size: 87, lr: 3.05e-03, grad_scale: 8.0 2024-09-19 10:55:43,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=650740.0, ans=0.125 2024-09-19 10:55:58,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=650780.0, ans=0.125 2024-09-19 10:55:58,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=650780.0, ans=0.125 2024-09-19 10:56:01,046 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 8.651e+01 9.063e+01 9.682e+01 5.777e+02, threshold=1.813e+02, percent-clipped=1.0 2024-09-19 10:56:01,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=650780.0, ans=0.125 2024-09-19 10:56:01,846 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=15.16 vs. limit=15.0 2024-09-19 10:56:05,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=650820.0, ans=0.125 2024-09-19 10:56:19,853 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=15.0 2024-09-19 10:56:36,544 INFO [train.py:1198] (1/2) Epoch 36, batch 4350, loss[loss=0.2443, ctc_loss=0.1177, cr_loss=0.3608, attn_decoder_loss=0.2504, over 29459.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1179, cr_loss=0.361, attn_decoder_loss=0.2443, over 5797412.22 frames. ], batch size: 97, lr: 3.05e-03, grad_scale: 8.0 2024-09-19 10:56:39,157 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.58 vs. limit=22.5 2024-09-19 10:56:50,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=650940.0, ans=0.1 2024-09-19 10:56:55,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=650940.0, ans=0.125 2024-09-19 10:57:01,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=650940.0, ans=0.125 2024-09-19 10:57:02,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=650940.0, ans=0.125 2024-09-19 10:57:30,651 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.38 vs. limit=15.0 2024-09-19 10:57:51,172 INFO [train.py:1198] (1/2) Epoch 36, batch 4400, loss[loss=0.2487, ctc_loss=0.132, cr_loss=0.3979, attn_decoder_loss=0.2528, over 27353.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1192, cr_loss=0.3636, attn_decoder_loss=0.2463, over 5768090.34 frames. 
], batch size: 124, lr: 3.05e-03, grad_scale: 16.0 2024-09-19 10:57:58,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=651100.0, ans=0.125 2024-09-19 10:58:02,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=651100.0, ans=15.0 2024-09-19 10:58:03,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=651100.0, ans=0.025 2024-09-19 10:58:16,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=651140.0, ans=0.125 2024-09-19 10:58:29,467 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.803e+01 8.945e+01 9.277e+01 9.704e+01 3.205e+02, threshold=1.855e+02, percent-clipped=1.0 2024-09-19 10:58:42,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=651220.0, ans=0.0 2024-09-19 10:58:43,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=651220.0, ans=10.0 2024-09-19 10:58:52,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=651260.0, ans=0.125 2024-09-19 10:58:56,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=651260.0, ans=0.125 2024-09-19 10:59:05,957 INFO [train.py:1198] (1/2) Epoch 36, batch 4450, loss[loss=0.2598, ctc_loss=0.1505, cr_loss=0.3966, attn_decoder_loss=0.2631, over 21405.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1227, cr_loss=0.3686, attn_decoder_loss=0.2482, over 5576663.01 frames. ], batch size: 210, lr: 3.04e-03, grad_scale: 8.0 2024-09-19 10:59:16,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=651300.0, ans=0.0 2024-09-19 10:59:26,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.04 vs. limit=15.0 2024-09-19 10:59:29,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=651340.0, ans=0.125 2024-09-19 10:59:44,488 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 10:59:44,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=651380.0, ans=0.0 2024-09-19 10:59:46,319 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.37 vs. limit=15.0 2024-09-19 10:59:51,313 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.59 vs. limit=15.0 2024-09-19 11:00:20,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=651500.0, ans=0.125 2024-09-19 11:00:21,397 INFO [train.py:1198] (1/2) Epoch 36, batch 4500, loss[loss=0.2517, ctc_loss=0.1366, cr_loss=0.3642, attn_decoder_loss=0.2564, over 20222.00 frames. 
], tot_loss[loss=0.2449, ctc_loss=0.1258, cr_loss=0.3713, attn_decoder_loss=0.2499, over 5240033.97 frames. ], batch size: 210, lr: 3.04e-03, grad_scale: 8.0 2024-09-19 11:00:21,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=651500.0, ans=0.125 2024-09-19 11:00:25,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=651500.0, ans=0.025 2024-09-19 11:00:28,742 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.76 vs. limit=15.0 2024-09-19 11:00:29,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=651500.0, ans=0.0 2024-09-19 11:00:53,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=651580.0, ans=0.125 2024-09-19 11:00:55,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=15.0 2024-09-19 11:01:50,762 INFO [train.py:1198] (1/2) Epoch 37, batch 0, loss[loss=0.2147, ctc_loss=0.09723, cr_loss=0.32, attn_decoder_loss=0.2206, over 29586.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.09723, cr_loss=0.32, attn_decoder_loss=0.2206, over 29586.00 frames. ], batch size: 73, lr: 3.00e-03, grad_scale: 16.0 2024-09-19 11:01:50,762 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 11:01:53,999 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4250, 4.0850, 3.9626, 3.8090], device='cuda:1') 2024-09-19 11:02:09,656 INFO [train.py:1230] (1/2) Epoch 37, validation: loss=0.2132, ctc_loss=0.03619, cr_loss=6.181e-15, attn_decoder_loss=0.2329, over 944034.00 frames. 2024-09-19 11:02:09,657 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-19 11:02:11,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=651600.0, ans=0.2 2024-09-19 11:02:12,625 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.774e+01 1.049e+02 1.138e+02 1.230e+02 2.136e+02, threshold=2.276e+02, percent-clipped=1.0 2024-09-19 11:02:12,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=651600.0, ans=0.1 2024-09-19 11:02:22,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=651600.0, ans=0.0 2024-09-19 11:02:45,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=651680.0, ans=0.0 2024-09-19 11:02:51,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=651680.0, ans=0.125 2024-09-19 11:02:54,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=651720.0, ans=0.07 2024-09-19 11:02:59,908 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.73 vs. 
limit=15.0 2024-09-19 11:03:05,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=651720.0, ans=0.2 2024-09-19 11:03:08,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=651720.0, ans=0.2 2024-09-19 11:03:08,579 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.06 vs. limit=15.0 2024-09-19 11:03:09,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=651760.0, ans=0.05 2024-09-19 11:03:15,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=651760.0, ans=0.125 2024-09-19 11:03:23,556 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 11:03:26,221 INFO [train.py:1198] (1/2) Epoch 37, batch 50, loss[loss=0.2103, ctc_loss=0.09044, cr_loss=0.2881, attn_decoder_loss=0.2172, over 29451.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1179, cr_loss=0.3624, attn_decoder_loss=0.2418, over 1268510.03 frames. ], batch size: 70, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:03:26,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=651800.0, ans=0.0 2024-09-19 11:03:35,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=651800.0, ans=0.0 2024-09-19 11:03:41,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=651840.0, ans=0.1 2024-09-19 11:03:44,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=651840.0, ans=0.0 2024-09-19 11:03:57,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=651880.0, ans=0.0 2024-09-19 11:04:01,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=651880.0, ans=0.125 2024-09-19 11:04:04,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=651880.0, ans=0.0 2024-09-19 11:04:33,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=651960.0, ans=0.025 2024-09-19 11:04:42,595 INFO [train.py:1198] (1/2) Epoch 37, batch 100, loss[loss=0.2264, ctc_loss=0.1116, cr_loss=0.336, attn_decoder_loss=0.2317, over 29547.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1189, cr_loss=0.3654, attn_decoder_loss=0.2437, over 2252117.45 frames. ], batch size: 76, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:04:46,988 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.810e+01 8.722e+01 9.272e+01 9.995e+01 2.422e+02, threshold=1.854e+02, percent-clipped=1.0 2024-09-19 11:05:11,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=652080.0, ans=0.2 2024-09-19 11:05:13,536 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.96 vs. 
limit=22.5 2024-09-19 11:05:19,281 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.44 vs. limit=15.0 2024-09-19 11:05:23,144 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 11:05:40,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=652120.0, ans=0.125 2024-09-19 11:05:52,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.38 vs. limit=10.0 2024-09-19 11:05:55,627 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.37 vs. limit=15.0 2024-09-19 11:05:59,143 INFO [train.py:1198] (1/2) Epoch 37, batch 150, loss[loss=0.2069, ctc_loss=0.09072, cr_loss=0.2981, attn_decoder_loss=0.2132, over 29420.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.116, cr_loss=0.3584, attn_decoder_loss=0.2411, over 3046972.17 frames. ], batch size: 70, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:06:07,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=652200.0, ans=0.025 2024-09-19 11:06:22,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=652240.0, ans=0.125 2024-09-19 11:06:30,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=652280.0, ans=0.125 2024-09-19 11:06:52,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=652320.0, ans=0.0 2024-09-19 11:06:58,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=652320.0, ans=0.2 2024-09-19 11:07:16,348 INFO [train.py:1198] (1/2) Epoch 37, batch 200, loss[loss=0.2406, ctc_loss=0.1252, cr_loss=0.3923, attn_decoder_loss=0.2447, over 27339.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.115, cr_loss=0.3563, attn_decoder_loss=0.2397, over 3659232.06 frames. ], batch size: 124, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:07:17,231 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.16 vs. limit=22.5 2024-09-19 11:07:20,813 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.412e+01 8.881e+01 9.450e+01 8.334e+02, threshold=1.776e+02, percent-clipped=1.0 2024-09-19 11:07:33,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=652440.0, ans=0.125 2024-09-19 11:07:36,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=652440.0, ans=0.0 2024-09-19 11:07:53,014 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 11:07:55,025 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=11.93 vs. 
limit=15.0 2024-09-19 11:07:57,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=652480.0, ans=0.125 2024-09-19 11:08:14,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=652520.0, ans=15.0 2024-09-19 11:08:15,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=652560.0, ans=0.0 2024-09-19 11:08:28,288 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.13 vs. limit=22.5 2024-09-19 11:08:31,987 INFO [train.py:1198] (1/2) Epoch 37, batch 250, loss[loss=0.2446, ctc_loss=0.1214, cr_loss=0.3616, attn_decoder_loss=0.2502, over 29293.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1143, cr_loss=0.3545, attn_decoder_loss=0.2395, over 4142080.69 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:08:42,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=652600.0, ans=0.125 2024-09-19 11:08:49,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=652640.0, ans=0.125 2024-09-19 11:08:56,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=652640.0, ans=0.125 2024-09-19 11:09:05,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=652680.0, ans=0.125 2024-09-19 11:09:20,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=652720.0, ans=0.125 2024-09-19 11:09:43,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=652760.0, ans=0.125 2024-09-19 11:09:43,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=652760.0, ans=0.125 2024-09-19 11:09:50,022 INFO [train.py:1198] (1/2) Epoch 37, batch 300, loss[loss=0.2472, ctc_loss=0.123, cr_loss=0.3782, attn_decoder_loss=0.2526, over 29551.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1142, cr_loss=0.3541, attn_decoder_loss=0.2395, over 4510761.53 frames. ], batch size: 92, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:09:53,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=652800.0, ans=0.125 2024-09-19 11:09:54,608 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.181e+01 8.309e+01 8.922e+01 9.556e+01 2.479e+02, threshold=1.784e+02, percent-clipped=1.0 2024-09-19 11:10:00,072 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.64 vs. limit=15.0 2024-09-19 11:10:05,461 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.89 vs. 
limit=15.0 2024-09-19 11:10:18,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=652840.0, ans=0.0 2024-09-19 11:10:35,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=652880.0, ans=0.125 2024-09-19 11:10:47,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=652920.0, ans=0.0 2024-09-19 11:10:53,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=652960.0, ans=0.09899494936611666 2024-09-19 11:10:53,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=652960.0, ans=0.0 2024-09-19 11:11:08,045 INFO [train.py:1198] (1/2) Epoch 37, batch 350, loss[loss=0.2068, ctc_loss=0.09164, cr_loss=0.3086, attn_decoder_loss=0.2127, over 29306.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1146, cr_loss=0.3549, attn_decoder_loss=0.2399, over 4795556.50 frames. ], batch size: 71, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:11:31,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=653040.0, ans=0.0 2024-09-19 11:11:39,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=653080.0, ans=0.0 2024-09-19 11:11:47,515 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 11:12:10,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=653160.0, ans=0.125 2024-09-19 11:12:10,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=653160.0, ans=0.125 2024-09-19 11:12:23,524 INFO [train.py:1198] (1/2) Epoch 37, batch 400, loss[loss=0.2401, ctc_loss=0.1243, cr_loss=0.3766, attn_decoder_loss=0.2446, over 29705.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1145, cr_loss=0.3545, attn_decoder_loss=0.2398, over 5025650.99 frames. ], batch size: 82, lr: 3.00e-03, grad_scale: 16.0 2024-09-19 11:12:28,156 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.730e+01 8.454e+01 8.886e+01 9.286e+01 1.359e+02, threshold=1.777e+02, percent-clipped=0.0 2024-09-19 11:12:32,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=653200.0, ans=0.125 2024-09-19 11:12:54,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=653280.0, ans=0.125 2024-09-19 11:13:13,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=653320.0, ans=0.0 2024-09-19 11:13:41,669 INFO [train.py:1198] (1/2) Epoch 37, batch 450, loss[loss=0.2492, ctc_loss=0.1278, cr_loss=0.3793, attn_decoder_loss=0.2543, over 29699.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1148, cr_loss=0.3554, attn_decoder_loss=0.2401, over 5188091.46 frames. ], batch size: 83, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:13:59,743 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.04 vs. 
limit=22.5 2024-09-19 11:14:03,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=653440.0, ans=0.1 2024-09-19 11:14:20,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=653480.0, ans=0.0 2024-09-19 11:14:21,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=653480.0, ans=0.1 2024-09-19 11:14:24,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=653480.0, ans=0.0 2024-09-19 11:14:37,852 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.81 vs. limit=15.0 2024-09-19 11:14:48,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=653560.0, ans=0.1 2024-09-19 11:15:00,249 INFO [train.py:1198] (1/2) Epoch 37, batch 500, loss[loss=0.259, ctc_loss=0.1389, cr_loss=0.4116, attn_decoder_loss=0.2633, over 29416.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1147, cr_loss=0.3548, attn_decoder_loss=0.2396, over 5330947.36 frames. ], batch size: 94, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:15:03,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=653600.0, ans=0.125 2024-09-19 11:15:06,232 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 8.426e+01 9.049e+01 9.525e+01 1.733e+02, threshold=1.810e+02, percent-clipped=0.0 2024-09-19 11:15:11,602 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.72 vs. limit=15.0 2024-09-19 11:15:17,579 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.63 vs. limit=22.5 2024-09-19 11:15:23,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=653640.0, ans=0.125 2024-09-19 11:15:32,400 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.77 vs. limit=22.5 2024-09-19 11:15:42,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=653680.0, ans=0.025 2024-09-19 11:16:10,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=653760.0, ans=0.1 2024-09-19 11:16:13,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=653760.0, ans=0.125 2024-09-19 11:16:15,824 INFO [train.py:1198] (1/2) Epoch 37, batch 550, loss[loss=0.2441, ctc_loss=0.1164, cr_loss=0.348, attn_decoder_loss=0.2505, over 28888.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1146, cr_loss=0.3541, attn_decoder_loss=0.2396, over 5423903.46 frames. 
], batch size: 104, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:16:16,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=653800.0, ans=0.04949747468305833 2024-09-19 11:16:44,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=653880.0, ans=0.0 2024-09-19 11:17:06,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=653920.0, ans=0.125 2024-09-19 11:17:11,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=653920.0, ans=0.1 2024-09-19 11:17:13,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=653920.0, ans=0.125 2024-09-19 11:17:24,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=653960.0, ans=0.0 2024-09-19 11:17:31,925 INFO [train.py:1198] (1/2) Epoch 37, batch 600, loss[loss=0.2545, ctc_loss=0.1316, cr_loss=0.381, attn_decoder_loss=0.2597, over 29213.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1151, cr_loss=0.3554, attn_decoder_loss=0.2402, over 5510468.09 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:17:40,202 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.510e+01 8.494e+01 8.998e+01 9.681e+01 2.744e+02, threshold=1.800e+02, percent-clipped=3.0 2024-09-19 11:17:54,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=654040.0, ans=0.0 2024-09-19 11:18:05,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=654080.0, ans=0.04949747468305833 2024-09-19 11:18:23,649 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=12.0 2024-09-19 11:18:51,778 INFO [train.py:1198] (1/2) Epoch 37, batch 650, loss[loss=0.2442, ctc_loss=0.1214, cr_loss=0.3661, attn_decoder_loss=0.2497, over 29778.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1147, cr_loss=0.3547, attn_decoder_loss=0.2395, over 5586798.66 frames. ], batch size: 81, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:19:04,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.53 vs. limit=15.0 2024-09-19 11:19:24,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=654280.0, ans=0.025 2024-09-19 11:19:27,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=654280.0, ans=0.1 2024-09-19 11:19:35,039 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.31 vs. limit=10.0 2024-09-19 11:19:45,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=654320.0, ans=0.125 2024-09-19 11:20:07,723 INFO [train.py:1198] (1/2) Epoch 37, batch 700, loss[loss=0.2193, ctc_loss=0.1037, cr_loss=0.3372, attn_decoder_loss=0.2246, over 29519.00 frames. 
], tot_loss[loss=0.2345, ctc_loss=0.115, cr_loss=0.3559, attn_decoder_loss=0.2399, over 5637503.07 frames. ], batch size: 76, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:20:13,641 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.549e+01 8.958e+01 9.415e+01 1.725e+02, threshold=1.792e+02, percent-clipped=0.0 2024-09-19 11:20:24,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=654440.0, ans=0.125 2024-09-19 11:20:32,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=654440.0, ans=0.025 2024-09-19 11:20:34,253 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.29 vs. limit=15.0 2024-09-19 11:20:35,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=654440.0, ans=0.125 2024-09-19 11:20:54,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=654520.0, ans=0.2 2024-09-19 11:21:00,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=654520.0, ans=10.0 2024-09-19 11:21:05,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=654520.0, ans=0.2 2024-09-19 11:21:05,852 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.93 vs. limit=12.0 2024-09-19 11:21:23,307 INFO [train.py:1198] (1/2) Epoch 37, batch 750, loss[loss=0.2488, ctc_loss=0.1275, cr_loss=0.3835, attn_decoder_loss=0.2538, over 29689.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1149, cr_loss=0.3556, attn_decoder_loss=0.2398, over 5674523.66 frames. ], batch size: 82, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:21:34,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=654600.0, ans=0.125 2024-09-19 11:22:30,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=654760.0, ans=0.1 2024-09-19 11:22:43,728 INFO [train.py:1198] (1/2) Epoch 37, batch 800, loss[loss=0.2151, ctc_loss=0.09286, cr_loss=0.2996, attn_decoder_loss=0.222, over 29586.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1148, cr_loss=0.3551, attn_decoder_loss=0.2397, over 5705939.18 frames. ], batch size: 73, lr: 2.99e-03, grad_scale: 16.0 2024-09-19 11:22:45,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=654800.0, ans=0.125 2024-09-19 11:22:49,766 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.191e+01 8.523e+01 9.017e+01 9.581e+01 2.303e+02, threshold=1.803e+02, percent-clipped=1.0 2024-09-19 11:23:29,659 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.43 vs. 
limit=15.0 2024-09-19 11:23:41,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=654920.0, ans=0.0 2024-09-19 11:23:58,746 INFO [train.py:1198] (1/2) Epoch 37, batch 850, loss[loss=0.2558, ctc_loss=0.1335, cr_loss=0.3915, attn_decoder_loss=0.2607, over 29711.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1145, cr_loss=0.3543, attn_decoder_loss=0.2395, over 5736217.10 frames. ], batch size: 89, lr: 2.99e-03, grad_scale: 8.0 2024-09-19 11:24:31,453 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.94 vs. limit=15.0 2024-09-19 11:24:39,945 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0 2024-09-19 11:24:44,854 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.67 vs. limit=15.0 2024-09-19 11:24:53,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=655120.0, ans=0.125 2024-09-19 11:25:04,980 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.93 vs. limit=10.0 2024-09-19 11:25:15,268 INFO [train.py:1198] (1/2) Epoch 37, batch 900, loss[loss=0.2102, ctc_loss=0.09577, cr_loss=0.2987, attn_decoder_loss=0.2162, over 29576.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1148, cr_loss=0.3551, attn_decoder_loss=0.2398, over 5740839.17 frames. ], batch size: 73, lr: 2.99e-03, grad_scale: 8.0 2024-09-19 11:25:19,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=655200.0, ans=0.125 2024-09-19 11:25:22,644 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 8.623e+01 9.305e+01 9.762e+01 2.031e+02, threshold=1.861e+02, percent-clipped=1.0 2024-09-19 11:25:23,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.11 vs. limit=15.0 2024-09-19 11:25:27,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=655200.0, ans=0.025 2024-09-19 11:25:32,806 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.03 vs. limit=15.0 2024-09-19 11:25:41,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=655240.0, ans=0.0 2024-09-19 11:25:42,579 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 11:25:43,037 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.34 vs. 
limit=15.0 2024-09-19 11:26:17,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=655320.0, ans=0.0 2024-09-19 11:26:18,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=655360.0, ans=0.025 2024-09-19 11:26:23,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=655360.0, ans=0.125 2024-09-19 11:26:24,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer_ff3.min_abs, batch_count=655360.0, ans=0.2 2024-09-19 11:26:26,970 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=22.5 2024-09-19 11:26:30,736 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 11:26:34,857 INFO [train.py:1198] (1/2) Epoch 37, batch 950, loss[loss=0.22, ctc_loss=0.1066, cr_loss=0.3192, attn_decoder_loss=0.2255, over 29529.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1148, cr_loss=0.3553, attn_decoder_loss=0.2399, over 5743013.62 frames. ], batch size: 74, lr: 2.99e-03, grad_scale: 8.0 2024-09-19 11:27:00,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=655440.0, ans=0.0 2024-09-19 11:27:11,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=655480.0, ans=0.125 2024-09-19 11:27:19,429 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.56 vs. limit=22.5 2024-09-19 11:27:20,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=655520.0, ans=0.1 2024-09-19 11:27:25,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.79 vs. limit=12.0 2024-09-19 11:27:41,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=655560.0, ans=0.125 2024-09-19 11:27:50,314 INFO [train.py:1198] (1/2) Epoch 37, batch 1000, loss[loss=0.2326, ctc_loss=0.1158, cr_loss=0.3546, attn_decoder_loss=0.2376, over 29489.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1156, cr_loss=0.3571, attn_decoder_loss=0.2408, over 5735816.89 frames. 
], batch size: 77, lr: 2.99e-03, grad_scale: 8.0 2024-09-19 11:27:57,726 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.598e+01 8.810e+01 9.265e+01 9.999e+01 4.241e+02, threshold=1.853e+02, percent-clipped=2.0 2024-09-19 11:27:59,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=655600.0, ans=0.1 2024-09-19 11:28:00,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=655600.0, ans=0.125 2024-09-19 11:28:04,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=655640.0, ans=0.05 2024-09-19 11:28:13,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=655640.0, ans=0.125 2024-09-19 11:28:16,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=655640.0, ans=0.0 2024-09-19 11:28:18,174 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.74 vs. limit=22.5 2024-09-19 11:28:43,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=655720.0, ans=0.125 2024-09-19 11:28:58,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=655760.0, ans=0.0 2024-09-19 11:29:06,001 INFO [train.py:1198] (1/2) Epoch 37, batch 1050, loss[loss=0.2403, ctc_loss=0.1149, cr_loss=0.357, attn_decoder_loss=0.2463, over 29686.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1151, cr_loss=0.3562, attn_decoder_loss=0.2401, over 5743772.94 frames. ], batch size: 85, lr: 2.99e-03, grad_scale: 8.0 2024-09-19 11:29:18,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=655800.0, ans=0.0 2024-09-19 11:29:39,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=655880.0, ans=0.125 2024-09-19 11:29:52,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=655880.0, ans=0.025 2024-09-19 11:29:56,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=655920.0, ans=0.0 2024-09-19 11:29:58,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=655920.0, ans=0.125 2024-09-19 11:30:04,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=655920.0, ans=0.2 2024-09-19 11:30:16,871 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.47 vs. limit=15.0 2024-09-19 11:30:33,919 INFO [train.py:1198] (1/2) Epoch 37, batch 1100, loss[loss=0.2346, ctc_loss=0.1164, cr_loss=0.383, attn_decoder_loss=0.2392, over 29452.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.115, cr_loss=0.356, attn_decoder_loss=0.2398, over 5757055.82 frames. 
], batch size: 78, lr: 2.99e-03, grad_scale: 8.0 2024-09-19 11:30:41,295 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.502e+01 8.949e+01 9.455e+01 1.229e+02, threshold=1.790e+02, percent-clipped=0.0 2024-09-19 11:30:43,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=656000.0, ans=0.125 2024-09-19 11:31:07,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=656080.0, ans=0.0 2024-09-19 11:31:12,822 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=22.5 2024-09-19 11:31:25,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=656120.0, ans=0.0 2024-09-19 11:31:39,583 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.39 vs. limit=15.0 2024-09-19 11:31:41,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=656160.0, ans=0.125 2024-09-19 11:31:48,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=656200.0, ans=0.0 2024-09-19 11:31:49,466 INFO [train.py:1198] (1/2) Epoch 37, batch 1150, loss[loss=0.2329, ctc_loss=0.1143, cr_loss=0.3545, attn_decoder_loss=0.2382, over 29431.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1151, cr_loss=0.3561, attn_decoder_loss=0.2397, over 5754441.94 frames. ], batch size: 78, lr: 2.99e-03, grad_scale: 8.0 2024-09-19 11:31:56,484 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.56 vs. limit=22.5 2024-09-19 11:32:07,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=656240.0, ans=0.025 2024-09-19 11:32:09,860 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.34 vs. limit=10.0 2024-09-19 11:32:20,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=656280.0, ans=0.125 2024-09-19 11:32:26,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=656280.0, ans=0.125 2024-09-19 11:32:27,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=656280.0, ans=0.125 2024-09-19 11:32:32,032 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.73 vs. limit=5.0 2024-09-19 11:32:43,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=656320.0, ans=0.125 2024-09-19 11:33:05,436 INFO [train.py:1198] (1/2) Epoch 37, batch 1200, loss[loss=0.2386, ctc_loss=0.1065, cr_loss=0.3299, attn_decoder_loss=0.2459, over 29674.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1157, cr_loss=0.3571, attn_decoder_loss=0.2406, over 5747126.19 frames. 
], batch size: 85, lr: 2.99e-03, grad_scale: 16.0 2024-09-19 11:33:12,938 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.756e+01 9.143e+01 9.785e+01 1.884e+02, threshold=1.829e+02, percent-clipped=2.0 2024-09-19 11:33:31,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=656440.0, ans=0.125 2024-09-19 11:33:34,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=656440.0, ans=0.2 2024-09-19 11:33:38,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=656480.0, ans=0.0 2024-09-19 11:33:43,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=656480.0, ans=0.125 2024-09-19 11:33:57,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=656520.0, ans=0.0 2024-09-19 11:34:25,537 INFO [train.py:1198] (1/2) Epoch 37, batch 1250, loss[loss=0.2532, ctc_loss=0.132, cr_loss=0.3922, attn_decoder_loss=0.258, over 29525.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1163, cr_loss=0.3588, attn_decoder_loss=0.2415, over 5775379.36 frames. ], batch size: 92, lr: 2.99e-03, grad_scale: 16.0 2024-09-19 11:34:27,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=656600.0, ans=0.025 2024-09-19 11:35:22,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=656720.0, ans=0.1 2024-09-19 11:35:23,854 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0 2024-09-19 11:35:41,474 INFO [train.py:1198] (1/2) Epoch 37, batch 1300, loss[loss=0.2477, ctc_loss=0.1176, cr_loss=0.3461, attn_decoder_loss=0.2544, over 28267.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1159, cr_loss=0.358, attn_decoder_loss=0.2409, over 5780586.37 frames. ], batch size: 111, lr: 2.99e-03, grad_scale: 16.0 2024-09-19 11:35:42,503 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.04 vs. limit=15.0 2024-09-19 11:35:49,094 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.622e+01 8.478e+01 8.951e+01 9.333e+01 1.111e+02, threshold=1.790e+02, percent-clipped=0.0 2024-09-19 11:35:58,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=656840.0, ans=0.0 2024-09-19 11:36:03,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=656840.0, ans=0.0 2024-09-19 11:36:16,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=656880.0, ans=0.125 2024-09-19 11:36:36,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=656920.0, ans=0.125 2024-09-19 11:36:56,962 INFO [train.py:1198] (1/2) Epoch 37, batch 1350, loss[loss=0.2329, ctc_loss=0.1097, cr_loss=0.3341, attn_decoder_loss=0.2391, over 29772.00 frames. 
2024-09-19 11:37:10,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=657040.0, ans=10.0 2024-09-19 11:37:15,599 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.96 vs. limit=22.5 2024-09-19 11:37:16,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=657040.0, ans=0.0 2024-09-19 11:37:17,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=657040.0, ans=0.125 2024-09-19 11:37:19,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=657040.0, ans=0.1 2024-09-19 11:37:32,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=657080.0, ans=0.125 2024-09-19 11:37:37,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=657080.0, ans=0.0 2024-09-19 11:37:47,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=657120.0, ans=0.125 2024-09-19 11:38:12,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=657160.0, ans=0.125 2024-09-19 11:38:16,416 INFO [train.py:1198] (1/2) Epoch 37, batch 1400, loss[loss=0.213, ctc_loss=0.09876, cr_loss=0.3317, attn_decoder_loss=0.2183, over 29570.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1152, cr_loss=0.3561, attn_decoder_loss=0.2401, over 5808426.20 frames. ], batch size: 69, lr: 2.99e-03, grad_scale: 8.0 2024-09-19 11:38:25,471 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.397e+01 9.027e+01 9.734e+01 1.349e+02, threshold=1.805e+02, percent-clipped=0.0 2024-09-19 11:38:33,859 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2024-09-19 11:38:38,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=657240.0, ans=0.025 2024-09-19 11:38:54,818 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.47 vs. limit=15.0 2024-09-19 11:39:32,000 INFO [train.py:1198] (1/2) Epoch 37, batch 1450, loss[loss=0.2569, ctc_loss=0.1314, cr_loss=0.391, attn_decoder_loss=0.2621, over 29436.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1149, cr_loss=0.3558, attn_decoder_loss=0.2404, over 5806162.67 frames. ], batch size: 94, lr: 2.99e-03, grad_scale: 8.0 2024-09-19 11:39:45,172 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.31 vs. limit=15.0 2024-09-19 11:39:45,939 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 11:40:04,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.57 vs. limit=6.0
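
The Whitening lines from scaling.py:1024 compare a per-module statistic against a limit (metric=2.57 vs. limit=6.0 just above); small values indicate the channel covariance of activations (split into num_groups groups for entries like whiten_keys) is already close to isotropic, with the module only needing to intervene as the metric approaches the limit. As a rough, assumed illustration of the kind of statistic involved, and explicitly not scaling.py's exact formula, a covariance-flatness metric that equals 1.0 for a perfectly white signal can be computed like this:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels). Returns >= 1.0; equals 1.0 when the
        # within-group covariance is a multiple of the identity. Sketch only.
        (n, c) = x.shape
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / n                # (groups, c/g, c/g)
        eigs = torch.linalg.eigvalsh(cov)              # eigenvalues, ascending
        return ((eigs ** 2).mean() / eigs.mean() ** 2).item()
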
2024-09-19 11:40:17,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=657520.0, ans=0.09899494936611666 2024-09-19 11:40:25,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=657520.0, ans=0.125 2024-09-19 11:40:37,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=657560.0, ans=0.125 2024-09-19 11:40:48,387 INFO [train.py:1198] (1/2) Epoch 37, batch 1500, loss[loss=0.2435, ctc_loss=0.1158, cr_loss=0.3676, attn_decoder_loss=0.2495, over 29615.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.115, cr_loss=0.3559, attn_decoder_loss=0.2406, over 5807290.01 frames. ], batch size: 86, lr: 2.99e-03, grad_scale: 8.0 2024-09-19 11:40:57,335 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.254e+01 8.474e+01 8.863e+01 9.485e+01 5.565e+02, threshold=1.773e+02, percent-clipped=3.0 2024-09-19 11:40:57,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=657600.0, ans=0.025 2024-09-19 11:40:57,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=657600.0, ans=0.0 2024-09-19 11:41:31,183 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.58 vs. limit=10.0 2024-09-19 11:41:33,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=657680.0, ans=0.0 2024-09-19 11:41:46,502 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.20 vs. limit=12.0 2024-09-19 11:42:07,250 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=5.40 vs. limit=15.0 2024-09-19 11:42:07,915 INFO [train.py:1198] (1/2) Epoch 37, batch 1550, loss[loss=0.2517, ctc_loss=0.134, cr_loss=0.4114, attn_decoder_loss=0.2556, over 29519.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1156, cr_loss=0.357, attn_decoder_loss=0.2406, over 5783013.75 frames. ], batch size: 90, lr: 2.99e-03, grad_scale: 8.0 2024-09-19 11:42:32,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=657840.0, ans=0.025 2024-09-19 11:42:39,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=657880.0, ans=0.2 2024-09-19 11:42:56,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=657920.0, ans=0.125 2024-09-19 11:43:23,367 INFO [train.py:1198] (1/2) Epoch 37, batch 1600, loss[loss=0.2358, ctc_loss=0.1138, cr_loss=0.3643, attn_decoder_loss=0.2413, over 29668.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1156, cr_loss=0.3571, attn_decoder_loss=0.2403, over 5765563.60 frames. ], batch size: 85, lr: 2.99e-03, grad_scale: 16.0
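
In each train.py:1198 line, loss[...] holds the losses of the single batch being logged and tot_loss[...] a frame-weighted running average over the epoch so far, which is why tot_loss moves slowly (0.2352 -> 0.235 across the entries above) while the per-batch values jump around. A minimal version of that bookkeeping, assumed from the "over N frames" annotations rather than copied from train.py (the real tracker may also decay old batches):

    class FrameWeightedAverage:
        # Running frame-weighted average of named losses.
        def __init__(self):
            self.frames = 0.0
            self.sums = {}

        def update(self, losses, num_frames):
            self.frames += num_frames
            for name, value in losses.items():
                self.sums[name] = self.sums.get(name, 0.0) + value * num_frames

        def averages(self):
            return {name: s / self.frames for name, s in self.sums.items()}

    tot = FrameWeightedAverage()
    # numbers taken from the batch 1600 entry directly above:
    tot.update({"loss": 0.2358, "ctc_loss": 0.1138, "cr_loss": 0.3643,
                "attn_decoder_loss": 0.2413}, 29668.0)
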
2024-09-19 11:43:26,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=658000.0, ans=0.125 2024-09-19 11:43:29,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=658000.0, ans=0.0 2024-09-19 11:43:32,327 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.772e+01 8.728e+01 9.146e+01 9.748e+01 2.180e+02, threshold=1.829e+02, percent-clipped=1.0 2024-09-19 11:43:55,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=658080.0, ans=0.0 2024-09-19 11:44:02,531 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.12 vs. limit=10.0 2024-09-19 11:44:33,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=658160.0, ans=0.125 2024-09-19 11:44:33,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=658160.0, ans=0.0 2024-09-19 11:44:38,984 INFO [train.py:1198] (1/2) Epoch 37, batch 1650, loss[loss=0.2475, ctc_loss=0.1235, cr_loss=0.3745, attn_decoder_loss=0.2529, over 29704.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1154, cr_loss=0.3567, attn_decoder_loss=0.2402, over 5760472.02 frames. ], batch size: 89, lr: 2.99e-03, grad_scale: 8.0 2024-09-19 11:44:49,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=658200.0, ans=0.125 2024-09-19 11:44:58,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=658240.0, ans=0.0 2024-09-19 11:44:59,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=658240.0, ans=0.2 2024-09-19 11:45:44,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=658360.0, ans=0.125 2024-09-19 11:45:47,124 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 11:45:59,547 INFO [train.py:1198] (1/2) Epoch 37, batch 1700, loss[loss=0.2116, ctc_loss=0.09662, cr_loss=0.305, attn_decoder_loss=0.2176, over 29537.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1153, cr_loss=0.3563, attn_decoder_loss=0.24, over 5782722.06 frames. ], batch size: 69, lr: 2.99e-03, grad_scale: 8.0 2024-09-19 11:46:03,651 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.31 vs. limit=15.0
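
grad_scale in the batch lines is the fp16 loss scale from mixed-precision training, and it moves between powers of two (16.0 at batch 1600 above, back to 8.0 by batch 1650) because dynamic loss scaling halves the scale when a step overflows and grows it again after a run of clean steps. With PyTorch's stock API the loop looks like the sketch below; this is standard torch.cuda.amp usage with illustrative values, not a quote from train.py.

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

    def training_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()  # scale up so fp16 grads do not underflow
        scaler.step(optimizer)         # unscales; skips the step on inf/nan grads
        scaler.update()                # halve on overflow, grow later -> the logged grad_scale
        return loss.detach()
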
2024-09-19 11:46:10,138 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.251e+01 8.607e+01 9.068e+01 9.479e+01 1.872e+02, threshold=1.814e+02, percent-clipped=1.0 2024-09-19 11:46:14,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=658440.0, ans=0.0 2024-09-19 11:46:28,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=658480.0, ans=0.125 2024-09-19 11:46:34,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=658480.0, ans=0.125 2024-09-19 11:46:42,622 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.07 vs. limit=22.5 2024-09-19 11:46:49,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=658520.0, ans=0.0 2024-09-19 11:46:53,450 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.16 vs. limit=15.0 2024-09-19 11:47:13,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=658600.0, ans=0.125 2024-09-19 11:47:15,006 INFO [train.py:1198] (1/2) Epoch 37, batch 1750, loss[loss=0.2182, ctc_loss=0.1096, cr_loss=0.3564, attn_decoder_loss=0.2224, over 29362.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.115, cr_loss=0.3555, attn_decoder_loss=0.2394, over 5790041.67 frames. ], batch size: 67, lr: 2.99e-03, grad_scale: 8.0 2024-09-19 11:47:20,244 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.47 vs. limit=15.0 2024-09-19 11:47:51,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=658680.0, ans=0.0 2024-09-19 11:48:05,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=658720.0, ans=0.125 2024-09-19 11:48:11,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=658720.0, ans=0.2 2024-09-19 11:48:24,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=658760.0, ans=0.0 2024-09-19 11:48:26,666 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.29 vs. limit=15.0 2024-09-19 11:48:30,138 INFO [train.py:1198] (1/2) Epoch 37, batch 1800, loss[loss=0.2369, ctc_loss=0.1124, cr_loss=0.359, attn_decoder_loss=0.2428, over 29699.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1151, cr_loss=0.3559, attn_decoder_loss=0.2397, over 5792001.26 frames. ], batch size: 83, lr: 2.99e-03, grad_scale: 8.0 2024-09-19 11:48:40,895 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.421e+01 8.885e+01 9.322e+01 2.627e+02, threshold=1.777e+02, percent-clipped=1.0 2024-09-19 11:49:44,740 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs.
limit=6.0 2024-09-19 11:49:50,129 INFO [train.py:1198] (1/2) Epoch 37, batch 1850, loss[loss=0.2404, ctc_loss=0.1174, cr_loss=0.3439, attn_decoder_loss=0.2464, over 29642.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.115, cr_loss=0.3554, attn_decoder_loss=0.2398, over 5796935.36 frames. ], batch size: 86, lr: 2.99e-03, grad_scale: 8.0 2024-09-19 11:50:35,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=659120.0, ans=0.125 2024-09-19 11:50:52,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=659160.0, ans=0.1 2024-09-19 11:50:53,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=659160.0, ans=0.2 2024-09-19 11:51:00,369 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.17 vs. limit=10.0 2024-09-19 11:51:02,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=659160.0, ans=0.1 2024-09-19 11:51:05,865 INFO [train.py:1198] (1/2) Epoch 37, batch 1900, loss[loss=0.2409, ctc_loss=0.1117, cr_loss=0.3603, attn_decoder_loss=0.2473, over 29734.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1154, cr_loss=0.3564, attn_decoder_loss=0.2404, over 5805152.84 frames. ], batch size: 89, lr: 2.98e-03, grad_scale: 8.0 2024-09-19 11:51:11,272 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.63 vs. limit=15.0 2024-09-19 11:51:16,269 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.346e+01 8.529e+01 8.942e+01 9.570e+01 1.575e+02, threshold=1.788e+02, percent-clipped=0.0 2024-09-19 11:51:48,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.62 vs. limit=10.0 2024-09-19 11:52:09,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=659360.0, ans=0.125 2024-09-19 11:52:21,354 INFO [train.py:1198] (1/2) Epoch 37, batch 1950, loss[loss=0.2287, ctc_loss=0.1145, cr_loss=0.3634, attn_decoder_loss=0.2333, over 29433.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1156, cr_loss=0.3573, attn_decoder_loss=0.2413, over 5820317.73 frames. ], batch size: 78, lr: 2.98e-03, grad_scale: 8.0 2024-09-19 11:52:26,978 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.98 vs. limit=15.0 2024-09-19 11:52:33,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=659400.0, ans=0.05 2024-09-19 11:52:38,478 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 11:52:46,034 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.38 vs. 
limit=22.5 2024-09-19 11:53:01,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=659480.0, ans=0.125 2024-09-19 11:53:09,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=659520.0, ans=0.0 2024-09-19 11:53:17,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=659520.0, ans=0.125 2024-09-19 11:53:17,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=659520.0, ans=0.1 2024-09-19 11:53:18,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=659520.0, ans=0.125 2024-09-19 11:53:24,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=659560.0, ans=0.025 2024-09-19 11:53:41,382 INFO [train.py:1198] (1/2) Epoch 37, batch 2000, loss[loss=0.2124, ctc_loss=0.1007, cr_loss=0.3357, attn_decoder_loss=0.2174, over 29341.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1163, cr_loss=0.3588, attn_decoder_loss=0.242, over 5798556.15 frames. ], batch size: 67, lr: 2.98e-03, grad_scale: 16.0 2024-09-19 11:53:47,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=659600.0, ans=0.125 2024-09-19 11:53:51,965 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 8.756e+01 9.402e+01 9.802e+01 1.853e+02, threshold=1.880e+02, percent-clipped=1.0 2024-09-19 11:53:55,525 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 11:53:55,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=659640.0, ans=15.0 2024-09-19 11:54:05,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=659640.0, ans=0.0 2024-09-19 11:54:08,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=659640.0, ans=0.035 2024-09-19 11:54:11,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=659680.0, ans=0.0 2024-09-19 11:54:56,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=659800.0, ans=0.0 2024-09-19 11:54:57,539 INFO [train.py:1198] (1/2) Epoch 37, batch 2050, loss[loss=0.2031, ctc_loss=0.09512, cr_loss=0.322, attn_decoder_loss=0.2079, over 29431.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1156, cr_loss=0.3568, attn_decoder_loss=0.2409, over 5790919.87 frames. ], batch size: 70, lr: 2.98e-03, grad_scale: 16.0 2024-09-19 11:54:58,399 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.47 vs. 
limit=15.0 2024-09-19 11:55:32,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=659880.0, ans=0.125 2024-09-19 11:55:34,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=659880.0, ans=0.125 2024-09-19 11:55:47,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=659920.0, ans=0.0 2024-09-19 11:55:50,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=659920.0, ans=0.125 2024-09-19 11:56:13,779 INFO [train.py:1198] (1/2) Epoch 37, batch 2100, loss[loss=0.2305, ctc_loss=0.1127, cr_loss=0.3578, attn_decoder_loss=0.2356, over 29778.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1148, cr_loss=0.3553, attn_decoder_loss=0.2401, over 5803241.60 frames. ], batch size: 81, lr: 2.98e-03, grad_scale: 16.0 2024-09-19 11:56:19,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=660000.0, ans=0.125 2024-09-19 11:56:24,222 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.530e+01 8.399e+01 8.911e+01 9.542e+01 1.204e+02, threshold=1.782e+02, percent-clipped=0.0 2024-09-19 11:56:24,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=660000.0, ans=0.125 2024-09-19 11:56:26,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=660000.0, ans=0.1 2024-09-19 11:56:26,614 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.68 vs. limit=15.0 2024-09-19 11:56:36,811 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0 2024-09-19 11:56:54,693 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.37 vs. limit=15.0 2024-09-19 11:57:11,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=660120.0, ans=0.1 2024-09-19 11:57:26,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=660160.0, ans=0.1 2024-09-19 11:57:27,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=660160.0, ans=0.125 2024-09-19 11:57:33,572 INFO [train.py:1198] (1/2) Epoch 37, batch 2150, loss[loss=0.2292, ctc_loss=0.1097, cr_loss=0.3502, attn_decoder_loss=0.2346, over 29491.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1145, cr_loss=0.3549, attn_decoder_loss=0.2399, over 5817613.30 frames. ], batch size: 78, lr: 2.98e-03, grad_scale: 16.0 2024-09-19 11:57:52,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.26 vs. 
limit=15.0 2024-09-19 11:58:04,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=660280.0, ans=0.2 2024-09-19 11:58:18,205 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.65 vs. limit=12.0 2024-09-19 11:58:35,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=660360.0, ans=0.0 2024-09-19 11:58:48,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=660400.0, ans=0.125 2024-09-19 11:58:49,161 INFO [train.py:1198] (1/2) Epoch 37, batch 2200, loss[loss=0.2502, ctc_loss=0.1241, cr_loss=0.3754, attn_decoder_loss=0.2559, over 29637.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1152, cr_loss=0.3563, attn_decoder_loss=0.2403, over 5813499.24 frames. ], batch size: 86, lr: 2.98e-03, grad_scale: 8.0 2024-09-19 11:58:50,241 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.13 vs. limit=15.0 2024-09-19 11:59:01,231 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.378e+01 8.935e+01 9.603e+01 1.294e+02, threshold=1.787e+02, percent-clipped=0.0 2024-09-19 11:59:53,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=660560.0, ans=0.0 2024-09-19 12:00:03,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=660600.0, ans=0.07 2024-09-19 12:00:03,991 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.63 vs. limit=22.5 2024-09-19 12:00:04,885 INFO [train.py:1198] (1/2) Epoch 37, batch 2250, loss[loss=0.2388, ctc_loss=0.1159, cr_loss=0.3564, attn_decoder_loss=0.2445, over 29700.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1148, cr_loss=0.3559, attn_decoder_loss=0.2401, over 5811135.17 frames. ], batch size: 82, lr: 2.98e-03, grad_scale: 8.0 2024-09-19 12:00:21,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=660640.0, ans=0.125 2024-09-19 12:00:52,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=660720.0, ans=0.95 2024-09-19 12:01:21,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=660760.0, ans=0.0 2024-09-19 12:01:25,027 INFO [train.py:1198] (1/2) Epoch 37, batch 2300, loss[loss=0.2036, ctc_loss=0.0903, cr_loss=0.2931, attn_decoder_loss=0.2097, over 29318.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1142, cr_loss=0.354, attn_decoder_loss=0.2392, over 5799440.25 frames. 
], batch size: 71, lr: 2.98e-03, grad_scale: 8.0 2024-09-19 12:01:36,927 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.404e+01 8.548e+01 9.077e+01 9.950e+01 1.821e+02, threshold=1.815e+02, percent-clipped=1.0 2024-09-19 12:02:09,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=660920.0, ans=0.0 2024-09-19 12:02:18,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=660920.0, ans=0.125 2024-09-19 12:02:23,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=660920.0, ans=0.0 2024-09-19 12:02:29,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=660960.0, ans=0.0 2024-09-19 12:02:41,027 INFO [train.py:1198] (1/2) Epoch 37, batch 2350, loss[loss=0.2388, ctc_loss=0.1147, cr_loss=0.3594, attn_decoder_loss=0.2446, over 29702.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1144, cr_loss=0.3552, attn_decoder_loss=0.2393, over 5804777.07 frames. ], batch size: 83, lr: 2.98e-03, grad_scale: 8.0 2024-09-19 12:02:47,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=661000.0, ans=0.0 2024-09-19 12:03:19,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=661080.0, ans=0.1 2024-09-19 12:03:25,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=661120.0, ans=0.125 2024-09-19 12:03:32,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=661120.0, ans=0.125 2024-09-19 12:03:41,136 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.92 vs. limit=22.5 2024-09-19 12:03:52,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=661160.0, ans=0.1 2024-09-19 12:03:56,970 INFO [train.py:1198] (1/2) Epoch 37, batch 2400, loss[loss=0.2272, ctc_loss=0.1166, cr_loss=0.3519, attn_decoder_loss=0.2317, over 29544.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1151, cr_loss=0.356, attn_decoder_loss=0.2399, over 5809132.27 frames. 
], batch size: 76, lr: 2.98e-03, grad_scale: 16.0 2024-09-19 12:04:08,988 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.655e+01 8.594e+01 9.080e+01 9.693e+01 1.252e+02, threshold=1.816e+02, percent-clipped=0.0 2024-09-19 12:04:10,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=661240.0, ans=0.0 2024-09-19 12:04:22,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=661240.0, ans=0.125 2024-09-19 12:04:23,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=661240.0, ans=0.025 2024-09-19 12:04:23,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=661240.0, ans=0.2 2024-09-19 12:04:32,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=661280.0, ans=0.125 2024-09-19 12:04:44,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=661320.0, ans=0.125 2024-09-19 12:05:17,004 INFO [train.py:1198] (1/2) Epoch 37, batch 2450, loss[loss=0.238, ctc_loss=0.1188, cr_loss=0.355, attn_decoder_loss=0.2433, over 29704.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1155, cr_loss=0.3571, attn_decoder_loss=0.2407, over 5786171.16 frames. ], batch size: 82, lr: 2.98e-03, grad_scale: 16.0 2024-09-19 12:05:21,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=661400.0, ans=0.125 2024-09-19 12:05:21,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=661400.0, ans=0.125 2024-09-19 12:05:25,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=661400.0, ans=0.125 2024-09-19 12:05:26,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=661400.0, ans=0.125 2024-09-19 12:05:29,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=661400.0, ans=0.1 2024-09-19 12:05:34,341 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.38 vs. limit=22.5 2024-09-19 12:06:04,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=661520.0, ans=0.0 2024-09-19 12:06:10,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=661520.0, ans=0.125 2024-09-19 12:06:19,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=661560.0, ans=0.0 2024-09-19 12:06:21,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=661560.0, ans=0.125 2024-09-19 12:06:23,083 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.52 vs. 
limit=15.0 2024-09-19 12:06:28,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=661560.0, ans=0.125 2024-09-19 12:06:33,566 INFO [train.py:1198] (1/2) Epoch 37, batch 2500, loss[loss=0.2439, ctc_loss=0.1224, cr_loss=0.3539, attn_decoder_loss=0.2495, over 29606.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1162, cr_loss=0.3585, attn_decoder_loss=0.2409, over 5796443.66 frames. ], batch size: 86, lr: 2.98e-03, grad_scale: 16.0 2024-09-19 12:06:44,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=661600.0, ans=0.125 2024-09-19 12:06:45,678 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 8.640e+01 9.238e+01 1.003e+02 4.668e+02, threshold=1.848e+02, percent-clipped=4.0 2024-09-19 12:06:53,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=661640.0, ans=0.1 2024-09-19 12:07:10,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=661680.0, ans=0.125 2024-09-19 12:07:12,307 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.00 vs. limit=10.0 2024-09-19 12:07:13,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=661680.0, ans=0.2 2024-09-19 12:07:31,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=661720.0, ans=0.0 2024-09-19 12:07:33,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=661760.0, ans=0.125 2024-09-19 12:07:46,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=661760.0, ans=0.2 2024-09-19 12:07:49,438 INFO [train.py:1198] (1/2) Epoch 37, batch 2550, loss[loss=0.2086, ctc_loss=0.09405, cr_loss=0.3084, attn_decoder_loss=0.2145, over 29360.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1156, cr_loss=0.3571, attn_decoder_loss=0.2406, over 5799734.17 frames. ], batch size: 67, lr: 2.98e-03, grad_scale: 16.0 2024-09-19 12:07:58,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=661800.0, ans=0.2 2024-09-19 12:08:00,573 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.91 vs. limit=22.5 2024-09-19 12:08:06,411 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 12:08:12,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=661840.0, ans=0.125 2024-09-19 12:08:23,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=661880.0, ans=0.125 2024-09-19 12:08:28,777 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.76 vs. 
limit=15.0 2024-09-19 12:08:35,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=661920.0, ans=0.125 2024-09-19 12:08:48,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=661920.0, ans=0.125 2024-09-19 12:09:00,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=661960.0, ans=0.05 2024-09-19 12:09:04,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=661960.0, ans=0.035 2024-09-19 12:09:07,398 INFO [train.py:1198] (1/2) Epoch 37, batch 2600, loss[loss=0.2324, ctc_loss=0.1066, cr_loss=0.3465, attn_decoder_loss=0.2387, over 29433.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1155, cr_loss=0.3572, attn_decoder_loss=0.2409, over 5796823.92 frames. ], batch size: 78, lr: 2.98e-03, grad_scale: 16.0 2024-09-19 12:09:13,170 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.44 vs. limit=15.0 2024-09-19 12:09:21,435 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.726e+01 8.481e+01 8.933e+01 9.512e+01 2.457e+02, threshold=1.787e+02, percent-clipped=1.0 2024-09-19 12:09:21,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=662000.0, ans=0.125 2024-09-19 12:09:21,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=662000.0, ans=0.025 2024-09-19 12:09:26,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=662040.0, ans=0.0 2024-09-19 12:09:32,771 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.11 vs. limit=15.0 2024-09-19 12:10:23,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=662200.0, ans=0.125 2024-09-19 12:10:23,690 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.49 vs. limit=12.0 2024-09-19 12:10:24,382 INFO [train.py:1198] (1/2) Epoch 37, batch 2650, loss[loss=0.2482, ctc_loss=0.1225, cr_loss=0.3758, attn_decoder_loss=0.2539, over 29291.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1157, cr_loss=0.3573, attn_decoder_loss=0.2411, over 5802405.51 frames. 
], batch size: 100, lr: 2.98e-03, grad_scale: 16.0 2024-09-19 12:10:26,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=662200.0, ans=0.0 2024-09-19 12:10:35,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=662200.0, ans=0.125 2024-09-19 12:10:42,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=662240.0, ans=0.125 2024-09-19 12:10:58,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=662280.0, ans=0.2 2024-09-19 12:11:09,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=662320.0, ans=0.0 2024-09-19 12:11:37,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=662360.0, ans=0.125 2024-09-19 12:11:40,389 INFO [train.py:1198] (1/2) Epoch 37, batch 2700, loss[loss=0.2337, ctc_loss=0.1148, cr_loss=0.357, attn_decoder_loss=0.239, over 29494.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1158, cr_loss=0.357, attn_decoder_loss=0.2412, over 5797558.37 frames. ], batch size: 87, lr: 2.98e-03, grad_scale: 16.0 2024-09-19 12:11:52,431 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.575e+01 9.095e+01 9.529e+01 6.705e+02, threshold=1.819e+02, percent-clipped=1.0 2024-09-19 12:11:55,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=662440.0, ans=0.0 2024-09-19 12:12:30,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=662520.0, ans=0.125 2024-09-19 12:12:36,628 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.05 vs. limit=6.0 2024-09-19 12:12:39,766 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.46 vs. limit=22.5 2024-09-19 12:12:58,483 INFO [train.py:1198] (1/2) Epoch 37, batch 2750, loss[loss=0.2291, ctc_loss=0.1126, cr_loss=0.3536, attn_decoder_loss=0.2342, over 29510.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1146, cr_loss=0.355, attn_decoder_loss=0.2399, over 5796664.41 frames. 
], batch size: 75, lr: 2.98e-03, grad_scale: 16.0 2024-09-19 12:13:04,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=662600.0, ans=0.015 2024-09-19 12:13:26,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=662640.0, ans=0.125 2024-09-19 12:13:28,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=662640.0, ans=0.125 2024-09-19 12:13:28,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=662640.0, ans=0.025 2024-09-19 12:13:28,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=662640.0, ans=0.125 2024-09-19 12:13:41,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=662680.0, ans=0.0 2024-09-19 12:14:04,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.10 vs. limit=15.0 2024-09-19 12:14:12,839 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.33 vs. limit=15.0 2024-09-19 12:14:13,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=662760.0, ans=10.0 2024-09-19 12:14:15,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=662800.0, ans=0.5 2024-09-19 12:14:16,755 INFO [train.py:1198] (1/2) Epoch 37, batch 2800, loss[loss=0.2441, ctc_loss=0.1271, cr_loss=0.3512, attn_decoder_loss=0.2493, over 20405.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1151, cr_loss=0.3559, attn_decoder_loss=0.24, over 5777905.33 frames. ], batch size: 210, lr: 2.98e-03, grad_scale: 32.0 2024-09-19 12:14:24,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=662800.0, ans=0.2 2024-09-19 12:14:27,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=662800.0, ans=0.125 2024-09-19 12:14:30,283 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.502e+01 8.910e+01 9.403e+01 2.471e+02, threshold=1.782e+02, percent-clipped=1.0 2024-09-19 12:14:50,314 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 12:15:06,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=662920.0, ans=0.07 2024-09-19 12:15:08,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=662920.0, ans=0.2 2024-09-19 12:15:32,005 INFO [train.py:1198] (1/2) Epoch 37, batch 2850, loss[loss=0.2245, ctc_loss=0.1003, cr_loss=0.3194, attn_decoder_loss=0.2312, over 29520.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1153, cr_loss=0.3556, attn_decoder_loss=0.2404, over 5763305.30 frames. 
], batch size: 77, lr: 2.98e-03, grad_scale: 8.0 2024-09-19 12:15:50,190 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.47 vs. limit=15.0 2024-09-19 12:16:42,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=663160.0, ans=0.125 2024-09-19 12:16:50,711 INFO [train.py:1198] (1/2) Epoch 37, batch 2900, loss[loss=0.2274, ctc_loss=0.1032, cr_loss=0.333, attn_decoder_loss=0.2338, over 29427.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1164, cr_loss=0.3585, attn_decoder_loss=0.2419, over 5789212.11 frames. ], batch size: 79, lr: 2.98e-03, grad_scale: 8.0 2024-09-19 12:16:50,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=663200.0, ans=0.1 2024-09-19 12:17:07,907 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.541e+01 8.975e+01 9.658e+01 1.927e+02, threshold=1.795e+02, percent-clipped=1.0 2024-09-19 12:17:10,405 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.81 vs. limit=15.0 2024-09-19 12:17:15,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=663240.0, ans=0.2 2024-09-19 12:17:29,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=663280.0, ans=0.125 2024-09-19 12:17:33,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=663280.0, ans=0.125 2024-09-19 12:17:33,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=663280.0, ans=10.0 2024-09-19 12:17:38,365 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.48 vs. limit=15.0 2024-09-19 12:17:53,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=663360.0, ans=0.125 2024-09-19 12:17:59,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.69 vs. limit=15.0 2024-09-19 12:18:02,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=663360.0, ans=0.0 2024-09-19 12:18:04,575 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.80 vs. limit=12.0 2024-09-19 12:18:07,937 INFO [train.py:1198] (1/2) Epoch 37, batch 2950, loss[loss=0.226, ctc_loss=0.09673, cr_loss=0.3194, attn_decoder_loss=0.2332, over 29519.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1151, cr_loss=0.3555, attn_decoder_loss=0.2406, over 5782382.99 frames. 
], batch size: 75, lr: 2.98e-03, grad_scale: 8.0 2024-09-19 12:18:17,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=663400.0, ans=0.0 2024-09-19 12:18:38,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=663480.0, ans=0.2 2024-09-19 12:18:49,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=663480.0, ans=0.2 2024-09-19 12:18:59,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=663520.0, ans=0.125 2024-09-19 12:19:17,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=663560.0, ans=0.0 2024-09-19 12:19:23,862 INFO [train.py:1198] (1/2) Epoch 37, batch 3000, loss[loss=0.2306, ctc_loss=0.1083, cr_loss=0.33, attn_decoder_loss=0.2368, over 29749.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1144, cr_loss=0.3546, attn_decoder_loss=0.2402, over 5782984.52 frames. ], batch size: 81, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 12:19:23,862 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 12:19:42,338 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1714, 4.7660, 4.6118, 4.3581], device='cuda:1') 2024-09-19 12:19:43,125 INFO [train.py:1230] (1/2) Epoch 37, validation: loss=0.212, ctc_loss=0.03675, cr_loss=6.305e-15, attn_decoder_loss=0.2315, over 944034.00 frames. 2024-09-19 12:19:43,126 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-19 12:19:48,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=663600.0, ans=0.125 2024-09-19 12:19:48,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=663600.0, ans=0.125 2024-09-19 12:19:58,485 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.415e+01 8.507e+01 8.935e+01 9.407e+01 3.949e+02, threshold=1.787e+02, percent-clipped=1.0 2024-09-19 12:20:15,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=663680.0, ans=0.2 2024-09-19 12:20:36,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=663720.0, ans=0.1 2024-09-19 12:21:01,306 INFO [train.py:1198] (1/2) Epoch 37, batch 3050, loss[loss=0.2243, ctc_loss=0.1039, cr_loss=0.3336, attn_decoder_loss=0.2303, over 29536.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1152, cr_loss=0.3566, attn_decoder_loss=0.2409, over 5777027.38 frames. ], batch size: 76, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 12:21:07,158 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.29 vs. limit=15.0 2024-09-19 12:21:23,440 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.01 vs. limit=15.0
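
The validation block above (train.py:1221-1231) switches from the running training statistics to a full pass over the held-out set: loss=0.212 and ctc_loss=0.03675 over 944034 frames, with cr_loss at an effectively zero 6.305e-15, plausibly because the consistency-regularization term compares two differently augmented views of each utterance and no augmentation is applied in eval mode. The surrounding control flow is presumably the usual eval-mode loop, sketched here with the FrameWeightedAverage helper from the earlier note and an assumed compute_losses callable:

    import torch

    def validate(model, valid_loader, compute_losses):
        model.eval()
        tot = FrameWeightedAverage()
        with torch.no_grad():
            for batch in valid_loader:
                losses, num_frames = compute_losses(model, batch)
                tot.update(losses, num_frames)
        model.train()
        return tot.averages()
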
2024-09-19 12:21:30,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=663880.0, ans=0.1 2024-09-19 12:21:42,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=663880.0, ans=0.125 2024-09-19 12:22:17,490 INFO [train.py:1198] (1/2) Epoch 37, batch 3100, loss[loss=0.2543, ctc_loss=0.1368, cr_loss=0.4001, attn_decoder_loss=0.2585, over 29255.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1155, cr_loss=0.3568, attn_decoder_loss=0.2408, over 5776733.47 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 12:22:25,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=664000.0, ans=0.125 2024-09-19 12:22:32,775 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.715e+01 9.369e+01 9.767e+01 1.782e+02, threshold=1.874e+02, percent-clipped=0.0 2024-09-19 12:22:41,489 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.60 vs. limit=15.0 2024-09-19 12:22:49,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=664080.0, ans=0.125 2024-09-19 12:23:05,573 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.65 vs. limit=22.5 2024-09-19 12:23:35,721 INFO [train.py:1198] (1/2) Epoch 37, batch 3150, loss[loss=0.2488, ctc_loss=0.1242, cr_loss=0.3742, attn_decoder_loss=0.2544, over 28858.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1154, cr_loss=0.3567, attn_decoder_loss=0.2409, over 5783011.06 frames. ], batch size: 104, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 12:23:43,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=664200.0, ans=0.125 2024-09-19 12:23:51,754 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.68 vs. limit=15.0 2024-09-19 12:24:17,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=664280.0, ans=0.0 2024-09-19 12:24:31,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=664320.0, ans=0.125 2024-09-19 12:24:32,210 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.02 vs. limit=22.5 2024-09-19 12:24:33,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=664320.0, ans=0.0 2024-09-19 12:24:34,333 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.64 vs. limit=5.0 2024-09-19 12:24:53,361 INFO [train.py:1198] (1/2) Epoch 37, batch 3200, loss[loss=0.2307, ctc_loss=0.1166, cr_loss=0.3692, attn_decoder_loss=0.2352, over 29416.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1149, cr_loss=0.3554, attn_decoder_loss=0.2401, over 5792940.91 frames.
], batch size: 79, lr: 2.97e-03, grad_scale: 16.0 2024-09-19 12:24:55,849 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.70 vs. limit=22.5 2024-09-19 12:25:04,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=664400.0, ans=0.0 2024-09-19 12:25:08,494 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.588e+01 8.621e+01 9.120e+01 9.766e+01 2.704e+02, threshold=1.824e+02, percent-clipped=1.0 2024-09-19 12:25:34,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=664480.0, ans=0.125 2024-09-19 12:25:55,447 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.39 vs. limit=8.0 2024-09-19 12:26:00,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=664560.0, ans=0.125 2024-09-19 12:26:07,201 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.76 vs. limit=15.0 2024-09-19 12:26:07,557 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.65 vs. limit=15.0 2024-09-19 12:26:09,283 INFO [train.py:1198] (1/2) Epoch 37, batch 3250, loss[loss=0.2405, ctc_loss=0.1179, cr_loss=0.3514, attn_decoder_loss=0.2463, over 29696.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1147, cr_loss=0.3555, attn_decoder_loss=0.2404, over 5800251.77 frames. ], batch size: 84, lr: 2.97e-03, grad_scale: 16.0 2024-09-19 12:26:09,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=664600.0, ans=0.125 2024-09-19 12:26:15,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=664600.0, ans=0.125 2024-09-19 12:26:26,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=664640.0, ans=0.125 2024-09-19 12:26:57,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=664720.0, ans=0.05 2024-09-19 12:27:17,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=664760.0, ans=0.2 2024-09-19 12:27:27,407 INFO [train.py:1198] (1/2) Epoch 37, batch 3300, loss[loss=0.2405, ctc_loss=0.1155, cr_loss=0.367, attn_decoder_loss=0.2462, over 28282.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1141, cr_loss=0.3541, attn_decoder_loss=0.2393, over 5798665.45 frames. 
], batch size: 111, lr: 2.97e-03, grad_scale: 16.0 2024-09-19 12:27:33,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=664800.0, ans=0.0 2024-09-19 12:27:42,599 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.526e+01 9.078e+01 9.888e+01 1.961e+02, threshold=1.816e+02, percent-clipped=2.0 2024-09-19 12:27:42,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=664840.0, ans=0.125 2024-09-19 12:27:56,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=664880.0, ans=0.1 2024-09-19 12:27:59,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=664880.0, ans=0.125 2024-09-19 12:28:07,360 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.56 vs. limit=22.5 2024-09-19 12:28:08,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=664880.0, ans=0.125 2024-09-19 12:28:43,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=665000.0, ans=0.0 2024-09-19 12:28:45,005 INFO [train.py:1198] (1/2) Epoch 37, batch 3350, loss[loss=0.2482, ctc_loss=0.1245, cr_loss=0.3855, attn_decoder_loss=0.2534, over 28855.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1154, cr_loss=0.3568, attn_decoder_loss=0.2405, over 5774825.08 frames. ], batch size: 104, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 12:28:54,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=665000.0, ans=0.125 2024-09-19 12:28:55,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=665000.0, ans=0.125 2024-09-19 12:29:01,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=665040.0, ans=0.125 2024-09-19 12:29:10,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=665040.0, ans=0.0 2024-09-19 12:29:22,417 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.42 vs. limit=15.0 2024-09-19 12:29:29,372 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.49 vs. limit=15.0 2024-09-19 12:29:50,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=665160.0, ans=0.0 2024-09-19 12:30:00,464 INFO [train.py:1198] (1/2) Epoch 37, batch 3400, loss[loss=0.213, ctc_loss=0.1005, cr_loss=0.3258, attn_decoder_loss=0.2183, over 29319.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1158, cr_loss=0.3572, attn_decoder_loss=0.2405, over 5766111.87 frames. 
2024-09-19 12:30:00,464 INFO [train.py:1198] (1/2) Epoch 37, batch 3400, loss[loss=0.213, ctc_loss=0.1005, cr_loss=0.3258, attn_decoder_loss=0.2183, over 29319.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1158, cr_loss=0.3572, attn_decoder_loss=0.2405, over 5766111.87 frames. ], batch size: 67, lr: 2.97e-03, grad_scale: 8.0
2024-09-19 12:30:17,145 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.972e+01 8.762e+01 9.202e+01 9.777e+01 2.648e+02, threshold=1.840e+02, percent-clipped=2.0
2024-09-19 12:30:32,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=665280.0, ans=0.0
2024-09-19 12:30:54,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=665320.0, ans=0.125
2024-09-19 12:31:05,656 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.69 vs. limit=10.0
2024-09-19 12:31:06,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=665360.0, ans=0.125
2024-09-19 12:31:18,358 INFO [train.py:1198] (1/2) Epoch 37, batch 3450, loss[loss=0.2396, ctc_loss=0.1144, cr_loss=0.3497, attn_decoder_loss=0.2458, over 28370.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1154, cr_loss=0.3564, attn_decoder_loss=0.2406, over 5774464.72 frames. ], batch size: 112, lr: 2.97e-03, grad_scale: 8.0
2024-09-19 12:31:27,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=665400.0, ans=0.125
2024-09-19 12:31:56,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=665480.0, ans=0.025
2024-09-19 12:32:07,336 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 12:32:07,767 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.58 vs. limit=10.0
2024-09-19 12:32:36,778 INFO [train.py:1198] (1/2) Epoch 37, batch 3500, loss[loss=0.208, ctc_loss=0.09652, cr_loss=0.3171, attn_decoder_loss=0.2134, over 29280.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.115, cr_loss=0.3556, attn_decoder_loss=0.2401, over 5776719.45 frames. ], batch size: 71, lr: 2.97e-03, grad_scale: 8.0
2024-09-19 12:32:43,591 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.28 vs. limit=15.0
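The scaling.py:214 records each report a ScheduledFloat: a scalar hyperparameter (a dropout probability, skip rate, balancer probability, and so on) whose current value (ans) is a function of training progress (batch_count). A minimal sketch of such a scheduled value follows, assuming piecewise-linear interpolation between breakpoints; the breakpoints themselves are invented for illustration, the real schedules are defined in the model code.

# Hedged sketch of a ScheduledFloat-style value: a float following a
# piecewise-linear schedule over batch_count.
class ScheduledFloat:
    def __init__(self, *points):
        # points: (batch_count, value) pairs
        self.points = sorted(points)

    def value(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

conv_skip_rate = ScheduledFloat((0.0, 0.2), (20000.0, 0.0))  # hypothetical schedule
print(conv_skip_rate.value(664800.0))  # -> 0.0, as in the conv_skip_rate records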
2024-09-19 12:32:52,961 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=15.0
2024-09-19 12:32:53,368 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.540e+01 8.561e+01 8.978e+01 9.459e+01 2.098e+02, threshold=1.796e+02, percent-clipped=1.0
2024-09-19 12:32:55,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=665640.0, ans=0.125
2024-09-19 12:33:11,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=665680.0, ans=0.0
2024-09-19 12:33:15,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=665680.0, ans=0.2
2024-09-19 12:33:32,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=665720.0, ans=0.125
2024-09-19 12:33:47,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=665760.0, ans=0.025
2024-09-19 12:33:51,559 INFO [train.py:1198] (1/2) Epoch 37, batch 3550, loss[loss=0.2403, ctc_loss=0.1156, cr_loss=0.3512, attn_decoder_loss=0.2463, over 29707.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1152, cr_loss=0.3562, attn_decoder_loss=0.2403, over 5782404.86 frames. ], batch size: 89, lr: 2.97e-03, grad_scale: 8.0
2024-09-19 12:33:57,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=665800.0, ans=0.125
2024-09-19 12:33:59,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=665800.0, ans=0.125
2024-09-19 12:34:34,218 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.62 vs. limit=22.5
2024-09-19 12:34:48,634 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=15.0
2024-09-19 12:35:05,687 INFO [train.py:1198] (1/2) Epoch 37, batch 3600, loss[loss=0.2368, ctc_loss=0.1226, cr_loss=0.3808, attn_decoder_loss=0.241, over 29468.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1153, cr_loss=0.3567, attn_decoder_loss=0.2405, over 5791545.89 frames. ], batch size: 77, lr: 2.97e-03, grad_scale: 16.0
2024-09-19 12:35:13,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=666000.0, ans=0.2
2024-09-19 12:35:22,125 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 8.583e+01 9.106e+01 9.636e+01 2.538e+02, threshold=1.821e+02, percent-clipped=1.0
2024-09-19 12:35:22,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=666040.0, ans=0.0
2024-09-19 12:35:30,196 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.83 vs. limit=15.0
2024-09-19 12:35:44,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=666080.0, ans=0.125
2024-09-19 12:36:20,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=666200.0, ans=0.0
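The Whitening records compare a per-module statistic against a limit, e.g. metric=15.83 vs. limit=15.0 in the feed_forward3.out_whiten record just above, where the limit is actually exceeded. The metric measures how far the module's output covariance is from white (isotropic), and crossing the limit is the trigger for a corrective penalty. The formulation below is only one plausible scale-invariant choice (mean squared eigenvalue over squared mean eigenvalue, which is 1.0 for perfectly white features); the actual scaling.py computation may differ in detail.

import torch

# Hedged sketch of a whitening metric like the ones logged above.
def whitening_metric(x):
    # x: (num_frames, num_channels); returns the eigenvalue spread
    # of the feature covariance (1.0 when all eigenvalues are equal).
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs ** 2).mean() / (eigs.mean() ** 2)

x = torch.randn(1000, 512)         # near-white features
print(float(whitening_metric(x)))  # close to 1.0, far below limits like 22.5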
2024-09-19 12:36:22,093 INFO [train.py:1198] (1/2) Epoch 37, batch 3650, loss[loss=0.2468, ctc_loss=0.1212, cr_loss=0.3873, attn_decoder_loss=0.2522, over 29501.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1146, cr_loss=0.3551, attn_decoder_loss=0.2396, over 5794409.01 frames. ], batch size: 90, lr: 2.97e-03, grad_scale: 16.0
2024-09-19 12:36:31,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=666200.0, ans=0.0
2024-09-19 12:36:41,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=666240.0, ans=0.0
2024-09-19 12:37:00,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=666280.0, ans=0.2
2024-09-19 12:37:23,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=666360.0, ans=0.125
2024-09-19 12:37:32,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=666360.0, ans=0.0
2024-09-19 12:37:33,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=666360.0, ans=0.125
2024-09-19 12:37:36,693 INFO [train.py:1198] (1/2) Epoch 37, batch 3700, loss[loss=0.2378, ctc_loss=0.1159, cr_loss=0.3767, attn_decoder_loss=0.243, over 29728.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1146, cr_loss=0.3556, attn_decoder_loss=0.2398, over 5804343.36 frames. ], batch size: 84, lr: 2.97e-03, grad_scale: 8.0
2024-09-19 12:37:45,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=666400.0, ans=0.125
2024-09-19 12:37:56,007 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.469e+01 9.062e+01 9.671e+01 3.468e+02, threshold=1.812e+02, percent-clipped=1.0
2024-09-19 12:38:02,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=666440.0, ans=0.0
2024-09-19 12:38:03,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=666440.0, ans=0.125
2024-09-19 12:38:13,081 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.75 vs. limit=15.0
2024-09-19 12:38:15,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=666480.0, ans=0.125
2024-09-19 12:38:20,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=666520.0, ans=0.0
2024-09-19 12:38:30,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.87 vs. limit=15.0
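The grad_scale field in the train.py records moves between 8.0, 16.0 and 32.0 across this section. That pattern is characteristic of mixed-precision training with a dynamic loss scaler: the scale is halved whenever a scaled gradient overflows and doubled again after a run of clean steps. Below is a hedged sketch with torch's stock scaler; the constructor arguments are illustrative, not this run's actual settings, and compute_loss is a hypothetical helper.

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

# Inside the training loop (model, optimizer, batch assumed to exist elsewhere):
#   with torch.cuda.amp.autocast(dtype=torch.float16):
#       loss = compute_loss(batch)        # hypothetical helper
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()  # halves the scale on inf/nan grads, doubles it
#                    # after growth_interval stable steps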
2024-09-19 12:38:52,740 INFO [train.py:1198] (1/2) Epoch 37, batch 3750, loss[loss=0.2118, ctc_loss=0.1006, cr_loss=0.3444, attn_decoder_loss=0.2165, over 29398.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1148, cr_loss=0.3561, attn_decoder_loss=0.2398, over 5808087.32 frames. ], batch size: 67, lr: 2.97e-03, grad_scale: 8.0
2024-09-19 12:39:00,606 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 12:39:02,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=666600.0, ans=0.025
2024-09-19 12:39:05,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=666600.0, ans=0.0
2024-09-19 12:39:19,110 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.01 vs. limit=22.5
2024-09-19 12:39:22,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=666680.0, ans=0.1
2024-09-19 12:39:30,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=666680.0, ans=0.0
2024-09-19 12:39:30,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=666680.0, ans=0.125
2024-09-19 12:39:44,789 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.28 vs. limit=12.0
2024-09-19 12:39:51,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=666760.0, ans=0.025
2024-09-19 12:39:58,101 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.25 vs. limit=22.5
2024-09-19 12:40:07,584 INFO [train.py:1198] (1/2) Epoch 37, batch 3800, loss[loss=0.2424, ctc_loss=0.1175, cr_loss=0.3728, attn_decoder_loss=0.248, over 29638.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1145, cr_loss=0.3554, attn_decoder_loss=0.2395, over 5798231.25 frames. ], batch size: 86, lr: 2.97e-03, grad_scale: 8.0
2024-09-19 12:40:26,997 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 8.580e+01 8.987e+01 9.690e+01 1.357e+02, threshold=1.797e+02, percent-clipped=0.0
2024-09-19 12:40:31,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=666840.0, ans=0.125
2024-09-19 12:40:37,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=666880.0, ans=0.125
2024-09-19 12:40:48,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=666880.0, ans=0.125
2024-09-19 12:40:52,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=666920.0, ans=0.1
2024-09-19 12:41:05,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=666960.0, ans=0.07
2024-09-19 12:41:21,806 INFO [train.py:1198] (1/2) Epoch 37, batch 3850, loss[loss=0.2514, ctc_loss=0.1287, cr_loss=0.3943, attn_decoder_loss=0.2562, over 29252.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1141, cr_loss=0.3551, attn_decoder_loss=0.2391, over 5813457.11 frames.
], batch size: 100, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 12:41:22,236 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 12:41:30,183 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.05 vs. limit=10.0 2024-09-19 12:41:30,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=667000.0, ans=0.1 2024-09-19 12:41:32,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=667000.0, ans=0.125 2024-09-19 12:41:35,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=667040.0, ans=0.125 2024-09-19 12:41:35,964 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.84 vs. limit=22.5 2024-09-19 12:41:51,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=667080.0, ans=0.1 2024-09-19 12:42:19,458 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.37 vs. limit=15.0 2024-09-19 12:42:38,730 INFO [train.py:1198] (1/2) Epoch 37, batch 3900, loss[loss=0.2351, ctc_loss=0.1046, cr_loss=0.3289, attn_decoder_loss=0.2423, over 29627.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.115, cr_loss=0.3566, attn_decoder_loss=0.2398, over 5817760.60 frames. ], batch size: 86, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 12:42:55,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=667240.0, ans=0.125 2024-09-19 12:42:57,947 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.596e+01 8.935e+01 9.669e+01 1.380e+02, threshold=1.787e+02, percent-clipped=0.0 2024-09-19 12:43:52,902 INFO [train.py:1198] (1/2) Epoch 37, batch 3950, loss[loss=0.2376, ctc_loss=0.1162, cr_loss=0.3512, attn_decoder_loss=0.2433, over 29529.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1146, cr_loss=0.3557, attn_decoder_loss=0.2398, over 5836896.49 frames. ], batch size: 97, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 12:44:11,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=667440.0, ans=0.1 2024-09-19 12:44:12,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=667440.0, ans=0.2 2024-09-19 12:44:16,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=667440.0, ans=0.0 2024-09-19 12:44:25,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=667480.0, ans=10.0 2024-09-19 12:44:38,017 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.87 vs. 
limit=15.0 2024-09-19 12:44:40,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=667520.0, ans=0.0 2024-09-19 12:44:40,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=667520.0, ans=0.125 2024-09-19 12:44:51,955 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.03 vs. limit=12.0 2024-09-19 12:44:59,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.34 vs. limit=10.0 2024-09-19 12:45:01,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=667560.0, ans=0.05 2024-09-19 12:45:08,490 INFO [train.py:1198] (1/2) Epoch 37, batch 4000, loss[loss=0.2212, ctc_loss=0.09703, cr_loss=0.3177, attn_decoder_loss=0.2279, over 29528.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.115, cr_loss=0.3564, attn_decoder_loss=0.2402, over 5813938.58 frames. ], batch size: 74, lr: 2.97e-03, grad_scale: 16.0 2024-09-19 12:45:10,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=667600.0, ans=0.1 2024-09-19 12:45:13,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=667600.0, ans=0.2 2024-09-19 12:45:14,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=667600.0, ans=0.125 2024-09-19 12:45:16,140 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 12:45:20,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=667600.0, ans=0.2 2024-09-19 12:45:27,467 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 8.490e+01 9.030e+01 9.800e+01 2.988e+02, threshold=1.806e+02, percent-clipped=1.0 2024-09-19 12:45:32,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=667640.0, ans=0.2 2024-09-19 12:45:33,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=667640.0, ans=0.0 2024-09-19 12:45:35,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=667640.0, ans=0.1 2024-09-19 12:45:52,239 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.18 vs. limit=15.0 2024-09-19 12:46:15,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=667760.0, ans=15.0 2024-09-19 12:46:22,198 INFO [train.py:1198] (1/2) Epoch 37, batch 4050, loss[loss=0.2547, ctc_loss=0.1453, cr_loss=0.3956, attn_decoder_loss=0.2581, over 19586.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1152, cr_loss=0.3567, attn_decoder_loss=0.2402, over 5795315.72 frames. ], batch size: 211, lr: 2.97e-03, grad_scale: 16.0 2024-09-19 12:46:31,977 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.68 vs. 
limit=15.0 2024-09-19 12:46:35,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=667840.0, ans=0.125 2024-09-19 12:47:19,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=667920.0, ans=0.1 2024-09-19 12:47:32,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=667960.0, ans=0.125 2024-09-19 12:47:37,492 INFO [train.py:1198] (1/2) Epoch 37, batch 4100, loss[loss=0.2597, ctc_loss=0.1354, cr_loss=0.3981, attn_decoder_loss=0.2646, over 29510.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1151, cr_loss=0.3564, attn_decoder_loss=0.2402, over 5791610.25 frames. ], batch size: 90, lr: 2.97e-03, grad_scale: 16.0 2024-09-19 12:47:56,152 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 8.448e+01 9.033e+01 9.875e+01 1.600e+02, threshold=1.807e+02, percent-clipped=0.0 2024-09-19 12:48:02,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=668040.0, ans=0.0 2024-09-19 12:48:08,984 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.16 vs. limit=15.0 2024-09-19 12:48:30,808 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2024-09-19 12:48:34,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=668160.0, ans=0.07 2024-09-19 12:48:42,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.whiten.whitening_limit, batch_count=668160.0, ans=12.0 2024-09-19 12:48:46,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=668160.0, ans=0.0 2024-09-19 12:48:50,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=668200.0, ans=0.1 2024-09-19 12:48:51,883 INFO [train.py:1198] (1/2) Epoch 37, batch 4150, loss[loss=0.2251, ctc_loss=0.1123, cr_loss=0.35, attn_decoder_loss=0.2299, over 29479.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1151, cr_loss=0.3564, attn_decoder_loss=0.2401, over 5797204.14 frames. 
], batch size: 77, lr: 2.96e-03, grad_scale: 16.0 2024-09-19 12:48:53,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=668200.0, ans=0.125 2024-09-19 12:49:05,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=668240.0, ans=0.1 2024-09-19 12:49:12,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=668240.0, ans=0.125 2024-09-19 12:49:24,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=668280.0, ans=0.0 2024-09-19 12:49:27,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=668280.0, ans=0.0 2024-09-19 12:49:48,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=668320.0, ans=0.125 2024-09-19 12:49:51,476 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.56 vs. limit=12.0 2024-09-19 12:49:54,652 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.52 vs. limit=22.5 2024-09-19 12:50:05,531 INFO [train.py:1198] (1/2) Epoch 37, batch 4200, loss[loss=0.2596, ctc_loss=0.1391, cr_loss=0.4137, attn_decoder_loss=0.2638, over 29512.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1151, cr_loss=0.357, attn_decoder_loss=0.2403, over 5798849.27 frames. ], batch size: 90, lr: 2.96e-03, grad_scale: 16.0 2024-09-19 12:50:13,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=668400.0, ans=0.1 2024-09-19 12:50:13,870 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.60 vs. 
limit=10.0 2024-09-19 12:50:16,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=668400.0, ans=0.025 2024-09-19 12:50:24,820 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.799e+01 8.584e+01 9.010e+01 9.647e+01 2.583e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-19 12:50:27,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=668440.0, ans=0.125 2024-09-19 12:50:28,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=668440.0, ans=0.0 2024-09-19 12:50:33,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=668480.0, ans=0.1 2024-09-19 12:50:52,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=668520.0, ans=0.025 2024-09-19 12:50:55,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=668520.0, ans=0.125 2024-09-19 12:51:04,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=668560.0, ans=0.125 2024-09-19 12:51:20,678 INFO [train.py:1198] (1/2) Epoch 37, batch 4250, loss[loss=0.2194, ctc_loss=0.1007, cr_loss=0.3328, attn_decoder_loss=0.2252, over 29512.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1148, cr_loss=0.3561, attn_decoder_loss=0.2404, over 5804283.76 frames. ], batch size: 74, lr: 2.96e-03, grad_scale: 16.0 2024-09-19 12:51:21,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=668600.0, ans=0.0 2024-09-19 12:51:29,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=668600.0, ans=0.125 2024-09-19 12:51:32,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=668600.0, ans=0.0 2024-09-19 12:51:41,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=668640.0, ans=0.0 2024-09-19 12:51:41,230 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 12:51:47,414 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.65 vs. 
limit=15.0 2024-09-19 12:51:48,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=668680.0, ans=0.025 2024-09-19 12:51:52,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=668680.0, ans=0.125 2024-09-19 12:52:03,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=668720.0, ans=0.1 2024-09-19 12:52:04,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=668720.0, ans=0.1 2024-09-19 12:52:09,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=668720.0, ans=0.0 2024-09-19 12:52:35,092 INFO [train.py:1198] (1/2) Epoch 37, batch 4300, loss[loss=0.2414, ctc_loss=0.1202, cr_loss=0.3516, attn_decoder_loss=0.247, over 29540.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1144, cr_loss=0.3549, attn_decoder_loss=0.2405, over 5794181.22 frames. ], batch size: 87, lr: 2.96e-03, grad_scale: 16.0 2024-09-19 12:52:36,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=668800.0, ans=0.125 2024-09-19 12:52:39,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=668800.0, ans=0.035 2024-09-19 12:52:54,262 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.770e+01 8.796e+01 9.094e+01 9.550e+01 2.475e+02, threshold=1.819e+02, percent-clipped=2.0 2024-09-19 12:52:56,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=668840.0, ans=0.125 2024-09-19 12:52:58,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=668840.0, ans=0.125 2024-09-19 12:53:19,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=668920.0, ans=0.125 2024-09-19 12:53:43,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=668960.0, ans=0.2 2024-09-19 12:53:48,879 INFO [train.py:1198] (1/2) Epoch 37, batch 4350, loss[loss=0.2614, ctc_loss=0.1408, cr_loss=0.4096, attn_decoder_loss=0.2657, over 29508.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.117, cr_loss=0.3609, attn_decoder_loss=0.2436, over 5797010.50 frames. ], batch size: 97, lr: 2.96e-03, grad_scale: 16.0 2024-09-19 12:54:02,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=669000.0, ans=22.5 2024-09-19 12:54:04,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=669040.0, ans=0.125 2024-09-19 12:54:19,921 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. 
limit=6.0 2024-09-19 12:54:29,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=669080.0, ans=0.1 2024-09-19 12:54:33,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=669120.0, ans=0.125 2024-09-19 12:54:47,192 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.96 vs. limit=15.0 2024-09-19 12:54:48,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=669160.0, ans=0.125 2024-09-19 12:54:56,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=669160.0, ans=0.1 2024-09-19 12:55:02,636 INFO [train.py:1198] (1/2) Epoch 37, batch 4400, loss[loss=0.2453, ctc_loss=0.1202, cr_loss=0.3684, attn_decoder_loss=0.251, over 27421.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1185, cr_loss=0.3634, attn_decoder_loss=0.2458, over 5768764.65 frames. ], batch size: 125, lr: 2.96e-03, grad_scale: 32.0 2024-09-19 12:55:23,900 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.307e+01 8.852e+01 9.362e+01 9.812e+01 1.394e+02, threshold=1.872e+02, percent-clipped=0.0 2024-09-19 12:55:30,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=669240.0, ans=0.025 2024-09-19 12:55:31,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=669280.0, ans=0.1 2024-09-19 12:55:57,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=669320.0, ans=0.125 2024-09-19 12:56:17,475 INFO [train.py:1198] (1/2) Epoch 37, batch 4450, loss[loss=0.2562, ctc_loss=0.1476, cr_loss=0.4016, attn_decoder_loss=0.2593, over 20657.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1216, cr_loss=0.3679, attn_decoder_loss=0.2478, over 5574657.32 frames. ], batch size: 209, lr: 2.96e-03, grad_scale: 16.0 2024-09-19 12:56:17,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=669400.0, ans=0.125 2024-09-19 12:57:06,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=669520.0, ans=0.2 2024-09-19 12:57:33,291 INFO [train.py:1198] (1/2) Epoch 37, batch 4500, loss[loss=0.2471, ctc_loss=0.1375, cr_loss=0.3742, attn_decoder_loss=0.251, over 20812.00 frames. ], tot_loss[loss=0.2445, ctc_loss=0.1249, cr_loss=0.3705, attn_decoder_loss=0.2496, over 5234260.86 frames. 
], batch size: 209, lr: 2.96e-03, grad_scale: 16.0
2024-09-19 12:57:33,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=669600.0, ans=0.125
2024-09-19 12:57:33,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=669600.0, ans=0.0
2024-09-19 12:57:35,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=669600.0, ans=0.125
2024-09-19 12:57:45,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=669600.0, ans=0.0
2024-09-19 12:57:54,009 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.699e+01 1.043e+02 1.161e+02 1.270e+02 1.246e+03, threshold=2.323e+02, percent-clipped=2.0
2024-09-19 12:58:56,698 INFO [train.py:1198] (1/2) Epoch 38, batch 0, loss[loss=0.2166, ctc_loss=0.1053, cr_loss=0.3465, attn_decoder_loss=0.2213, over 29622.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1053, cr_loss=0.3465, attn_decoder_loss=0.2213, over 29622.00 frames. ], batch size: 73, lr: 2.92e-03, grad_scale: 32.0
2024-09-19 12:58:56,699 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-19 12:59:15,170 INFO [train.py:1230] (1/2) Epoch 38, validation: loss=0.2124, ctc_loss=0.03582, cr_loss=6.776e-15, attn_decoder_loss=0.232, over 944034.00 frames.
2024-09-19 12:59:15,171 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB
2024-09-19 12:59:18,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=669700.0, ans=0.2
2024-09-19 12:59:23,180 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.09 vs. limit=15.0
2024-09-19 13:00:09,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=669820.0, ans=0.125
2024-09-19 13:00:14,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=669860.0, ans=0.2
2024-09-19 13:00:20,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=669860.0, ans=0.125
2024-09-19 13:00:27,207 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.26 vs. limit=22.5
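The train.py:1221/1230 records above show a validation pass interleaved at the start of epoch 38: training pauses, a frame-weighted loss is accumulated over the dev set, and training resumes. Note that cr_loss=6.776e-15 is numerically zero, plausibly because the consistency term compares two augmented views of each utterance and is not exercised in evaluation mode. A minimal sketch of such a pass follows; model, valid_dl and the compute_loss signature are assumptions for illustration, not this run's actual code.

import torch

def compute_validation_loss(model, valid_dl, compute_loss):
    # Accumulate frame-weighted losses over the dev set, then restore
    # training mode, mirroring the "Computing validation loss" records.
    model.eval()
    totals, tot_frames = {}, 0.0
    with torch.no_grad():
        for batch in valid_dl:
            losses, num_frames = compute_loss(model, batch)  # hypothetical signature
            for name, value in losses.items():
                totals[name] = totals.get(name, 0.0) + float(value) * num_frames
            tot_frames += num_frames
    model.train()
    return {name: value / tot_frames for name, value in totals.items()}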
2024-09-19 13:00:32,669 INFO [train.py:1198] (1/2) Epoch 38, batch 50, loss[loss=0.2057, ctc_loss=0.09307, cr_loss=0.3089, attn_decoder_loss=0.2114, over 29449.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1162, cr_loss=0.3593, attn_decoder_loss=0.2399, over 1266296.41 frames. ], batch size: 70, lr: 2.92e-03, grad_scale: 16.0
2024-09-19 13:00:42,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=669900.0, ans=0.125
2024-09-19 13:00:45,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=669900.0, ans=0.125
2024-09-19 13:00:48,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=669940.0, ans=0.0
2024-09-19 13:00:54,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=669940.0, ans=0.0
2024-09-19 13:00:56,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=669940.0, ans=0.2
2024-09-19 13:01:02,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=669980.0, ans=0.0
2024-09-19 13:01:26,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=670020.0, ans=0.0
2024-09-19 13:01:28,455 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=10.19 vs. limit=12.0
2024-09-19 13:01:35,761 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.803e+01 8.673e+01 9.380e+01 1.040e+02 1.745e+02, threshold=1.876e+02, percent-clipped=0.0
2024-09-19 13:01:44,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=670060.0, ans=0.0
2024-09-19 13:01:45,372 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.75 vs. limit=22.5
2024-09-19 13:01:47,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=670060.0, ans=0.1
2024-09-19 13:01:48,554 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.69 vs. limit=15.0
2024-09-19 13:01:50,659 INFO [train.py:1198] (1/2) Epoch 38, batch 100, loss[loss=0.2148, ctc_loss=0.1024, cr_loss=0.3344, attn_decoder_loss=0.2198, over 29531.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1174, cr_loss=0.3622, attn_decoder_loss=0.2421, over 2252308.64 frames. ], batch size: 76, lr: 2.92e-03, grad_scale: 16.0
2024-09-19 13:02:04,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=670140.0, ans=0.09899494936611666
2024-09-19 13:02:07,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=670140.0, ans=0.125
2024-09-19 13:02:13,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=670140.0, ans=0.0
2024-09-19 13:02:13,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=670140.0, ans=0.125
2024-09-19 13:02:50,115 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.68 vs. limit=5.0
2024-09-19 13:03:05,242 INFO [train.py:1198] (1/2) Epoch 38, batch 150, loss[loss=0.2147, ctc_loss=0.09888, cr_loss=0.3338, attn_decoder_loss=0.2201, over 29454.00 frames.
], tot_loss[loss=0.2348, ctc_loss=0.1149, cr_loss=0.3568, attn_decoder_loss=0.2402, over 3047744.70 frames. ], batch size: 70, lr: 2.92e-03, grad_scale: 16.0 2024-09-19 13:03:05,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=670300.0, ans=0.2 2024-09-19 13:03:16,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=670300.0, ans=0.95 2024-09-19 13:03:52,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=670420.0, ans=0.125 2024-09-19 13:04:05,662 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.245e+01 8.326e+01 8.770e+01 9.236e+01 1.783e+02, threshold=1.754e+02, percent-clipped=0.0 2024-09-19 13:04:20,687 INFO [train.py:1198] (1/2) Epoch 38, batch 200, loss[loss=0.2429, ctc_loss=0.1209, cr_loss=0.3638, attn_decoder_loss=0.2483, over 27468.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1142, cr_loss=0.3551, attn_decoder_loss=0.2393, over 3659843.18 frames. ], batch size: 125, lr: 2.92e-03, grad_scale: 16.0 2024-09-19 13:04:33,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=670500.0, ans=0.125 2024-09-19 13:04:40,470 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.32 vs. limit=10.0 2024-09-19 13:04:43,557 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.66 vs. limit=15.0 2024-09-19 13:05:10,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=670620.0, ans=0.2 2024-09-19 13:05:41,277 INFO [train.py:1198] (1/2) Epoch 38, batch 250, loss[loss=0.2433, ctc_loss=0.1165, cr_loss=0.3479, attn_decoder_loss=0.2496, over 29239.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1141, cr_loss=0.3548, attn_decoder_loss=0.2393, over 4142177.34 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 16.0 2024-09-19 13:05:41,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=670700.0, ans=0.0 2024-09-19 13:05:49,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.01 vs. limit=22.5 2024-09-19 13:05:50,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=670700.0, ans=0.07 2024-09-19 13:05:55,655 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.64 vs. limit=15.0 2024-09-19 13:06:04,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=670740.0, ans=0.0 2024-09-19 13:06:14,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=670780.0, ans=0.0 2024-09-19 13:06:17,186 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.96 vs. 
limit=10.0 2024-09-19 13:06:22,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=670780.0, ans=0.1 2024-09-19 13:06:27,823 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.99 vs. limit=12.0 2024-09-19 13:06:36,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=670820.0, ans=0.1 2024-09-19 13:06:36,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=670820.0, ans=0.0 2024-09-19 13:06:41,637 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.304e+01 8.456e+01 8.891e+01 9.506e+01 1.343e+02, threshold=1.778e+02, percent-clipped=0.0 2024-09-19 13:06:56,768 INFO [train.py:1198] (1/2) Epoch 38, batch 300, loss[loss=0.2423, ctc_loss=0.1146, cr_loss=0.3707, attn_decoder_loss=0.2482, over 29528.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1138, cr_loss=0.354, attn_decoder_loss=0.2391, over 4509820.00 frames. ], batch size: 92, lr: 2.92e-03, grad_scale: 8.0 2024-09-19 13:07:08,347 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.78 vs. limit=15.0 2024-09-19 13:07:22,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=670940.0, ans=0.125 2024-09-19 13:07:24,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=670940.0, ans=0.07 2024-09-19 13:07:28,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=670980.0, ans=0.125 2024-09-19 13:07:58,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=671060.0, ans=0.125 2024-09-19 13:08:03,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=671060.0, ans=0.1 2024-09-19 13:08:12,148 INFO [train.py:1198] (1/2) Epoch 38, batch 350, loss[loss=0.2098, ctc_loss=0.09627, cr_loss=0.3064, attn_decoder_loss=0.2156, over 29327.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1145, cr_loss=0.3555, attn_decoder_loss=0.2399, over 4796808.40 frames. 
], batch size: 71, lr: 2.92e-03, grad_scale: 8.0 2024-09-19 13:08:18,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=671100.0, ans=0.0 2024-09-19 13:08:31,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=671140.0, ans=0.0 2024-09-19 13:08:40,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=671140.0, ans=0.2 2024-09-19 13:09:03,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=671220.0, ans=0.0 2024-09-19 13:09:13,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=671260.0, ans=0.0 2024-09-19 13:09:16,779 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.441e+01 8.520e+01 8.939e+01 9.511e+01 1.277e+02, threshold=1.788e+02, percent-clipped=0.0 2024-09-19 13:09:31,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=671300.0, ans=0.0 2024-09-19 13:09:32,532 INFO [train.py:1198] (1/2) Epoch 38, batch 400, loss[loss=0.2384, ctc_loss=0.1179, cr_loss=0.3705, attn_decoder_loss=0.2435, over 29718.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1142, cr_loss=0.3549, attn_decoder_loss=0.2398, over 5026736.18 frames. ], batch size: 82, lr: 2.92e-03, grad_scale: 16.0 2024-09-19 13:09:37,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=671300.0, ans=0.125 2024-09-19 13:09:52,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=671340.0, ans=0.125 2024-09-19 13:09:54,738 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.48 vs. limit=10.0 2024-09-19 13:10:00,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=671340.0, ans=0.5 2024-09-19 13:10:09,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=671380.0, ans=0.0 2024-09-19 13:10:19,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=671420.0, ans=0.125 2024-09-19 13:10:22,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=671420.0, ans=0.2 2024-09-19 13:10:46,093 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.94 vs. limit=6.0 2024-09-19 13:10:48,140 INFO [train.py:1198] (1/2) Epoch 38, batch 450, loss[loss=0.2487, ctc_loss=0.1244, cr_loss=0.3728, attn_decoder_loss=0.2543, over 29693.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1145, cr_loss=0.3552, attn_decoder_loss=0.24, over 5190345.72 frames. 
], batch size: 83, lr: 2.92e-03, grad_scale: 8.0 2024-09-19 13:11:15,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=671540.0, ans=0.125 2024-09-19 13:11:16,076 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.23 vs. limit=10.0 2024-09-19 13:11:51,639 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 8.658e+01 9.040e+01 9.546e+01 1.503e+02, threshold=1.808e+02, percent-clipped=0.0 2024-09-19 13:11:53,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=671660.0, ans=0.0 2024-09-19 13:12:03,565 INFO [train.py:1198] (1/2) Epoch 38, batch 500, loss[loss=0.2556, ctc_loss=0.1312, cr_loss=0.3961, attn_decoder_loss=0.2606, over 29457.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1145, cr_loss=0.3552, attn_decoder_loss=0.2397, over 5332705.15 frames. ], batch size: 94, lr: 2.92e-03, grad_scale: 8.0 2024-09-19 13:12:11,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=671700.0, ans=0.0 2024-09-19 13:12:25,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=671740.0, ans=0.125 2024-09-19 13:12:31,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=671740.0, ans=0.2 2024-09-19 13:12:42,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=671780.0, ans=0.0 2024-09-19 13:12:53,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=671820.0, ans=0.0 2024-09-19 13:13:23,862 INFO [train.py:1198] (1/2) Epoch 38, batch 550, loss[loss=0.2482, ctc_loss=0.1276, cr_loss=0.3972, attn_decoder_loss=0.2527, over 28861.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1147, cr_loss=0.3559, attn_decoder_loss=0.2397, over 5423776.54 frames. 
], batch size: 104, lr: 2.92e-03, grad_scale: 8.0 2024-09-19 13:13:28,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=671900.0, ans=0.0 2024-09-19 13:13:30,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=671900.0, ans=0.2 2024-09-19 13:13:39,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=671940.0, ans=0.0 2024-09-19 13:13:40,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=671940.0, ans=0.125 2024-09-19 13:13:47,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=671940.0, ans=0.125 2024-09-19 13:13:48,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=671940.0, ans=0.1 2024-09-19 13:14:17,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=672020.0, ans=0.125 2024-09-19 13:14:26,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=672020.0, ans=0.2 2024-09-19 13:14:35,173 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.563e+01 9.079e+01 9.918e+01 4.106e+02, threshold=1.816e+02, percent-clipped=4.0 2024-09-19 13:14:47,311 INFO [train.py:1198] (1/2) Epoch 38, batch 600, loss[loss=0.2526, ctc_loss=0.1305, cr_loss=0.3947, attn_decoder_loss=0.2574, over 29300.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1149, cr_loss=0.3563, attn_decoder_loss=0.2399, over 5511814.48 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 8.0 2024-09-19 13:14:47,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=672100.0, ans=0.125 2024-09-19 13:15:01,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=672140.0, ans=0.2 2024-09-19 13:15:01,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=672140.0, ans=0.1 2024-09-19 13:15:04,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=672140.0, ans=0.2 2024-09-19 13:15:21,258 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.36 vs. limit=10.0 2024-09-19 13:15:26,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=672180.0, ans=0.0 2024-09-19 13:16:02,923 INFO [train.py:1198] (1/2) Epoch 38, batch 650, loss[loss=0.2384, ctc_loss=0.1145, cr_loss=0.3581, attn_decoder_loss=0.2442, over 29757.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1138, cr_loss=0.3541, attn_decoder_loss=0.2392, over 5587911.89 frames. ], batch size: 81, lr: 2.92e-03, grad_scale: 8.0 2024-09-19 13:16:09,952 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.91 vs. 
limit=15.0 2024-09-19 13:16:22,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=672340.0, ans=0.125 2024-09-19 13:16:33,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=672340.0, ans=0.0 2024-09-19 13:16:36,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=672380.0, ans=0.0 2024-09-19 13:17:09,154 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.435e+01 8.584e+01 9.023e+01 9.741e+01 1.282e+02, threshold=1.805e+02, percent-clipped=0.0 2024-09-19 13:17:10,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=672460.0, ans=0.1 2024-09-19 13:17:10,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=672460.0, ans=0.125 2024-09-19 13:17:20,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=672460.0, ans=0.125 2024-09-19 13:17:23,538 INFO [train.py:1198] (1/2) Epoch 38, batch 700, loss[loss=0.2309, ctc_loss=0.1173, cr_loss=0.3707, attn_decoder_loss=0.2353, over 29534.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1146, cr_loss=0.356, attn_decoder_loss=0.2398, over 5637183.84 frames. ], batch size: 76, lr: 2.92e-03, grad_scale: 8.0 2024-09-19 13:17:25,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=672500.0, ans=10.0 2024-09-19 13:17:43,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=672540.0, ans=0.07 2024-09-19 13:18:11,455 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.32 vs. limit=15.0 2024-09-19 13:18:18,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=672620.0, ans=0.125 2024-09-19 13:18:22,318 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.66 vs. limit=15.0 2024-09-19 13:18:23,823 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.98 vs. limit=10.0 2024-09-19 13:18:32,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=672660.0, ans=0.125 2024-09-19 13:18:38,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=672700.0, ans=0.125 2024-09-19 13:18:39,486 INFO [train.py:1198] (1/2) Epoch 38, batch 750, loss[loss=0.2392, ctc_loss=0.1168, cr_loss=0.3561, attn_decoder_loss=0.2449, over 29702.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1143, cr_loss=0.3554, attn_decoder_loss=0.2395, over 5675260.50 frames. 
], batch size: 82, lr: 2.91e-03, grad_scale: 8.0 2024-09-19 13:18:39,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=672700.0, ans=0.125 2024-09-19 13:19:43,313 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.368e+01 8.750e+01 9.083e+01 9.607e+01 5.779e+02, threshold=1.817e+02, percent-clipped=1.0 2024-09-19 13:19:54,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=672900.0, ans=0.0 2024-09-19 13:19:55,414 INFO [train.py:1198] (1/2) Epoch 38, batch 800, loss[loss=0.2085, ctc_loss=0.0967, cr_loss=0.3106, attn_decoder_loss=0.214, over 29607.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1141, cr_loss=0.3545, attn_decoder_loss=0.2392, over 5706116.09 frames. ], batch size: 73, lr: 2.91e-03, grad_scale: 16.0 2024-09-19 13:20:15,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=672940.0, ans=0.0 2024-09-19 13:20:21,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=672940.0, ans=0.125 2024-09-19 13:20:41,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=673020.0, ans=0.125 2024-09-19 13:21:15,076 INFO [train.py:1198] (1/2) Epoch 38, batch 850, loss[loss=0.2378, ctc_loss=0.1093, cr_loss=0.3428, attn_decoder_loss=0.2444, over 29729.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1138, cr_loss=0.3541, attn_decoder_loss=0.2393, over 5735093.17 frames. ], batch size: 89, lr: 2.91e-03, grad_scale: 8.0 2024-09-19 13:21:19,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=673100.0, ans=0.125 2024-09-19 13:21:39,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=673140.0, ans=0.2 2024-09-19 13:22:09,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=673220.0, ans=0.125 2024-09-19 13:22:15,870 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 13:22:19,981 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.440e+01 8.974e+01 9.392e+01 3.199e+02, threshold=1.795e+02, percent-clipped=2.0 2024-09-19 13:22:26,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=673260.0, ans=0.0 2024-09-19 13:22:30,665 INFO [train.py:1198] (1/2) Epoch 38, batch 900, loss[loss=0.2102, ctc_loss=0.0962, cr_loss=0.3052, attn_decoder_loss=0.2161, over 29625.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1139, cr_loss=0.3538, attn_decoder_loss=0.2396, over 5739499.11 frames. ], batch size: 73, lr: 2.91e-03, grad_scale: 8.0 2024-09-19 13:22:36,115 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.58 vs. limit=22.5 2024-09-19 13:22:37,778 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.17 vs. 
limit=15.0 2024-09-19 13:22:39,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=673300.0, ans=0.0 2024-09-19 13:22:42,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=673300.0, ans=0.125 2024-09-19 13:22:53,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=673340.0, ans=0.05 2024-09-19 13:22:56,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=673340.0, ans=0.0 2024-09-19 13:22:57,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=673340.0, ans=0.1 2024-09-19 13:22:58,423 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.51 vs. limit=15.0 2024-09-19 13:23:14,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=673420.0, ans=0.09899494936611666 2024-09-19 13:23:14,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=673420.0, ans=0.125 2024-09-19 13:23:17,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=673420.0, ans=0.2 2024-09-19 13:23:25,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=673420.0, ans=0.1 2024-09-19 13:23:41,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=673460.0, ans=0.125 2024-09-19 13:23:45,778 INFO [train.py:1198] (1/2) Epoch 38, batch 950, loss[loss=0.2178, ctc_loss=0.09564, cr_loss=0.313, attn_decoder_loss=0.2244, over 29522.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1139, cr_loss=0.3533, attn_decoder_loss=0.2395, over 5741838.91 frames. 
], batch size: 74, lr: 2.91e-03, grad_scale: 8.0 2024-09-19 13:23:46,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=673500.0, ans=0.125 2024-09-19 13:24:17,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=673580.0, ans=0.1 2024-09-19 13:24:28,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=673580.0, ans=0.0 2024-09-19 13:24:34,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=673620.0, ans=0.2 2024-09-19 13:24:51,003 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 13:24:53,669 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 8.808e+01 9.253e+01 1.008e+02 2.662e+02, threshold=1.851e+02, percent-clipped=5.0 2024-09-19 13:25:00,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=673660.0, ans=0.125 2024-09-19 13:25:03,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=673660.0, ans=0.125 2024-09-19 13:25:06,338 INFO [train.py:1198] (1/2) Epoch 38, batch 1000, loss[loss=0.2235, ctc_loss=0.1046, cr_loss=0.3294, attn_decoder_loss=0.2294, over 29519.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1153, cr_loss=0.3557, attn_decoder_loss=0.2406, over 5736114.89 frames. ], batch size: 77, lr: 2.91e-03, grad_scale: 8.0 2024-09-19 13:25:27,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=673740.0, ans=0.0 2024-09-19 13:25:27,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=673740.0, ans=0.95 2024-09-19 13:25:46,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=673780.0, ans=0.1 2024-09-19 13:25:47,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=673780.0, ans=0.125 2024-09-19 13:25:50,790 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 13:25:51,292 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.81 vs. limit=15.0 2024-09-19 13:25:51,310 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.57 vs. limit=12.0 2024-09-19 13:26:08,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=673860.0, ans=0.125 2024-09-19 13:26:09,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=673860.0, ans=0.2 2024-09-19 13:26:21,861 INFO [train.py:1198] (1/2) Epoch 38, batch 1050, loss[loss=0.2438, ctc_loss=0.1191, cr_loss=0.3724, attn_decoder_loss=0.2494, over 29671.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1149, cr_loss=0.3551, attn_decoder_loss=0.2399, over 5744519.33 frames. 
], batch size: 85, lr: 2.91e-03, grad_scale: 8.0 2024-09-19 13:26:22,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=673900.0, ans=0.0 2024-09-19 13:26:29,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=673900.0, ans=0.0 2024-09-19 13:26:37,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=673940.0, ans=0.0 2024-09-19 13:26:38,046 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.55 vs. limit=6.0 2024-09-19 13:27:13,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=674020.0, ans=0.2 2024-09-19 13:27:21,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=674060.0, ans=0.125 2024-09-19 13:27:27,114 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.465e+01 8.973e+01 9.470e+01 1.777e+02, threshold=1.795e+02, percent-clipped=0.0 2024-09-19 13:27:37,781 INFO [train.py:1198] (1/2) Epoch 38, batch 1100, loss[loss=0.2327, ctc_loss=0.1139, cr_loss=0.3433, attn_decoder_loss=0.2382, over 29433.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1146, cr_loss=0.3546, attn_decoder_loss=0.2394, over 5757649.21 frames. ], batch size: 78, lr: 2.91e-03, grad_scale: 8.0 2024-09-19 13:27:50,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=674100.0, ans=0.09899494936611666 2024-09-19 13:28:03,931 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 13:28:08,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=674180.0, ans=0.2 2024-09-19 13:28:13,473 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.58 vs. limit=22.5 2024-09-19 13:28:18,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=674180.0, ans=0.1 2024-09-19 13:28:24,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=674220.0, ans=0.125 2024-09-19 13:28:28,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=674220.0, ans=0.07 2024-09-19 13:28:39,547 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 13:28:57,999 INFO [train.py:1198] (1/2) Epoch 38, batch 1150, loss[loss=0.2292, ctc_loss=0.1134, cr_loss=0.3618, attn_decoder_loss=0.234, over 29421.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1146, cr_loss=0.3549, attn_decoder_loss=0.2393, over 5755070.84 frames. 
], batch size: 78, lr: 2.91e-03, grad_scale: 8.0 2024-09-19 13:29:38,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=674380.0, ans=0.0 2024-09-19 13:29:51,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=674420.0, ans=0.0 2024-09-19 13:29:52,363 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.03 vs. limit=22.5 2024-09-19 13:30:03,757 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.627e+01 8.613e+01 9.064e+01 9.591e+01 1.895e+02, threshold=1.813e+02, percent-clipped=1.0 2024-09-19 13:30:12,272 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.43 vs. limit=15.0 2024-09-19 13:30:14,471 INFO [train.py:1198] (1/2) Epoch 38, batch 1200, loss[loss=0.2498, ctc_loss=0.1237, cr_loss=0.3535, attn_decoder_loss=0.256, over 29668.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1147, cr_loss=0.3552, attn_decoder_loss=0.2398, over 5747559.65 frames. ], batch size: 85, lr: 2.91e-03, grad_scale: 16.0 2024-09-19 13:30:20,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=674500.0, ans=0.0 2024-09-19 13:30:25,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=674500.0, ans=0.0 2024-09-19 13:30:45,708 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.76 vs. limit=15.0 2024-09-19 13:30:52,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=674580.0, ans=0.125 2024-09-19 13:30:54,857 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.70 vs. limit=12.0 2024-09-19 13:31:01,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=674620.0, ans=0.1 2024-09-19 13:31:13,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=674660.0, ans=0.025 2024-09-19 13:31:21,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=674660.0, ans=0.1 2024-09-19 13:31:22,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=674660.0, ans=0.125 2024-09-19 13:31:29,824 INFO [train.py:1198] (1/2) Epoch 38, batch 1250, loss[loss=0.2424, ctc_loss=0.1272, cr_loss=0.3871, attn_decoder_loss=0.2466, over 29487.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1151, cr_loss=0.3561, attn_decoder_loss=0.2403, over 5774856.66 frames. 
], batch size: 92, lr: 2.91e-03, grad_scale: 16.0 2024-09-19 13:31:31,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=674700.0, ans=0.0 2024-09-19 13:31:54,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=674740.0, ans=0.1 2024-09-19 13:32:12,999 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2024-09-19 13:32:33,955 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.23 vs. limit=15.0 2024-09-19 13:32:37,395 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.525e+01 9.083e+01 9.622e+01 1.847e+02, threshold=1.817e+02, percent-clipped=1.0 2024-09-19 13:32:45,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=674860.0, ans=0.125 2024-09-19 13:32:50,168 INFO [train.py:1198] (1/2) Epoch 38, batch 1300, loss[loss=0.2472, ctc_loss=0.1222, cr_loss=0.3745, attn_decoder_loss=0.2528, over 28398.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1148, cr_loss=0.3559, attn_decoder_loss=0.2399, over 5780290.40 frames. ], batch size: 111, lr: 2.91e-03, grad_scale: 16.0 2024-09-19 13:33:13,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=674940.0, ans=0.125 2024-09-19 13:33:34,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=675020.0, ans=0.125 2024-09-19 13:33:37,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=675020.0, ans=0.1 2024-09-19 13:34:05,850 INFO [train.py:1198] (1/2) Epoch 38, batch 1350, loss[loss=0.2249, ctc_loss=0.1022, cr_loss=0.33, attn_decoder_loss=0.2312, over 29758.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1147, cr_loss=0.3559, attn_decoder_loss=0.2397, over 5797174.81 frames. ], batch size: 81, lr: 2.91e-03, grad_scale: 16.0 2024-09-19 13:34:12,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=675100.0, ans=0.125 2024-09-19 13:34:37,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=675180.0, ans=0.1 2024-09-19 13:34:52,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=675220.0, ans=0.0 2024-09-19 13:34:54,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=675220.0, ans=0.125 2024-09-19 13:35:01,781 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 13:35:01,996 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.20 vs. 
limit=15.0 2024-09-19 13:35:11,894 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.540e+01 8.958e+01 9.553e+01 1.189e+02, threshold=1.792e+02, percent-clipped=0.0 2024-09-19 13:35:20,936 INFO [train.py:1198] (1/2) Epoch 38, batch 1400, loss[loss=0.2093, ctc_loss=0.1018, cr_loss=0.3274, attn_decoder_loss=0.2139, over 29561.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1143, cr_loss=0.3556, attn_decoder_loss=0.2394, over 5808076.94 frames. ], batch size: 69, lr: 2.91e-03, grad_scale: 8.0 2024-09-19 13:35:24,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=675300.0, ans=0.1 2024-09-19 13:35:36,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=675340.0, ans=0.0 2024-09-19 13:35:47,150 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 13:35:50,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=675380.0, ans=0.125 2024-09-19 13:35:58,145 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.52 vs. limit=15.0 2024-09-19 13:36:19,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=675420.0, ans=0.125 2024-09-19 13:36:23,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=675460.0, ans=0.0 2024-09-19 13:36:26,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=675460.0, ans=0.125 2024-09-19 13:36:39,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=675500.0, ans=0.1 2024-09-19 13:36:40,635 INFO [train.py:1198] (1/2) Epoch 38, batch 1450, loss[loss=0.2575, ctc_loss=0.1353, cr_loss=0.4089, attn_decoder_loss=0.262, over 29460.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1145, cr_loss=0.356, attn_decoder_loss=0.24, over 5803915.23 frames. ], batch size: 94, lr: 2.91e-03, grad_scale: 8.0 2024-09-19 13:36:43,335 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.90 vs. limit=22.5 2024-09-19 13:36:46,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=675500.0, ans=0.125 2024-09-19 13:37:18,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=675580.0, ans=0.125 2024-09-19 13:37:23,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=675580.0, ans=0.05 2024-09-19 13:37:30,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=675620.0, ans=0.125 2024-09-19 13:37:30,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=675620.0, ans=10.0 2024-09-19 13:37:37,497 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.57 vs. 
limit=15.0 2024-09-19 13:37:46,829 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.618e+01 9.172e+01 9.928e+01 3.328e+02, threshold=1.834e+02, percent-clipped=1.0 2024-09-19 13:37:53,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=675660.0, ans=0.1 2024-09-19 13:37:56,074 INFO [train.py:1198] (1/2) Epoch 38, batch 1500, loss[loss=0.2429, ctc_loss=0.1172, cr_loss=0.3486, attn_decoder_loss=0.2491, over 29612.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1146, cr_loss=0.3557, attn_decoder_loss=0.2402, over 5804379.24 frames. ], batch size: 86, lr: 2.91e-03, grad_scale: 8.0 2024-09-19 13:37:58,459 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.84 vs. limit=22.5 2024-09-19 13:38:19,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=675740.0, ans=0.2 2024-09-19 13:38:19,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=675740.0, ans=0.025 2024-09-19 13:38:34,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=675780.0, ans=0.0 2024-09-19 13:38:44,414 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.41 vs. limit=15.0 2024-09-19 13:38:54,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=675820.0, ans=0.07 2024-09-19 13:38:58,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=675860.0, ans=0.1 2024-09-19 13:39:00,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=675860.0, ans=0.025 2024-09-19 13:39:09,611 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.32 vs. limit=10.0 2024-09-19 13:39:10,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=675900.0, ans=0.1 2024-09-19 13:39:11,898 INFO [train.py:1198] (1/2) Epoch 38, batch 1550, loss[loss=0.2489, ctc_loss=0.124, cr_loss=0.3915, attn_decoder_loss=0.2541, over 29545.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1148, cr_loss=0.3562, attn_decoder_loss=0.2402, over 5780521.57 frames. ], batch size: 90, lr: 2.91e-03, grad_scale: 8.0 2024-09-19 13:39:16,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=675900.0, ans=0.1 2024-09-19 13:39:17,738 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.57 vs. 
limit=15.0 2024-09-19 13:39:43,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=675980.0, ans=0.025 2024-09-19 13:39:48,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=675980.0, ans=0.125 2024-09-19 13:40:20,839 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.489e+01 9.048e+01 9.769e+01 3.941e+02, threshold=1.810e+02, percent-clipped=1.0 2024-09-19 13:40:27,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=676060.0, ans=0.1 2024-09-19 13:40:32,004 INFO [train.py:1198] (1/2) Epoch 38, batch 1600, loss[loss=0.2407, ctc_loss=0.117, cr_loss=0.3522, attn_decoder_loss=0.2466, over 29673.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1147, cr_loss=0.3559, attn_decoder_loss=0.24, over 5762751.99 frames. ], batch size: 85, lr: 2.91e-03, grad_scale: 16.0 2024-09-19 13:40:41,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=676100.0, ans=0.125 2024-09-19 13:40:49,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=676140.0, ans=0.07 2024-09-19 13:41:10,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=676180.0, ans=0.125 2024-09-19 13:41:43,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=676260.0, ans=0.125 2024-09-19 13:41:47,495 INFO [train.py:1198] (1/2) Epoch 38, batch 1650, loss[loss=0.2407, ctc_loss=0.1182, cr_loss=0.3589, attn_decoder_loss=0.2464, over 29704.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1143, cr_loss=0.3552, attn_decoder_loss=0.2397, over 5758036.76 frames. ], batch size: 89, lr: 2.91e-03, grad_scale: 16.0 2024-09-19 13:41:47,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=676300.0, ans=0.1 2024-09-19 13:41:50,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=676300.0, ans=0.0 2024-09-19 13:42:04,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=676340.0, ans=0.125 2024-09-19 13:42:07,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=676340.0, ans=0.125 2024-09-19 13:42:19,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=676380.0, ans=0.1 2024-09-19 13:42:53,543 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.620e+01 8.638e+01 9.232e+01 9.728e+01 1.403e+02, threshold=1.846e+02, percent-clipped=0.0 2024-09-19 13:43:02,489 INFO [train.py:1198] (1/2) Epoch 38, batch 1700, loss[loss=0.2051, ctc_loss=0.09457, cr_loss=0.3104, attn_decoder_loss=0.2105, over 29550.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1142, cr_loss=0.3555, attn_decoder_loss=0.2396, over 5779824.18 frames. 
], batch size: 69, lr: 2.91e-03, grad_scale: 16.0 2024-09-19 13:43:02,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=676500.0, ans=0.1 2024-09-19 13:43:02,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=676500.0, ans=0.125 2024-09-19 13:43:08,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=676500.0, ans=0.1 2024-09-19 13:43:28,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=676540.0, ans=0.125 2024-09-19 13:43:39,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=676580.0, ans=0.1 2024-09-19 13:43:40,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=676580.0, ans=0.1 2024-09-19 13:43:48,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=676620.0, ans=0.125 2024-09-19 13:44:22,575 INFO [train.py:1198] (1/2) Epoch 38, batch 1750, loss[loss=0.2081, ctc_loss=0.09593, cr_loss=0.3142, attn_decoder_loss=0.2135, over 29375.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1142, cr_loss=0.3551, attn_decoder_loss=0.2392, over 5786062.17 frames. ], batch size: 67, lr: 2.91e-03, grad_scale: 16.0 2024-09-19 13:44:27,778 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.01 vs. limit=15.0 2024-09-19 13:44:34,074 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.15 vs. limit=15.0 2024-09-19 13:44:39,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=676740.0, ans=0.125 2024-09-19 13:44:48,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=676740.0, ans=0.025 2024-09-19 13:44:56,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=676780.0, ans=0.125 2024-09-19 13:45:05,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=676780.0, ans=0.125 2024-09-19 13:45:13,142 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 13:45:19,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=676820.0, ans=0.95 2024-09-19 13:45:30,735 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 8.567e+01 9.147e+01 9.670e+01 2.287e+02, threshold=1.829e+02, percent-clipped=1.0 2024-09-19 13:45:38,186 INFO [train.py:1198] (1/2) Epoch 38, batch 1800, loss[loss=0.2436, ctc_loss=0.1164, cr_loss=0.3725, attn_decoder_loss=0.2495, over 29696.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1145, cr_loss=0.3552, attn_decoder_loss=0.2393, over 5789443.64 frames. 
], batch size: 83, lr: 2.91e-03, grad_scale: 8.0 2024-09-19 13:45:48,298 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.69 vs. limit=15.0 2024-09-19 13:45:53,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=676940.0, ans=0.1 2024-09-19 13:46:02,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=676940.0, ans=0.025 2024-09-19 13:46:46,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=677060.0, ans=0.5 2024-09-19 13:46:47,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=677060.0, ans=0.125 2024-09-19 13:46:53,820 INFO [train.py:1198] (1/2) Epoch 38, batch 1850, loss[loss=0.2385, ctc_loss=0.1067, cr_loss=0.3275, attn_decoder_loss=0.2459, over 29622.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1143, cr_loss=0.3552, attn_decoder_loss=0.2394, over 5796066.47 frames. ], batch size: 86, lr: 2.91e-03, grad_scale: 8.0 2024-09-19 13:46:58,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=677100.0, ans=0.2 2024-09-19 13:47:03,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=677100.0, ans=0.125 2024-09-19 13:47:07,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=677140.0, ans=10.0 2024-09-19 13:47:12,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=677140.0, ans=0.2 2024-09-19 13:47:47,819 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.11 vs. limit=15.0 2024-09-19 13:47:51,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=677220.0, ans=0.2 2024-09-19 13:48:01,721 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.507e+01 9.088e+01 9.545e+01 1.586e+02, threshold=1.818e+02, percent-clipped=0.0 2024-09-19 13:48:03,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=677260.0, ans=0.125 2024-09-19 13:48:03,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=677260.0, ans=0.1 2024-09-19 13:48:09,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=677260.0, ans=0.1 2024-09-19 13:48:13,529 INFO [train.py:1198] (1/2) Epoch 38, batch 1900, loss[loss=0.2406, ctc_loss=0.1132, cr_loss=0.3649, attn_decoder_loss=0.2466, over 29713.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1141, cr_loss=0.3545, attn_decoder_loss=0.2395, over 5803539.16 frames. 
], batch size: 89, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 13:48:19,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=677300.0, ans=0.125 2024-09-19 13:48:51,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=677380.0, ans=0.125 2024-09-19 13:49:11,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=677420.0, ans=0.2 2024-09-19 13:49:24,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=677460.0, ans=0.125 2024-09-19 13:49:26,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=677460.0, ans=0.125 2024-09-19 13:49:29,173 INFO [train.py:1198] (1/2) Epoch 38, batch 1950, loss[loss=0.2321, ctc_loss=0.1179, cr_loss=0.3731, attn_decoder_loss=0.2365, over 29429.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1147, cr_loss=0.3563, attn_decoder_loss=0.2408, over 5817919.64 frames. ], batch size: 78, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 13:49:34,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=677500.0, ans=0.125 2024-09-19 13:49:39,388 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.12 vs. limit=15.0 2024-09-19 13:50:06,057 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.86 vs. limit=12.0 2024-09-19 13:50:09,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=677580.0, ans=0.0 2024-09-19 13:50:16,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=677620.0, ans=0.125 2024-09-19 13:50:28,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=677660.0, ans=0.125 2024-09-19 13:50:34,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=677660.0, ans=0.1 2024-09-19 13:50:35,099 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.40 vs. limit=15.0 2024-09-19 13:50:37,364 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.817e+01 8.623e+01 9.104e+01 9.387e+01 1.434e+02, threshold=1.821e+02, percent-clipped=0.0 2024-09-19 13:50:44,925 INFO [train.py:1198] (1/2) Epoch 38, batch 2000, loss[loss=0.2129, ctc_loss=0.1006, cr_loss=0.3342, attn_decoder_loss=0.2179, over 29361.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1148, cr_loss=0.3559, attn_decoder_loss=0.2412, over 5796779.17 frames. 
], batch size: 67, lr: 2.90e-03, grad_scale: 16.0 2024-09-19 13:50:45,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=677700.0, ans=0.125 2024-09-19 13:50:54,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=677700.0, ans=0.1 2024-09-19 13:50:59,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=677740.0, ans=0.125 2024-09-19 13:50:59,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=677740.0, ans=0.125 2024-09-19 13:51:05,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=677740.0, ans=0.125 2024-09-19 13:51:25,230 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.66 vs. limit=15.0 2024-09-19 13:51:38,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=677820.0, ans=0.0 2024-09-19 13:52:04,798 INFO [train.py:1198] (1/2) Epoch 38, batch 2050, loss[loss=0.2111, ctc_loss=0.09285, cr_loss=0.3157, attn_decoder_loss=0.2172, over 29457.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1143, cr_loss=0.3555, attn_decoder_loss=0.2402, over 5789093.06 frames. ], batch size: 70, lr: 2.90e-03, grad_scale: 16.0 2024-09-19 13:52:05,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=677900.0, ans=0.1 2024-09-19 13:52:06,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=677900.0, ans=0.125 2024-09-19 13:52:06,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=677900.0, ans=0.125 2024-09-19 13:52:17,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=677900.0, ans=0.0 2024-09-19 13:52:27,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=677940.0, ans=0.5 2024-09-19 13:52:32,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=677940.0, ans=0.1 2024-09-19 13:52:40,243 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.32 vs. limit=10.0 2024-09-19 13:52:56,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=678020.0, ans=0.1 2024-09-19 13:53:11,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=678060.0, ans=0.125 2024-09-19 13:53:12,676 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.512e+01 8.454e+01 8.907e+01 9.620e+01 4.678e+02, threshold=1.781e+02, percent-clipped=1.0 2024-09-19 13:53:20,341 INFO [train.py:1198] (1/2) Epoch 38, batch 2100, loss[loss=0.2383, ctc_loss=0.1196, cr_loss=0.3685, attn_decoder_loss=0.2433, over 29768.00 frames. 
], tot_loss[loss=0.234, ctc_loss=0.1138, cr_loss=0.3543, attn_decoder_loss=0.2395, over 5801046.73 frames. ], batch size: 81, lr: 2.90e-03, grad_scale: 16.0 2024-09-19 13:53:20,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=678100.0, ans=0.125 2024-09-19 13:53:23,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=678100.0, ans=0.0 2024-09-19 13:53:41,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=678140.0, ans=0.125 2024-09-19 13:53:45,375 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.75 vs. limit=15.0 2024-09-19 13:53:45,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.01 vs. limit=22.5 2024-09-19 13:53:46,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=678140.0, ans=0.125 2024-09-19 13:54:26,128 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2024-09-19 13:54:28,653 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.90 vs. limit=15.0 2024-09-19 13:54:31,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=678260.0, ans=0.125 2024-09-19 13:54:32,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=678260.0, ans=0.125 2024-09-19 13:54:35,529 INFO [train.py:1198] (1/2) Epoch 38, batch 2150, loss[loss=0.2243, ctc_loss=0.112, cr_loss=0.3441, attn_decoder_loss=0.2292, over 29445.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1132, cr_loss=0.353, attn_decoder_loss=0.2389, over 5814887.28 frames. ], batch size: 78, lr: 2.90e-03, grad_scale: 16.0 2024-09-19 13:54:47,125 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.57 vs. 
limit=22.5 2024-09-19 13:55:00,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=678340.0, ans=0.125 2024-09-19 13:55:12,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=678380.0, ans=0.2 2024-09-19 13:55:32,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=678420.0, ans=0.125 2024-09-19 13:55:36,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=678460.0, ans=0.0 2024-09-19 13:55:38,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=678460.0, ans=0.2 2024-09-19 13:55:45,739 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.586e+01 8.959e+01 9.597e+01 2.666e+02, threshold=1.792e+02, percent-clipped=1.0 2024-09-19 13:55:53,919 INFO [train.py:1198] (1/2) Epoch 38, batch 2200, loss[loss=0.2519, ctc_loss=0.1287, cr_loss=0.3792, attn_decoder_loss=0.2571, over 29643.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1136, cr_loss=0.3532, attn_decoder_loss=0.239, over 5810750.23 frames. ], batch size: 86, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 13:55:58,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=678500.0, ans=0.2 2024-09-19 13:56:28,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=678580.0, ans=0.125 2024-09-19 13:56:31,120 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 13:56:44,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=678620.0, ans=0.2 2024-09-19 13:56:53,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=678620.0, ans=0.0 2024-09-19 13:57:00,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=678660.0, ans=0.1 2024-09-19 13:57:10,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=678700.0, ans=0.025 2024-09-19 13:57:11,727 INFO [train.py:1198] (1/2) Epoch 38, batch 2250, loss[loss=0.236, ctc_loss=0.1125, cr_loss=0.349, attn_decoder_loss=0.242, over 29730.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1135, cr_loss=0.3531, attn_decoder_loss=0.2391, over 5810265.58 frames. ], batch size: 82, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 13:57:13,912 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.90 vs. limit=22.5 2024-09-19 13:57:15,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=678700.0, ans=0.1 2024-09-19 13:57:19,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=678700.0, ans=0.0 2024-09-19 13:57:27,705 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.31 vs. 
limit=12.0 2024-09-19 13:57:41,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=678780.0, ans=0.125 2024-09-19 13:58:20,712 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.386e+01 8.565e+01 9.062e+01 9.644e+01 4.463e+02, threshold=1.812e+02, percent-clipped=2.0 2024-09-19 13:58:22,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=678860.0, ans=0.125 2024-09-19 13:58:26,882 INFO [train.py:1198] (1/2) Epoch 38, batch 2300, loss[loss=0.1983, ctc_loss=0.09069, cr_loss=0.2863, attn_decoder_loss=0.2039, over 29304.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.113, cr_loss=0.3521, attn_decoder_loss=0.2381, over 5796051.18 frames. ], batch size: 71, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 13:58:43,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=678940.0, ans=0.2 2024-09-19 13:58:58,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=678980.0, ans=0.125 2024-09-19 13:59:10,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=679020.0, ans=0.2 2024-09-19 13:59:24,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=679020.0, ans=0.1 2024-09-19 13:59:28,465 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.46 vs. limit=15.0 2024-09-19 13:59:33,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=679060.0, ans=0.125 2024-09-19 13:59:36,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=679060.0, ans=0.125 2024-09-19 13:59:42,416 INFO [train.py:1198] (1/2) Epoch 38, batch 2350, loss[loss=0.2471, ctc_loss=0.1221, cr_loss=0.3758, attn_decoder_loss=0.2526, over 29689.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1132, cr_loss=0.3533, attn_decoder_loss=0.2385, over 5802452.08 frames. ], batch size: 83, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 13:59:44,952 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 13:59:46,564 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.98 vs. limit=12.0 2024-09-19 13:59:47,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=679100.0, ans=0.125 2024-09-19 14:00:47,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=679260.0, ans=0.125 2024-09-19 14:00:56,207 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.402e+01 8.471e+01 8.979e+01 9.530e+01 2.043e+02, threshold=1.796e+02, percent-clipped=1.0 2024-09-19 14:01:02,398 INFO [train.py:1198] (1/2) Epoch 38, batch 2400, loss[loss=0.2186, ctc_loss=0.1057, cr_loss=0.3537, attn_decoder_loss=0.2233, over 29535.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1139, cr_loss=0.355, attn_decoder_loss=0.2391, over 5806503.72 frames. 
], batch size: 76, lr: 2.90e-03, grad_scale: 16.0
2024-09-19 14:01:02,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=679300.0, ans=0.2
2024-09-19 14:01:25,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=679340.0, ans=0.1
2024-09-19 14:01:31,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=679380.0, ans=0.125
2024-09-19 14:01:36,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=679380.0, ans=0.2
2024-09-19 14:01:54,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=679420.0, ans=0.0
2024-09-19 14:02:15,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=679460.0, ans=0.2
2024-09-19 14:02:18,225 INFO [train.py:1198] (1/2) Epoch 38, batch 2450, loss[loss=0.2367, ctc_loss=0.1095, cr_loss=0.3456, attn_decoder_loss=0.2432, over 29738.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1143, cr_loss=0.3556, attn_decoder_loss=0.2397, over 5786531.00 frames. ], batch size: 82, lr: 2.90e-03, grad_scale: 8.0
2024-09-19 14:02:31,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=679540.0, ans=0.125
2024-09-19 14:02:46,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=679580.0, ans=0.125
2024-09-19 14:02:56,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=679580.0, ans=0.125
2024-09-19 14:03:06,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=679620.0, ans=0.125
2024-09-19 14:03:23,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=679660.0, ans=0.0
2024-09-19 14:03:28,796 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.520e+01 8.981e+01 9.531e+01 3.262e+02, threshold=1.796e+02, percent-clipped=1.0
2024-09-19 14:03:34,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=679700.0, ans=0.1
2024-09-19 14:03:35,433 INFO [train.py:1198] (1/2) Epoch 38, batch 2500, loss[loss=0.2345, ctc_loss=0.1108, cr_loss=0.3519, attn_decoder_loss=0.2405, over 29605.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1142, cr_loss=0.3553, attn_decoder_loss=0.2396, over 5795061.52 frames. ], batch size: 86, lr: 2.90e-03, grad_scale: 8.0
2024-09-19 14:04:22,431 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.89 vs. limit=15.0
2024-09-19 14:04:47,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=679860.0, ans=0.025
2024-09-19 14:04:53,320 INFO [train.py:1198] (1/2) Epoch 38, batch 2550, loss[loss=0.2128, ctc_loss=0.1091, cr_loss=0.3266, attn_decoder_loss=0.217, over 29358.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1145, cr_loss=0.3558, attn_decoder_loss=0.2399, over 5798359.77 frames. ], batch size: 67, lr: 2.90e-03, grad_scale: 8.0
2024-09-19 14:04:53,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=679900.0, ans=0.0
2024-09-19 14:05:11,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=679940.0, ans=0.1
2024-09-19 14:05:22,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=679980.0, ans=0.0
2024-09-19 14:05:51,867 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.75 vs. limit=15.0
2024-09-19 14:06:03,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=680060.0, ans=0.2
2024-09-19 14:06:04,811 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.447e+01 9.059e+01 9.443e+01 1.451e+02, threshold=1.812e+02, percent-clipped=0.0
2024-09-19 14:06:06,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=680060.0, ans=0.125
2024-09-19 14:06:09,435 INFO [train.py:1198] (1/2) Epoch 38, batch 2600, loss[loss=0.2262, ctc_loss=0.1087, cr_loss=0.3384, attn_decoder_loss=0.2318, over 29447.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1143, cr_loss=0.3554, attn_decoder_loss=0.2403, over 5794823.66 frames. ], batch size: 78, lr: 2.90e-03, grad_scale: 8.0
2024-09-19 14:06:11,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=680100.0, ans=0.025
2024-09-19 14:06:18,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=680100.0, ans=0.125
2024-09-19 14:06:18,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=680100.0, ans=0.1
2024-09-19 14:06:20,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=680100.0, ans=0.0
2024-09-19 14:06:33,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=680140.0, ans=0.0
2024-09-19 14:06:36,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=680140.0, ans=0.125
2024-09-19 14:06:41,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=680180.0, ans=0.125
2024-09-19 14:06:41,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=680180.0, ans=0.1
2024-09-19 14:06:50,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=680180.0, ans=0.0
2024-09-19 14:06:51,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=680180.0, ans=0.2
2024-09-19 14:07:06,138 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0
2024-09-19 14:07:10,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=680260.0, ans=0.125
2024-09-19 14:07:16,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=680260.0, ans=0.0
2024-09-19 14:07:24,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=680260.0, ans=0.05
2024-09-19 14:07:26,861 INFO [train.py:1198] (1/2) Epoch 38, batch 2650, loss[loss=0.2503, ctc_loss=0.1247, cr_loss=0.3758, attn_decoder_loss=0.2559, over 29296.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1142, cr_loss=0.3553, attn_decoder_loss=0.2406, over 5802224.17 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 8.0
2024-09-19 14:07:27,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=680300.0, ans=0.1
2024-09-19 14:07:32,097 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.51 vs. limit=15.0
2024-09-19 14:07:42,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=680340.0, ans=0.125
2024-09-19 14:08:00,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=680380.0, ans=0.0
2024-09-19 14:08:12,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=680420.0, ans=0.125
2024-09-19 14:08:39,401 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.526e+01 9.066e+01 9.711e+01 1.379e+02, threshold=1.813e+02, percent-clipped=0.0
2024-09-19 14:08:44,058 INFO [train.py:1198] (1/2) Epoch 38, batch 2700, loss[loss=0.2388, ctc_loss=0.1069, cr_loss=0.3441, attn_decoder_loss=0.2458, over 29527.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1144, cr_loss=0.3554, attn_decoder_loss=0.2406, over 5797039.47 frames. ], batch size: 87, lr: 2.90e-03, grad_scale: 8.0
2024-09-19 14:08:47,447 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 14:09:04,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=680540.0, ans=0.0
2024-09-19 14:09:24,218 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.61 vs. limit=15.0
2024-09-19 14:09:27,393 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.90 vs. limit=15.0
2024-09-19 14:09:27,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.94 vs. limit=15.0
2024-09-19 14:09:59,761 INFO [train.py:1198] (1/2) Epoch 38, batch 2750, loss[loss=0.2169, ctc_loss=0.1046, cr_loss=0.3276, attn_decoder_loss=0.2221, over 29492.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.114, cr_loss=0.3544, attn_decoder_loss=0.2395, over 5794564.14 frames. ], batch size: 75, lr: 2.90e-03, grad_scale: 8.0
2024-09-19 14:10:01,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=680700.0, ans=0.2
2024-09-19 14:10:16,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=680740.0, ans=0.125
2024-09-19 14:10:27,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=680740.0, ans=0.125
2024-09-19 14:10:28,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=680780.0, ans=0.125
2024-09-19 14:10:34,006 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=15.0
2024-09-19 14:10:34,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=680780.0, ans=0.025
2024-09-19 14:10:41,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=680780.0, ans=0.125
2024-09-19 14:11:00,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=680860.0, ans=0.2
2024-09-19 14:11:03,244 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.52 vs. limit=10.0
2024-09-19 14:11:07,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.82 vs. limit=15.0
2024-09-19 14:11:10,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=680860.0, ans=0.125
2024-09-19 14:11:13,462 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.135e+01 8.653e+01 9.096e+01 9.746e+01 4.436e+02, threshold=1.819e+02, percent-clipped=1.0
2024-09-19 14:11:18,085 INFO [train.py:1198] (1/2) Epoch 38, batch 2800, loss[loss=0.2543, ctc_loss=0.1372, cr_loss=0.3702, attn_decoder_loss=0.2591, over 20258.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1144, cr_loss=0.3547, attn_decoder_loss=0.24, over 5775730.78 frames. ], batch size: 210, lr: 2.90e-03, grad_scale: 16.0
2024-09-19 14:11:36,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=680940.0, ans=0.025
2024-09-19 14:11:41,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=680940.0, ans=0.0
2024-09-19 14:11:43,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=680940.0, ans=0.1
2024-09-19 14:11:49,102 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 14:12:05,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=681020.0, ans=0.04949747468305833
2024-09-19 14:12:14,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=681020.0, ans=0.05
2024-09-19 14:12:25,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer_na.min_abs, batch_count=681060.0, ans=0.02
2024-09-19 14:12:26,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=681060.0, ans=0.125
2024-09-19 14:12:32,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=681060.0, ans=0.125
2024-09-19 14:12:35,543 INFO [train.py:1198] (1/2) Epoch 38, batch 2850, loss[loss=0.2401, ctc_loss=0.1203, cr_loss=0.3845, attn_decoder_loss=0.2449, over 29521.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1148, cr_loss=0.3553, attn_decoder_loss=0.2406, over 5760870.52 frames. ], batch size: 77, lr: 2.90e-03, grad_scale: 8.0
2024-09-19 14:13:04,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=681180.0, ans=0.1
2024-09-19 14:13:19,005 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.45 vs. limit=22.5
2024-09-19 14:13:24,265 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 14:13:25,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=681220.0, ans=10.0
2024-09-19 14:13:35,511 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.59 vs. limit=15.0
2024-09-19 14:13:47,917 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.774e+01 8.614e+01 9.082e+01 1.001e+02 4.152e+02, threshold=1.816e+02, percent-clipped=1.0
2024-09-19 14:13:50,897 INFO [train.py:1198] (1/2) Epoch 38, batch 2900, loss[loss=0.2223, ctc_loss=0.1058, cr_loss=0.335, attn_decoder_loss=0.2278, over 29409.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1154, cr_loss=0.3568, attn_decoder_loss=0.2414, over 5786085.04 frames. ], batch size: 79, lr: 2.90e-03, grad_scale: 8.0
2024-09-19 14:14:10,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=681340.0, ans=0.125
2024-09-19 14:14:28,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=681380.0, ans=0.125
2024-09-19 14:14:34,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=681420.0, ans=0.2
2024-09-19 14:15:08,292 INFO [train.py:1198] (1/2) Epoch 38, batch 2950, loss[loss=0.226, ctc_loss=0.1064, cr_loss=0.3396, attn_decoder_loss=0.2317, over 29529.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1145, cr_loss=0.3549, attn_decoder_loss=0.2402, over 5781989.18 frames. ], batch size: 75, lr: 2.90e-03, grad_scale: 8.0
2024-09-19 14:15:17,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=681500.0, ans=0.125
2024-09-19 14:15:23,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=681540.0, ans=0.1
2024-09-19 14:15:34,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=681540.0, ans=0.125
2024-09-19 14:15:34,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=681540.0, ans=0.1
2024-09-19 14:15:35,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=681540.0, ans=0.125
2024-09-19 14:15:54,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=681620.0, ans=0.5
2024-09-19 14:16:05,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=681620.0, ans=0.2
2024-09-19 14:16:23,460 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.330e+01 8.968e+01 9.568e+01 1.287e+02, threshold=1.794e+02, percent-clipped=0.0
2024-09-19 14:16:26,709 INFO [train.py:1198] (1/2) Epoch 38, batch 3000, loss[loss=0.2349, ctc_loss=0.1108, cr_loss=0.3491, attn_decoder_loss=0.241, over 29769.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.114, cr_loss=0.3539, attn_decoder_loss=0.2401, over 5782569.21 frames. ], batch size: 81, lr: 2.90e-03, grad_scale: 8.0
2024-09-19 14:16:26,710 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-19 14:16:30,946 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.8763, 2.1476, 2.3465, 2.3023, 2.3014, 2.3400, 2.5334, 2.4661], device='cuda:1')
2024-09-19 14:16:45,081 INFO [train.py:1230] (1/2) Epoch 38, validation: loss=0.2118, ctc_loss=0.03653, cr_loss=5.871e-15, attn_decoder_loss=0.2312, over 944034.00 frames.
2024-09-19 14:16:45,081 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB
2024-09-19 14:16:59,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=681740.0, ans=0.025
2024-09-19 14:17:03,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=681740.0, ans=0.0
2024-09-19 14:17:03,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=681740.0, ans=0.0
2024-09-19 14:17:14,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=681780.0, ans=0.0
2024-09-19 14:17:15,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=681780.0, ans=0.125
2024-09-19 14:17:30,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=681820.0, ans=0.1
2024-09-19 14:17:44,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=681860.0, ans=0.025
2024-09-19 14:17:49,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=681860.0, ans=0.125
2024-09-19 14:17:52,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=681860.0, ans=0.95
2024-09-19 14:17:53,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=681860.0, ans=0.0
2024-09-19 14:18:00,865 INFO [train.py:1198] (1/2) Epoch 38, batch 3050, loss[loss=0.2192, ctc_loss=0.1031, cr_loss=0.3265, attn_decoder_loss=0.2248, over 29505.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1145, cr_loss=0.3551, attn_decoder_loss=0.2408, over 5776899.83 frames. ], batch size: 76, lr: 2.90e-03, grad_scale: 8.0
2024-09-19 14:18:01,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=681900.0, ans=0.0
2024-09-19 14:18:04,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=681900.0, ans=0.1
2024-09-19 14:18:07,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=681900.0, ans=0.1
2024-09-19 14:18:09,735 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.75 vs. limit=15.0
2024-09-19 14:18:39,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=681980.0, ans=0.1
2024-09-19 14:18:47,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.max_abs, batch_count=682020.0, ans=10.0
2024-09-19 14:18:52,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=682020.0, ans=0.1
2024-09-19 14:18:53,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=682020.0, ans=0.125
2024-09-19 14:19:09,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=682060.0, ans=0.0
2024-09-19 14:19:15,327 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 8.484e+01 8.987e+01 9.703e+01 1.967e+02, threshold=1.797e+02, percent-clipped=1.0
2024-09-19 14:19:18,298 INFO [train.py:1198] (1/2) Epoch 38, batch 3100, loss[loss=0.2481, ctc_loss=0.1191, cr_loss=0.3707, attn_decoder_loss=0.2542, over 29266.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1148, cr_loss=0.3563, attn_decoder_loss=0.2407, over 5776327.88 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 8.0
2024-09-19 14:19:23,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=5.39 vs. limit=15.0
2024-09-19 14:19:59,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=682180.0, ans=0.1
2024-09-19 14:20:13,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=682220.0, ans=0.0
2024-09-19 14:20:21,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=682260.0, ans=0.025
2024-09-19 14:20:27,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.50 vs. limit=22.5
2024-09-19 14:20:35,963 INFO [train.py:1198] (1/2) Epoch 38, batch 3150, loss[loss=0.2473, ctc_loss=0.1238, cr_loss=0.3655, attn_decoder_loss=0.2529, over 28778.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1147, cr_loss=0.3559, attn_decoder_loss=0.2406, over 5782840.58 frames. ], batch size: 104, lr: 2.89e-03, grad_scale: 8.0
2024-09-19 14:20:37,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=682300.0, ans=0.2
2024-09-19 14:20:50,295 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.05 vs. limit=6.0
2024-09-19 14:20:51,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=682340.0, ans=0.125
2024-09-19 14:20:53,469 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.09 vs. limit=15.0
2024-09-19 14:21:01,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=682340.0, ans=0.2
2024-09-19 14:21:11,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=682380.0, ans=0.5
2024-09-19 14:21:32,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=682420.0, ans=0.05
2024-09-19 14:21:48,495 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 8.507e+01 9.169e+01 9.644e+01 2.178e+02, threshold=1.834e+02, percent-clipped=1.0
2024-09-19 14:21:51,678 INFO [train.py:1198] (1/2) Epoch 38, batch 3200, loss[loss=0.2349, ctc_loss=0.1127, cr_loss=0.3282, attn_decoder_loss=0.2412, over 29412.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1144, cr_loss=0.3556, attn_decoder_loss=0.2401, over 5793040.62 frames. ], batch size: 79, lr: 2.89e-03, grad_scale: 16.0
2024-09-19 14:21:55,657 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.20 vs. limit=22.5
2024-09-19 14:22:35,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=682580.0, ans=0.0
2024-09-19 14:23:02,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=682660.0, ans=0.1
2024-09-19 14:23:09,408 INFO [train.py:1198] (1/2) Epoch 38, batch 3250, loss[loss=0.2337, ctc_loss=0.1097, cr_loss=0.3529, attn_decoder_loss=0.2397, over 29701.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1144, cr_loss=0.3555, attn_decoder_loss=0.2404, over 5799681.90 frames. ], batch size: 84, lr: 2.89e-03, grad_scale: 16.0
2024-09-19 14:23:12,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=682700.0, ans=0.04949747468305833
2024-09-19 14:23:12,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=682700.0, ans=0.025
2024-09-19 14:23:17,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=682700.0, ans=0.0
2024-09-19 14:23:25,903 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.86 vs. limit=15.0
2024-09-19 14:23:37,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=682740.0, ans=0.125
2024-09-19 14:23:41,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=682780.0, ans=0.0
2024-09-19 14:23:56,078 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.43 vs. limit=15.0
2024-09-19 14:23:58,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=682820.0, ans=0.0
2024-09-19 14:24:01,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=682820.0, ans=0.125
2024-09-19 14:24:19,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=682860.0, ans=0.0
2024-09-19 14:24:23,686 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 8.537e+01 9.091e+01 9.701e+01 1.814e+02, threshold=1.818e+02, percent-clipped=0.0
2024-09-19 14:24:26,790 INFO [train.py:1198] (1/2) Epoch 38, batch 3300, loss[loss=0.246, ctc_loss=0.1242, cr_loss=0.3863, attn_decoder_loss=0.2509, over 28194.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1139, cr_loss=0.3539, attn_decoder_loss=0.2393, over 5796553.55 frames. ], batch size: 111, lr: 2.89e-03, grad_scale: 16.0
2024-09-19 14:24:31,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=682900.0, ans=0.1
2024-09-19 14:24:37,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=682900.0, ans=0.125
2024-09-19 14:25:01,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=682980.0, ans=0.2
2024-09-19 14:25:03,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=682980.0, ans=0.125
2024-09-19 14:25:09,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=682980.0, ans=0.125
2024-09-19 14:25:12,969 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.55 vs. limit=15.0
2024-09-19 14:25:15,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=683020.0, ans=0.2
2024-09-19 14:25:27,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=683060.0, ans=10.0
2024-09-19 14:25:42,133 INFO [train.py:1198] (1/2) Epoch 38, batch 3350, loss[loss=0.2503, ctc_loss=0.1314, cr_loss=0.3797, attn_decoder_loss=0.2551, over 28829.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1144, cr_loss=0.3545, attn_decoder_loss=0.24, over 5773472.23 frames. ], batch size: 104, lr: 2.89e-03, grad_scale: 16.0
2024-09-19 14:26:10,192 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.97 vs. limit=10.0
2024-09-19 14:26:25,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=683180.0, ans=0.125
2024-09-19 14:26:42,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=683220.0, ans=0.125
2024-09-19 14:26:57,152 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.721e+01 9.366e+01 9.877e+01 4.380e+02, threshold=1.873e+02, percent-clipped=1.0
2024-09-19 14:27:00,142 INFO [train.py:1198] (1/2) Epoch 38, batch 3400, loss[loss=0.2089, ctc_loss=0.09958, cr_loss=0.3273, attn_decoder_loss=0.2138, over 29379.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1146, cr_loss=0.3546, attn_decoder_loss=0.24, over 5764992.65 frames. ], batch size: 67, lr: 2.89e-03, grad_scale: 16.0
2024-09-19 14:27:01,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=683300.0, ans=0.04949747468305833
2024-09-19 14:27:41,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=683380.0, ans=0.125
2024-09-19 14:27:47,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=683420.0, ans=0.125
2024-09-19 14:27:49,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=683420.0, ans=0.0
2024-09-19 14:27:50,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=683420.0, ans=0.0
2024-09-19 14:28:04,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=683460.0, ans=0.125
2024-09-19 14:28:17,514 INFO [train.py:1198] (1/2) Epoch 38, batch 3450, loss[loss=0.2439, ctc_loss=0.1167, cr_loss=0.3617, attn_decoder_loss=0.25, over 28118.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1147, cr_loss=0.3554, attn_decoder_loss=0.2403, over 5773696.14 frames. ], batch size: 111, lr: 2.89e-03, grad_scale: 8.0
2024-09-19 14:28:22,807 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.68 vs. limit=15.0
2024-09-19 14:28:23,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=683500.0, ans=0.125
2024-09-19 14:28:42,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=683540.0, ans=0.125
2024-09-19 14:28:45,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=683540.0, ans=0.0
2024-09-19 14:28:45,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=683540.0, ans=0.125
2024-09-19 14:28:49,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=683580.0, ans=0.125
2024-09-19 14:28:52,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=683580.0, ans=0.2
2024-09-19 14:28:57,666 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.38 vs. limit=15.0
2024-09-19 14:29:00,509 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.10 vs. limit=22.5
2024-09-19 14:29:26,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=683660.0, ans=0.0
2024-09-19 14:29:31,893 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 8.608e+01 9.088e+01 9.945e+01 4.659e+02, threshold=1.818e+02, percent-clipped=1.0
2024-09-19 14:29:33,407 INFO [train.py:1198] (1/2) Epoch 38, batch 3500, loss[loss=0.2112, ctc_loss=0.09454, cr_loss=0.3248, attn_decoder_loss=0.217, over 29314.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1142, cr_loss=0.3544, attn_decoder_loss=0.2396, over 5776626.43 frames. ], batch size: 71, lr: 2.89e-03, grad_scale: 8.0
2024-09-19 14:29:41,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=683700.0, ans=0.0
2024-09-19 14:29:44,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=683700.0, ans=0.1
2024-09-19 14:30:14,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=683780.0, ans=0.2
2024-09-19 14:30:40,436 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 14:30:49,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=683900.0, ans=0.125
2024-09-19 14:30:50,286 INFO [train.py:1198] (1/2) Epoch 38, batch 3550, loss[loss=0.2389, ctc_loss=0.1119, cr_loss=0.3173, attn_decoder_loss=0.2459, over 29700.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1142, cr_loss=0.3544, attn_decoder_loss=0.2396, over 5782106.74 frames. ], batch size: 89, lr: 2.89e-03, grad_scale: 8.0
2024-09-19 14:30:55,715 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.13 vs. limit=22.5
2024-09-19 14:31:12,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=683940.0, ans=0.125
2024-09-19 14:31:15,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=683940.0, ans=0.125
2024-09-19 14:31:23,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=683980.0, ans=0.0
2024-09-19 14:31:34,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=684020.0, ans=0.2
2024-09-19 14:31:48,171 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.01 vs. limit=15.0
2024-09-19 14:31:50,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=684060.0, ans=0.1
2024-09-19 14:32:05,114 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.324e+01 8.514e+01 8.990e+01 9.421e+01 1.244e+02, threshold=1.798e+02, percent-clipped=0.0
2024-09-19 14:32:06,675 INFO [train.py:1198] (1/2) Epoch 38, batch 3600, loss[loss=0.2316, ctc_loss=0.1136, cr_loss=0.3635, attn_decoder_loss=0.2366, over 29489.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1147, cr_loss=0.356, attn_decoder_loss=0.24, over 5791653.82 frames. ], batch size: 77, lr: 2.89e-03, grad_scale: 16.0
2024-09-19 14:32:13,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=684100.0, ans=0.125
2024-09-19 14:32:26,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=684140.0, ans=0.125
2024-09-19 14:32:36,084 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.10 vs. limit=6.0
2024-09-19 14:33:05,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=684260.0, ans=0.0
2024-09-19 14:33:06,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=684260.0, ans=0.1
2024-09-19 14:33:21,071 INFO [train.py:1198] (1/2) Epoch 38, batch 3650, loss[loss=0.2519, ctc_loss=0.1273, cr_loss=0.3952, attn_decoder_loss=0.257, over 29484.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.114, cr_loss=0.3548, attn_decoder_loss=0.2392, over 5793262.44 frames. ], batch size: 90, lr: 2.89e-03, grad_scale: 16.0
2024-09-19 14:33:28,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=684300.0, ans=0.125
2024-09-19 14:33:33,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=684300.0, ans=0.025
2024-09-19 14:33:41,429 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.84 vs. limit=15.0
2024-09-19 14:34:16,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=684420.0, ans=0.0
2024-09-19 14:34:28,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=684460.0, ans=0.125
2024-09-19 14:34:29,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=684460.0, ans=0.0
2024-09-19 14:34:34,296 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 8.479e+01 8.884e+01 9.372e+01 5.863e+02, threshold=1.777e+02, percent-clipped=1.0
2024-09-19 14:34:35,782 INFO [train.py:1198] (1/2) Epoch 38, batch 3700, loss[loss=0.2386, ctc_loss=0.1119, cr_loss=0.3407, attn_decoder_loss=0.2451, over 29704.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.114, cr_loss=0.3547, attn_decoder_loss=0.2393, over 5803957.88 frames. ], batch size: 84, lr: 2.89e-03, grad_scale: 16.0
2024-09-19 14:35:04,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=684580.0, ans=0.2
2024-09-19 14:35:06,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=684580.0, ans=0.0
2024-09-19 14:35:21,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=684620.0, ans=0.1
2024-09-19 14:35:29,803 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.89 vs. limit=12.0
2024-09-19 14:35:35,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=684660.0, ans=0.2
2024-09-19 14:35:47,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=684660.0, ans=0.0
2024-09-19 14:35:49,916 INFO [train.py:1198] (1/2) Epoch 38, batch 3750, loss[loss=0.2081, ctc_loss=0.09909, cr_loss=0.3251, attn_decoder_loss=0.213, over 29362.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.114, cr_loss=0.355, attn_decoder_loss=0.2394, over 5807769.95 frames. ], batch size: 67, lr: 2.89e-03, grad_scale: 16.0
2024-09-19 14:36:32,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=684780.0, ans=0.125
2024-09-19 14:36:48,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=684820.0, ans=0.0
2024-09-19 14:37:04,809 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.242e+01 8.375e+01 8.926e+01 9.574e+01 1.662e+02, threshold=1.785e+02, percent-clipped=0.0
2024-09-19 14:37:06,352 INFO [train.py:1198] (1/2) Epoch 38, batch 3800, loss[loss=0.2592, ctc_loss=0.1361, cr_loss=0.4064, attn_decoder_loss=0.2638, over 29633.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1139, cr_loss=0.3545, attn_decoder_loss=0.2392, over 5797225.53 frames. ], batch size: 86, lr: 2.89e-03, grad_scale: 16.0
2024-09-19 14:37:11,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=684900.0, ans=0.1
2024-09-19 14:37:14,553 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.25 vs. limit=15.0
2024-09-19 14:37:18,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=684900.0, ans=0.07
2024-09-19 14:37:23,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=684940.0, ans=0.125
2024-09-19 14:37:33,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=684940.0, ans=0.125
2024-09-19 14:37:38,342 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.88 vs. limit=15.0
2024-09-19 14:38:10,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=685060.0, ans=0.125
2024-09-19 14:38:20,241 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.55 vs. limit=15.0
2024-09-19 14:38:22,275 INFO [train.py:1198] (1/2) Epoch 38, batch 3850, loss[loss=0.2423, ctc_loss=0.117, cr_loss=0.3581, attn_decoder_loss=0.2483, over 29297.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1138, cr_loss=0.3545, attn_decoder_loss=0.239, over 5811231.48 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 16.0
2024-09-19 14:38:31,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=685100.0, ans=0.09899494936611666
2024-09-19 14:38:33,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=685100.0, ans=0.0
2024-09-19 14:38:51,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=685180.0, ans=0.125
2024-09-19 14:38:52,722 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.27 vs. limit=22.5
2024-09-19 14:38:54,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=685180.0, ans=0.2
2024-09-19 14:38:57,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=685180.0, ans=0.0
2024-09-19 14:39:05,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=685220.0, ans=0.0
2024-09-19 14:39:15,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=685220.0, ans=0.0
2024-09-19 14:39:16,165 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.59 vs. limit=22.5
2024-09-19 14:39:23,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=685260.0, ans=0.07
2024-09-19 14:39:34,607 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.061e+01 8.495e+01 8.957e+01 9.535e+01 1.173e+02, threshold=1.791e+02, percent-clipped=0.0
2024-09-19 14:39:36,179 INFO [train.py:1198] (1/2) Epoch 38, batch 3900, loss[loss=0.2451, ctc_loss=0.1124, cr_loss=0.3448, attn_decoder_loss=0.2521, over 29617.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1141, cr_loss=0.3549, attn_decoder_loss=0.2394, over 5815515.96 frames. ], batch size: 86, lr: 2.89e-03, grad_scale: 16.0
2024-09-19 14:39:37,009 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=5.40 vs. limit=15.0
2024-09-19 14:39:42,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=685300.0, ans=0.1
2024-09-19 14:39:51,854 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.01 vs. limit=10.0
2024-09-19 14:39:54,572 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0
2024-09-19 14:40:08,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=685380.0, ans=0.125
2024-09-19 14:40:23,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=685420.0, ans=0.125
2024-09-19 14:40:35,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=685460.0, ans=0.1
2024-09-19 14:40:49,773 INFO [train.py:1198] (1/2) Epoch 38, batch 3950, loss[loss=0.2431, ctc_loss=0.1147, cr_loss=0.3558, attn_decoder_loss=0.2495, over 29534.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1141, cr_loss=0.3555, attn_decoder_loss=0.2395, over 5835479.91 frames. ], batch size: 97, lr: 2.89e-03, grad_scale: 16.0
2024-09-19 14:41:07,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=685540.0, ans=0.1
2024-09-19 14:41:34,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=685620.0, ans=0.125
2024-09-19 14:41:52,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=685660.0, ans=0.2
2024-09-19 14:41:53,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=685660.0, ans=0.1
2024-09-19 14:42:01,690 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.55 vs. limit=15.0
2024-09-19 14:42:03,854 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.719e+01 9.341e+01 1.012e+02 2.118e+02, threshold=1.868e+02, percent-clipped=1.0
2024-09-19 14:42:05,291 INFO [train.py:1198] (1/2) Epoch 38, batch 4000, loss[loss=0.2245, ctc_loss=0.104, cr_loss=0.3283, attn_decoder_loss=0.2306, over 29521.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1142, cr_loss=0.3555, attn_decoder_loss=0.2397, over 5812510.78 frames. ], batch size: 74, lr: 2.89e-03, grad_scale: 32.0
2024-09-19 14:42:15,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=685700.0, ans=0.125
2024-09-19 14:42:41,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=685780.0, ans=0.125
2024-09-19 14:43:07,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=685860.0, ans=0.125
2024-09-19 14:43:20,734 INFO [train.py:1198] (1/2) Epoch 38, batch 4050, loss[loss=0.2537, ctc_loss=0.1484, cr_loss=0.4087, attn_decoder_loss=0.2563, over 20447.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1141, cr_loss=0.355, attn_decoder_loss=0.2394, over 5796746.42 frames. ], batch size: 209, lr: 2.89e-03, grad_scale: 8.0
2024-09-19 14:43:31,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=685900.0, ans=0.125
2024-09-19 14:43:35,463 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 14:43:59,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.24 vs. limit=15.0
2024-09-19 14:44:20,143 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.50 vs. limit=15.0
2024-09-19 14:44:25,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=686060.0, ans=0.125
2024-09-19 14:44:25,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=686060.0, ans=0.125
2024-09-19 14:44:33,762 INFO [train.py:1198] (1/2) Epoch 38, batch 4100, loss[loss=0.25, ctc_loss=0.1297, cr_loss=0.3796, attn_decoder_loss=0.255, over 29488.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1142, cr_loss=0.3545, attn_decoder_loss=0.2393, over 5792947.30 frames. ], batch size: 90, lr: 2.89e-03, grad_scale: 8.0
2024-09-19 14:44:35,195 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 8.495e+01 9.024e+01 9.584e+01 1.415e+02, threshold=1.805e+02, percent-clipped=0.0
2024-09-19 14:45:25,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=686220.0, ans=0.0
2024-09-19 14:45:36,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=686260.0, ans=0.125
2024-09-19 14:45:47,168 INFO [train.py:1198] (1/2) Epoch 38, batch 4150, loss[loss=0.2304, ctc_loss=0.1139, cr_loss=0.3608, attn_decoder_loss=0.2353, over 29487.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1144, cr_loss=0.3551, attn_decoder_loss=0.2395, over 5797985.57 frames. ], batch size: 77, lr: 2.89e-03, grad_scale: 8.0
2024-09-19 14:45:52,688 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.38 vs. limit=10.0
2024-09-19 14:46:07,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=686340.0, ans=0.125
2024-09-19 14:46:31,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=686420.0, ans=0.07
2024-09-19 14:46:50,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=686460.0, ans=0.0
2024-09-19 14:47:01,641 INFO [train.py:1198] (1/2) Epoch 38, batch 4200, loss[loss=0.2497, ctc_loss=0.127, cr_loss=0.38, attn_decoder_loss=0.2549, over 29493.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1142, cr_loss=0.3546, attn_decoder_loss=0.2397, over 5799544.37 frames. ], batch size: 90, lr: 2.89e-03, grad_scale: 8.0
2024-09-19 14:47:03,140 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.063e+01 8.618e+01 9.071e+01 9.625e+01 1.972e+02, threshold=1.814e+02, percent-clipped=1.0
2024-09-19 14:47:04,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=686500.0, ans=0.5
2024-09-19 14:47:13,180 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.04 vs. limit=15.0
2024-09-19 14:47:21,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=686540.0, ans=0.125
2024-09-19 14:47:21,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=686540.0, ans=0.125
2024-09-19 14:47:50,245 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 14:47:52,252 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.97 vs. limit=10.0
2024-09-19 14:48:07,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=686660.0, ans=0.5
2024-09-19 14:48:16,372 INFO [train.py:1198] (1/2) Epoch 38, batch 4250, loss[loss=0.2096, ctc_loss=0.0897, cr_loss=0.3027, attn_decoder_loss=0.2162, over 29520.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1135, cr_loss=0.3531, attn_decoder_loss=0.2395, over 5806421.86 frames. ], batch size: 74, lr: 2.89e-03, grad_scale: 8.0
2024-09-19 14:48:29,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=686740.0, ans=0.05
2024-09-19 14:48:31,795 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.07 vs. limit=22.5
2024-09-19 14:48:38,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=686740.0, ans=0.2
2024-09-19 14:49:12,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=686820.0, ans=0.0
2024-09-19 14:49:12,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=686820.0, ans=0.125
2024-09-19 14:49:30,555 INFO [train.py:1198] (1/2) Epoch 38, batch 4300, loss[loss=0.2406, ctc_loss=0.1159, cr_loss=0.3681, attn_decoder_loss=0.2463, over 29521.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1135, cr_loss=0.3532, attn_decoder_loss=0.2398, over 5796458.71 frames. ], batch size: 87, lr: 2.88e-03, grad_scale: 8.0
2024-09-19 14:49:32,030 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.697e+01 9.242e+01 9.593e+01 9.804e+02, threshold=1.848e+02, percent-clipped=1.0
2024-09-19 14:49:34,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=686900.0, ans=15.0
2024-09-19 14:49:36,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=686900.0, ans=0.1
2024-09-19 14:49:53,405 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 14:50:05,162 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 14:50:28,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=687060.0, ans=0.0
2024-09-19 14:50:41,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=687060.0, ans=0.07
2024-09-19 14:50:45,380 INFO [train.py:1198] (1/2) Epoch 38, batch 4350, loss[loss=0.2503, ctc_loss=0.123, cr_loss=0.3722, attn_decoder_loss=0.2561, over 29489.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1164, cr_loss=0.3595, attn_decoder_loss=0.243, over 5799326.02 frames. ], batch size: 97, lr: 2.88e-03, grad_scale: 8.0
2024-09-19 14:50:55,253 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.04 vs. limit=22.5
2024-09-19 14:51:10,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=687140.0, ans=0.125
2024-09-19 14:51:22,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=687180.0, ans=0.125
2024-09-19 14:51:26,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=687180.0, ans=0.125
2024-09-19 14:51:33,262 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.63 vs. limit=15.0
2024-09-19 14:51:42,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=687260.0, ans=0.125
2024-09-19 14:51:58,763 INFO [train.py:1198] (1/2) Epoch 38, batch 4400, loss[loss=0.2491, ctc_loss=0.1246, cr_loss=0.3732, attn_decoder_loss=0.2547, over 27449.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1172, cr_loss=0.3613, attn_decoder_loss=0.2447, over 5767820.61 frames. ], batch size: 124, lr: 2.88e-03, grad_scale: 16.0
2024-09-19 14:52:00,220 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.169e+01 8.939e+01 9.261e+01 9.709e+01 1.293e+02, threshold=1.852e+02, percent-clipped=0.0
2024-09-19 14:52:06,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=687300.0, ans=0.0
2024-09-19 14:52:13,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=687340.0, ans=0.125
2024-09-19 14:52:13,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=687340.0, ans=0.2
2024-09-19 14:52:15,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=687340.0, ans=0.125
2024-09-19 14:52:18,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=687340.0, ans=0.1
2024-09-19 14:52:30,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=687380.0, ans=0.2
2024-09-19 14:52:37,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=687380.0, ans=0.5
2024-09-19 14:52:37,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=687380.0, ans=0.0
2024-09-19 14:53:01,759 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.58 vs. limit=10.0
2024-09-19 14:53:13,726 INFO [train.py:1198] (1/2) Epoch 38, batch 4450, loss[loss=0.261, ctc_loss=0.1539, cr_loss=0.3939, attn_decoder_loss=0.2642, over 19951.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1205, cr_loss=0.367, attn_decoder_loss=0.2468, over 5574662.16 frames. ], batch size: 209, lr: 2.88e-03, grad_scale: 16.0
2024-09-19 14:53:17,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=687500.0, ans=0.125
2024-09-19 14:53:18,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=687500.0, ans=0.125
2024-09-19 14:53:20,499 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.12 vs. limit=10.0
2024-09-19 14:53:21,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=687500.0, ans=0.07
2024-09-19 14:53:24,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=687500.0, ans=0.025
2024-09-19 14:53:29,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=687540.0, ans=0.025
2024-09-19 14:53:53,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=687580.0, ans=0.125
2024-09-19 14:54:14,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=687660.0, ans=0.0
2024-09-19 14:54:28,750 INFO [train.py:1198] (1/2) Epoch 38, batch 4500, loss[loss=0.2604, ctc_loss=0.1438, cr_loss=0.4014, attn_decoder_loss=0.2644, over 20642.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1239, cr_loss=0.3694, attn_decoder_loss=0.2486, over 5232654.02 frames. ], batch size: 211, lr: 2.88e-03, grad_scale: 8.0
2024-09-19 14:54:31,692 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.150e+01 9.974e+01 1.104e+02 1.169e+02 2.298e+02, threshold=2.208e+02, percent-clipped=1.0
2024-09-19 14:54:55,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=687740.0, ans=0.125
2024-09-19 14:54:55,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=687740.0, ans=0.125
2024-09-19 14:55:50,039 INFO [train.py:1198] (1/2) Epoch 39, batch 0, loss[loss=0.2146, ctc_loss=0.09725, cr_loss=0.3361, attn_decoder_loss=0.2202, over 29632.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.09725, cr_loss=0.3361, attn_decoder_loss=0.2202, over 29632.00 frames. ], batch size: 73, lr: 2.84e-03, grad_scale: 16.0
2024-09-19 14:55:50,039 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-19 14:56:08,887 INFO [train.py:1230] (1/2) Epoch 39, validation: loss=0.2125, ctc_loss=0.03631, cr_loss=6.129e-15, attn_decoder_loss=0.232, over 944034.00 frames.
2024-09-19 14:56:08,887 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB
2024-09-19 14:56:37,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=687840.0, ans=0.125
2024-09-19 14:56:40,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=687880.0, ans=0.125
2024-09-19 14:56:44,023 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.61 vs. limit=15.0
2024-09-19 14:56:46,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=687880.0, ans=0.125
2024-09-19 14:57:03,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=687920.0, ans=0.2
2024-09-19 14:57:17,335 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.26 vs. limit=10.0
2024-09-19 14:57:32,783 INFO [train.py:1198] (1/2) Epoch 39, batch 50, loss[loss=0.2038, ctc_loss=0.09344, cr_loss=0.2908, attn_decoder_loss=0.2095, over 29406.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.114, cr_loss=0.357, attn_decoder_loss=0.2396, over 1267293.14 frames. ], batch size: 70, lr: 2.84e-03, grad_scale: 8.0
2024-09-19 14:57:36,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=688000.0, ans=0.07
2024-09-19 14:58:03,979 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0
2024-09-19 14:58:14,950 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.223e+01 8.884e+01 9.468e+01 1.073e+02 2.116e+02, threshold=1.894e+02, percent-clipped=0.0
2024-09-19 14:58:16,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=688120.0, ans=0.125
2024-09-19 14:58:18,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=688120.0, ans=0.0
2024-09-19 14:58:43,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=688160.0, ans=0.125
2024-09-19 14:58:44,397 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.87 vs. limit=15.0
2024-09-19 14:58:45,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=688160.0, ans=15.0
2024-09-19 14:58:47,906 INFO [train.py:1198] (1/2) Epoch 39, batch 100, loss[loss=0.2294, ctc_loss=0.1133, cr_loss=0.3612, attn_decoder_loss=0.2342, over 29545.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1164, cr_loss=0.3599, attn_decoder_loss=0.2423, over 2250840.67 frames. ], batch size: 76, lr: 2.84e-03, grad_scale: 8.0
2024-09-19 14:58:51,720 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.37 vs. limit=15.0
2024-09-19 14:59:00,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=688200.0, ans=0.1
2024-09-19 14:59:09,524 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=15.0
2024-09-19 14:59:12,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=688240.0, ans=0.0
2024-09-19 14:59:27,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=688280.0, ans=0.0
2024-09-19 14:59:39,117 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=7.91 vs. limit=22.5
2024-09-19 15:00:04,827 INFO [train.py:1198] (1/2) Epoch 39, batch 150, loss[loss=0.2108, ctc_loss=0.1062, cr_loss=0.3375, attn_decoder_loss=0.215, over 29394.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.114, cr_loss=0.3551, attn_decoder_loss=0.2399, over 3045595.18 frames. ], batch size: 70, lr: 2.84e-03, grad_scale: 8.0
2024-09-19 15:00:29,270 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.51 vs. limit=12.0
2024-09-19 15:00:37,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=688480.0, ans=0.0
2024-09-19 15:00:47,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.max_abs, batch_count=688480.0, ans=10.0
2024-09-19 15:00:48,857 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.419e+01 8.955e+01 9.625e+01 1.555e+02, threshold=1.791e+02, percent-clipped=0.0
2024-09-19 15:00:59,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=688520.0, ans=0.2
2024-09-19 15:01:07,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=688560.0, ans=0.1
2024-09-19 15:01:22,060 INFO [train.py:1198] (1/2) Epoch 39, batch 200, loss[loss=0.2476, ctc_loss=0.1228, cr_loss=0.375, attn_decoder_loss=0.2531, over 27343.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1135, cr_loss=0.3535, attn_decoder_loss=0.239, over 3658256.86 frames. ], batch size: 124, lr: 2.84e-03, grad_scale: 8.0
2024-09-19 15:01:33,551 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.64 vs. limit=15.0
2024-09-19 15:01:49,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=688640.0, ans=0.125
2024-09-19 15:01:58,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=688680.0, ans=0.0
2024-09-19 15:01:59,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=688680.0, ans=0.125
2024-09-19 15:02:02,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=688680.0, ans=0.125
2024-09-19 15:02:08,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=688720.0, ans=0.2
2024-09-19 15:02:36,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=688800.0, ans=0.05
2024-09-19 15:02:37,645 INFO [train.py:1198] (1/2) Epoch 39, batch 250, loss[loss=0.2589, ctc_loss=0.1369, cr_loss=0.4088, attn_decoder_loss=0.2633, over 29237.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1133, cr_loss=0.353, attn_decoder_loss=0.2387, over 4141688.56 frames.
2024-09-19 15:03:19,941 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.148e+01 8.603e+01 9.098e+01 9.821e+01 6.363e+02, threshold=1.820e+02, percent-clipped=1.0
2024-09-19 15:03:21,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=688920.0, ans=0.125
2024-09-19 15:03:52,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=688960.0, ans=0.125
2024-09-19 15:03:52,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=688960.0, ans=0.0
2024-09-19 15:03:55,540 INFO [train.py:1198] (1/2) Epoch 39, batch 300, loss[loss=0.2519, ctc_loss=0.1275, cr_loss=0.4035, attn_decoder_loss=0.2568, over 29553.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1131, cr_loss=0.3523, attn_decoder_loss=0.2388, over 4509597.31 frames. ], batch size: 92, lr: 2.84e-03, grad_scale: 8.0
2024-09-19 15:04:28,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=689080.0, ans=0.0
2024-09-19 15:04:42,312 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.54 vs. limit=15.0
2024-09-19 15:05:06,023 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 15:05:13,145 INFO [train.py:1198] (1/2) Epoch 39, batch 350, loss[loss=0.2156, ctc_loss=0.101, cr_loss=0.3284, attn_decoder_loss=0.221, over 29341.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1136, cr_loss=0.3536, attn_decoder_loss=0.2395, over 4795725.71 frames. ], batch size: 71, lr: 2.84e-03, grad_scale: 8.0
2024-09-19 15:05:13,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=689200.0, ans=0.0
2024-09-19 15:05:24,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=689200.0, ans=0.2
2024-09-19 15:05:25,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=689200.0, ans=0.0
2024-09-19 15:05:30,576 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.54 vs. limit=6.0
2024-09-19 15:05:32,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=689240.0, ans=0.1
2024-09-19 15:05:32,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=689240.0, ans=0.125
2024-09-19 15:05:46,758 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0
2024-09-19 15:05:52,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=689280.0, ans=0.125
2024-09-19 15:05:55,258 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.629e+01 8.459e+01 8.983e+01 9.522e+01 3.712e+02, threshold=1.797e+02, percent-clipped=2.0
2024-09-19 15:06:00,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=689320.0, ans=0.04949747468305833
2024-09-19 15:06:18,584 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 15:06:28,538 INFO [train.py:1198] (1/2) Epoch 39, batch 400, loss[loss=0.2473, ctc_loss=0.1274, cr_loss=0.3941, attn_decoder_loss=0.2518, over 29713.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1139, cr_loss=0.355, attn_decoder_loss=0.2394, over 5024357.85 frames. ], batch size: 82, lr: 2.84e-03, grad_scale: 16.0
2024-09-19 15:06:36,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=689400.0, ans=0.125
2024-09-19 15:06:42,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=689440.0, ans=0.125
2024-09-19 15:07:12,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=689520.0, ans=0.125
2024-09-19 15:07:27,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=689560.0, ans=0.125
2024-09-19 15:07:46,823 INFO [train.py:1198] (1/2) Epoch 39, batch 450, loss[loss=0.2498, ctc_loss=0.1285, cr_loss=0.3669, attn_decoder_loss=0.2552, over 29694.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1141, cr_loss=0.3546, attn_decoder_loss=0.2396, over 5186924.97 frames. ], batch size: 83, lr: 2.84e-03, grad_scale: 8.0
2024-09-19 15:07:48,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=689600.0, ans=0.2
2024-09-19 15:07:53,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=689600.0, ans=0.125
2024-09-19 15:07:54,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=689600.0, ans=0.95
2024-09-19 15:08:05,245 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.35 vs. limit=12.0
2024-09-19 15:08:13,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=689640.0, ans=0.125
2024-09-19 15:08:24,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=689680.0, ans=0.0
2024-09-19 15:08:30,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=689680.0, ans=0.0
2024-09-19 15:08:32,805 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 8.422e+01 8.949e+01 9.558e+01 1.384e+02, threshold=1.790e+02, percent-clipped=0.0
2024-09-19 15:08:39,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=689720.0, ans=0.125
2024-09-19 15:09:04,769 INFO [train.py:1198] (1/2) Epoch 39, batch 500, loss[loss=0.2545, ctc_loss=0.1311, cr_loss=0.3853, attn_decoder_loss=0.2597, over 29436.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1134, cr_loss=0.3536, attn_decoder_loss=0.2387, over 5330712.87 frames. ], batch size: 94, lr: 2.84e-03, grad_scale: 8.0
2024-09-19 15:09:06,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=689800.0, ans=0.0
2024-09-19 15:09:25,436 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.16 vs. limit=15.0
2024-09-19 15:09:38,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=689880.0, ans=0.125
2024-09-19 15:09:44,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=689880.0, ans=0.125
2024-09-19 15:09:47,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=689880.0, ans=0.0
2024-09-19 15:09:50,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=689920.0, ans=0.125
2024-09-19 15:10:08,863 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.61 vs. limit=15.0
2024-09-19 15:10:15,386 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.42 vs. limit=22.5
2024-09-19 15:10:20,414 INFO [train.py:1198] (1/2) Epoch 39, batch 550, loss[loss=0.24, ctc_loss=0.1139, cr_loss=0.3538, attn_decoder_loss=0.2462, over 28839.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.113, cr_loss=0.353, attn_decoder_loss=0.2385, over 5423072.97 frames. ], batch size: 104, lr: 2.84e-03, grad_scale: 8.0
2024-09-19 15:10:29,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=690000.0, ans=0.125
2024-09-19 15:10:32,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=690000.0, ans=0.0
2024-09-19 15:10:50,354 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.79 vs. limit=15.0
2024-09-19 15:11:04,425 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.487e+01 8.632e+01 8.977e+01 9.526e+01 2.010e+02, threshold=1.795e+02, percent-clipped=2.0
2024-09-19 15:11:19,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=690120.0, ans=15.0
2024-09-19 15:11:27,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=690160.0, ans=0.2
2024-09-19 15:11:38,692 INFO [train.py:1198] (1/2) Epoch 39, batch 600, loss[loss=0.2507, ctc_loss=0.1266, cr_loss=0.3929, attn_decoder_loss=0.2557, over 29304.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1134, cr_loss=0.3538, attn_decoder_loss=0.2392, over 5509685.80 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 8.0
2024-09-19 15:11:51,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=690200.0, ans=0.0
2024-09-19 15:11:55,430 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.17 vs. limit=12.0
2024-09-19 15:12:24,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=690320.0, ans=0.125
2024-09-19 15:12:26,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=690320.0, ans=0.125
2024-09-19 15:12:31,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=690320.0, ans=0.125
2024-09-19 15:12:33,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=690320.0, ans=0.025
2024-09-19 15:12:45,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=690360.0, ans=0.05
2024-09-19 15:12:56,381 INFO [train.py:1198] (1/2) Epoch 39, batch 650, loss[loss=0.2324, ctc_loss=0.1089, cr_loss=0.327, attn_decoder_loss=0.2388, over 29755.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1125, cr_loss=0.3514, attn_decoder_loss=0.2383, over 5586395.96 frames. ], batch size: 81, lr: 2.84e-03, grad_scale: 8.0
2024-09-19 15:13:05,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=690400.0, ans=0.125
2024-09-19 15:13:37,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=690480.0, ans=0.125
2024-09-19 15:13:40,405 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.365e+01 8.432e+01 8.976e+01 9.547e+01 1.845e+02, threshold=1.795e+02, percent-clipped=1.0
2024-09-19 15:13:45,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=690520.0, ans=0.1
2024-09-19 15:13:49,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=690520.0, ans=0.05
2024-09-19 15:13:56,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=690560.0, ans=0.125
2024-09-19 15:14:04,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=690560.0, ans=0.09899494936611666
2024-09-19 15:14:06,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=690560.0, ans=0.1
2024-09-19 15:14:12,092 INFO [train.py:1198] (1/2) Epoch 39, batch 700, loss[loss=0.2272, ctc_loss=0.1143, cr_loss=0.3677, attn_decoder_loss=0.2316, over 29501.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1133, cr_loss=0.3534, attn_decoder_loss=0.2392, over 5635426.57 frames. ], batch size: 76, lr: 2.84e-03, grad_scale: 8.0
2024-09-19 15:14:16,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=690600.0, ans=10.0
2024-09-19 15:14:27,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=690640.0, ans=0.1
2024-09-19 15:14:38,271 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.87 vs. limit=15.0
2024-09-19 15:14:43,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=690680.0, ans=0.125
2024-09-19 15:14:45,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=690680.0, ans=0.125
2024-09-19 15:15:03,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=690720.0, ans=0.2
2024-09-19 15:15:06,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=690720.0, ans=0.0
2024-09-19 15:15:17,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=690760.0, ans=0.125
2024-09-19 15:15:27,463 INFO [train.py:1198] (1/2) Epoch 39, batch 750, loss[loss=0.2337, ctc_loss=0.112, cr_loss=0.3523, attn_decoder_loss=0.2394, over 29712.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.113, cr_loss=0.3527, attn_decoder_loss=0.2388, over 5675397.28 frames. ], batch size: 82, lr: 2.84e-03, grad_scale: 8.0
2024-09-19 15:15:30,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=690800.0, ans=0.0
2024-09-19 15:15:50,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=690840.0, ans=0.0
2024-09-19 15:16:15,751 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.433e+01 8.497e+01 9.078e+01 9.651e+01 1.974e+02, threshold=1.816e+02, percent-clipped=1.0
2024-09-19 15:16:31,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=690960.0, ans=0.125
2024-09-19 15:16:47,094 INFO [train.py:1198] (1/2) Epoch 39, batch 800, loss[loss=0.2093, ctc_loss=0.0983, cr_loss=0.3262, attn_decoder_loss=0.2144, over 29613.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1131, cr_loss=0.3528, attn_decoder_loss=0.2388, over 5706619.66 frames. ], batch size: 73, lr: 2.84e-03, grad_scale: 16.0
2024-09-19 15:16:59,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=691000.0, ans=0.0
2024-09-19 15:17:09,283 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.59 vs. limit=15.0
2024-09-19 15:17:16,637 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.67 vs. limit=15.0
2024-09-19 15:17:20,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=691080.0, ans=0.125
2024-09-19 15:17:41,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=691120.0, ans=0.09899494936611666
2024-09-19 15:18:02,587 INFO [train.py:1198] (1/2) Epoch 39, batch 850, loss[loss=0.2465, ctc_loss=0.1251, cr_loss=0.3965, attn_decoder_loss=0.2512, over 29702.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1122, cr_loss=0.3509, attn_decoder_loss=0.238, over 5736265.04 frames. ], batch size: 89, lr: 2.84e-03, grad_scale: 16.0
2024-09-19 15:18:08,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=691200.0, ans=0.07
2024-09-19 15:18:14,035 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0
2024-09-19 15:18:19,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=691240.0, ans=15.0
2024-09-19 15:18:20,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=691240.0, ans=0.025
2024-09-19 15:18:28,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=691240.0, ans=0.125
2024-09-19 15:18:44,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=691280.0, ans=0.125
2024-09-19 15:18:46,233 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.459e+01 8.825e+01 9.402e+01 1.909e+02, threshold=1.765e+02, percent-clipped=1.0
2024-09-19 15:18:47,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=691320.0, ans=0.125
2024-09-19 15:19:10,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=691360.0, ans=0.125
2024-09-19 15:19:18,058 INFO [train.py:1198] (1/2) Epoch 39, batch 900, loss[loss=0.2164, ctc_loss=0.102, cr_loss=0.3193, attn_decoder_loss=0.222, over 29573.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1122, cr_loss=0.3515, attn_decoder_loss=0.2384, over 5741784.77 frames. ], batch size: 73, lr: 2.84e-03, grad_scale: 16.0
2024-09-19 15:19:29,639 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0
2024-09-19 15:19:32,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=691400.0, ans=0.2
2024-09-19 15:20:29,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=691560.0, ans=0.025
2024-09-19 15:20:35,665 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.30 vs. limit=15.0
2024-09-19 15:20:37,720 INFO [train.py:1198] (1/2) Epoch 39, batch 950, loss[loss=0.2344, ctc_loss=0.1136, cr_loss=0.3649, attn_decoder_loss=0.2398, over 29503.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1127, cr_loss=0.3523, attn_decoder_loss=0.239, over 5743468.51 frames. ], batch size: 74, lr: 2.84e-03, grad_scale: 16.0
2024-09-19 15:20:38,980 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.09 vs. limit=15.0
2024-09-19 15:20:42,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=691600.0, ans=0.125
2024-09-19 15:20:52,195 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.51 vs. limit=15.0
2024-09-19 15:21:15,536 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.57 vs. limit=5.0
2024-09-19 15:21:20,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=691680.0, ans=0.125
2024-09-19 15:21:21,643 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 8.688e+01 9.370e+01 1.012e+02 2.860e+02, threshold=1.874e+02, percent-clipped=2.0
2024-09-19 15:21:24,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=691720.0, ans=0.025
2024-09-19 15:21:53,217 INFO [train.py:1198] (1/2) Epoch 39, batch 1000, loss[loss=0.2362, ctc_loss=0.1199, cr_loss=0.3636, attn_decoder_loss=0.241, over 29510.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1137, cr_loss=0.3537, attn_decoder_loss=0.2398, over 5737576.32 frames. ], batch size: 77, lr: 2.84e-03, grad_scale: 16.0
2024-09-19 15:21:59,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=691800.0, ans=0.125
2024-09-19 15:22:13,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=691840.0, ans=22.5
2024-09-19 15:22:17,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=691840.0, ans=0.125
2024-09-19 15:22:35,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=691880.0, ans=0.125
2024-09-19 15:22:42,070 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.64 vs. limit=15.0
2024-09-19 15:23:02,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=691960.0, ans=0.125
2024-09-19 15:23:08,636 INFO [train.py:1198] (1/2) Epoch 39, batch 1050, loss[loss=0.2471, ctc_loss=0.1257, cr_loss=0.3924, attn_decoder_loss=0.2518, over 29670.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1135, cr_loss=0.3532, attn_decoder_loss=0.2395, over 5745939.50 frames. ], batch size: 85, lr: 2.84e-03, grad_scale: 16.0
2024-09-19 15:23:08,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=692000.0, ans=0.125
2024-09-19 15:23:34,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.71 vs. limit=6.0
2024-09-19 15:23:36,083 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.78 vs. limit=22.5
2024-09-19 15:23:46,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.max_abs, batch_count=692080.0, ans=10.0
2024-09-19 15:23:54,906 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.608e+01 8.552e+01 9.121e+01 9.553e+01 1.921e+02, threshold=1.824e+02, percent-clipped=1.0
2024-09-19 15:24:13,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=692160.0, ans=0.125
2024-09-19 15:24:14,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=692160.0, ans=0.125
2024-09-19 15:24:26,504 INFO [train.py:1198] (1/2) Epoch 39, batch 1100, loss[loss=0.2313, ctc_loss=0.1105, cr_loss=0.3571, attn_decoder_loss=0.2367, over 29445.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1132, cr_loss=0.3534, attn_decoder_loss=0.2393, over 5757487.83 frames. ], batch size: 78, lr: 2.84e-03, grad_scale: 16.0
2024-09-19 15:24:43,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=692240.0, ans=0.125
2024-09-19 15:24:45,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=692240.0, ans=0.0
2024-09-19 15:25:04,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=692280.0, ans=0.07
2024-09-19 15:25:13,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=692320.0, ans=0.125
2024-09-19 15:25:42,568 INFO [train.py:1198] (1/2) Epoch 39, batch 1150, loss[loss=0.2262, ctc_loss=0.1074, cr_loss=0.3504, attn_decoder_loss=0.2316, over 29465.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1134, cr_loss=0.3531, attn_decoder_loss=0.2391, over 5755561.54 frames. ], batch size: 78, lr: 2.84e-03, grad_scale: 16.0
2024-09-19 15:25:44,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=692400.0, ans=0.025
2024-09-19 15:25:49,058 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 15:25:49,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=692400.0, ans=0.125
2024-09-19 15:25:55,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=692400.0, ans=0.0
2024-09-19 15:26:01,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=692440.0, ans=0.2
2024-09-19 15:26:07,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=692440.0, ans=0.125
2024-09-19 15:26:25,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=692480.0, ans=0.2
2024-09-19 15:26:26,558 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.659e+01 8.488e+01 9.080e+01 9.695e+01 1.564e+02, threshold=1.816e+02, percent-clipped=0.0
2024-09-19 15:26:55,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=692560.0, ans=0.0
2024-09-19 15:26:58,173 INFO [train.py:1198] (1/2) Epoch 39, batch 1200, loss[loss=0.2394, ctc_loss=0.12, cr_loss=0.3586, attn_decoder_loss=0.2447, over 29688.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1138, cr_loss=0.3542, attn_decoder_loss=0.2398, over 5748007.53 frames. ], batch size: 85, lr: 2.83e-03, grad_scale: 32.0
2024-09-19 15:27:00,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=692600.0, ans=0.125
2024-09-19 15:27:14,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=692640.0, ans=0.125
2024-09-19 15:27:33,999 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.43 vs. limit=10.0
2024-09-19 15:27:42,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=692680.0, ans=0.1
2024-09-19 15:27:44,328 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.49 vs. limit=15.0
2024-09-19 15:28:18,500 INFO [train.py:1198] (1/2) Epoch 39, batch 1250, loss[loss=0.2547, ctc_loss=0.122, cr_loss=0.3713, attn_decoder_loss=0.2612, over 29539.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1143, cr_loss=0.3557, attn_decoder_loss=0.2404, over 5774727.82 frames. ], batch size: 92, lr: 2.83e-03, grad_scale: 8.0
2024-09-19 15:28:44,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=692840.0, ans=0.0
2024-09-19 15:28:44,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=692840.0, ans=0.2
2024-09-19 15:28:49,862 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.99 vs. limit=15.0
2024-09-19 15:28:56,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=692880.0, ans=0.2
2024-09-19 15:28:57,449 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=22.5
2024-09-19 15:29:05,374 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.604e+01 9.074e+01 9.816e+01 4.150e+02, threshold=1.815e+02, percent-clipped=2.0
2024-09-19 15:29:32,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=693000.0, ans=0.125
2024-09-19 15:29:33,892 INFO [train.py:1198] (1/2) Epoch 39, batch 1300, loss[loss=0.2441, ctc_loss=0.12, cr_loss=0.3781, attn_decoder_loss=0.2495, over 28335.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1138, cr_loss=0.3551, attn_decoder_loss=0.2397, over 5778853.22 frames. ], batch size: 111, lr: 2.83e-03, grad_scale: 8.0
2024-09-19 15:29:36,359 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.28 vs. limit=15.0
2024-09-19 15:29:44,084 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.70 vs. limit=15.0
2024-09-19 15:29:52,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=693040.0, ans=0.2
2024-09-19 15:29:55,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=693040.0, ans=0.1
2024-09-19 15:29:55,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=693040.0, ans=0.1
2024-09-19 15:30:04,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=693080.0, ans=0.125
2024-09-19 15:30:13,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=693080.0, ans=0.125
2024-09-19 15:30:28,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=693120.0, ans=0.125
2024-09-19 15:30:49,669 INFO [train.py:1198] (1/2) Epoch 39, batch 1350, loss[loss=0.2352, ctc_loss=0.114, cr_loss=0.3743, attn_decoder_loss=0.2403, over 29752.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1135, cr_loss=0.3542, attn_decoder_loss=0.2394, over 5795428.33 frames. ], batch size: 81, lr: 2.83e-03, grad_scale: 8.0
2024-09-19 15:31:16,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=693240.0, ans=0.125
2024-09-19 15:31:22,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=693280.0, ans=0.0
2024-09-19 15:31:29,595 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.61 vs. limit=6.0
2024-09-19 15:31:35,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=693280.0, ans=0.025
2024-09-19 15:31:38,210 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 15:31:39,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=693320.0, ans=0.2
2024-09-19 15:31:40,745 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.173e+01 8.563e+01 8.987e+01 9.374e+01 1.474e+02, threshold=1.797e+02, percent-clipped=0.0
2024-09-19 15:31:46,513 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0
2024-09-19 15:31:59,233 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 15:32:02,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=693360.0, ans=0.125
2024-09-19 15:32:09,542 INFO [train.py:1198] (1/2) Epoch 39, batch 1400, loss[loss=0.2054, ctc_loss=0.08855, cr_loss=0.2893, attn_decoder_loss=0.2119, over 29604.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1134, cr_loss=0.3538, attn_decoder_loss=0.239, over 5807143.27 frames. ], batch size: 69, lr: 2.83e-03, grad_scale: 8.0
2024-09-19 15:32:09,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=693400.0, ans=0.2
2024-09-19 15:32:19,597 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.78 vs. limit=15.0
2024-09-19 15:32:38,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=693480.0, ans=0.2
2024-09-19 15:32:44,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=693480.0, ans=0.025
2024-09-19 15:32:48,098 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.37 vs. limit=22.5
2024-09-19 15:33:02,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=693520.0, ans=0.125
2024-09-19 15:33:13,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=693560.0, ans=0.0
2024-09-19 15:33:25,284 INFO [train.py:1198] (1/2) Epoch 39, batch 1450, loss[loss=0.2542, ctc_loss=0.1333, cr_loss=0.3897, attn_decoder_loss=0.259, over 29422.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1134, cr_loss=0.3535, attn_decoder_loss=0.2394, over 5803405.64 frames. ], batch size: 94, lr: 2.83e-03, grad_scale: 8.0
2024-09-19 15:33:31,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=693600.0, ans=0.125
2024-09-19 15:33:35,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=693600.0, ans=0.125
2024-09-19 15:34:11,639 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.483e+01 8.532e+01 9.213e+01 9.668e+01 2.812e+02, threshold=1.843e+02, percent-clipped=2.0
2024-09-19 15:34:12,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=693720.0, ans=0.0
2024-09-19 15:34:13,941 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.96 vs. limit=15.0
2024-09-19 15:34:25,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=693760.0, ans=0.025
2024-09-19 15:34:40,278 INFO [train.py:1198] (1/2) Epoch 39, batch 1500, loss[loss=0.2416, ctc_loss=0.1186, cr_loss=0.3719, attn_decoder_loss=0.247, over 29635.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1136, cr_loss=0.3542, attn_decoder_loss=0.2398, over 5803726.68 frames. ], batch size: 86, lr: 2.83e-03, grad_scale: 8.0
2024-09-19 15:34:45,915 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0
2024-09-19 15:34:51,760 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.29 vs. limit=15.0
2024-09-19 15:35:21,751 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 15:35:30,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=693920.0, ans=0.0
2024-09-19 15:35:35,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=693920.0, ans=0.125
2024-09-19 15:35:36,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=693920.0, ans=0.1
2024-09-19 15:35:41,835 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0
2024-09-19 15:35:45,218 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.53 vs. limit=22.5
2024-09-19 15:35:48,252 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.94 vs. limit=22.5
2024-09-19 15:35:54,332 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.14 vs. limit=8.0
2024-09-19 15:35:57,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=693960.0, ans=0.125
2024-09-19 15:36:00,509 INFO [train.py:1198] (1/2) Epoch 39, batch 1550, loss[loss=0.2472, ctc_loss=0.1252, cr_loss=0.3796, attn_decoder_loss=0.2524, over 29473.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1138, cr_loss=0.3545, attn_decoder_loss=0.2398, over 5780471.69 frames. ], batch size: 90, lr: 2.83e-03, grad_scale: 8.0
2024-09-19 15:36:09,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=694000.0, ans=0.2
2024-09-19 15:36:19,493 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0
2024-09-19 15:36:22,854 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=22.5
2024-09-19 15:36:23,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=694040.0, ans=0.125
2024-09-19 15:36:44,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=694120.0, ans=0.1
2024-09-19 15:36:47,233 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.408e+01 8.376e+01 8.823e+01 9.525e+01 1.389e+02, threshold=1.765e+02, percent-clipped=0.0
2024-09-19 15:36:56,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=694120.0, ans=0.2
2024-09-19 15:37:03,397 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.59 vs. limit=15.0
2024-09-19 15:37:04,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=694160.0, ans=0.0
2024-09-19 15:37:14,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=694200.0, ans=0.125
2024-09-19 15:37:14,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=694200.0, ans=0.0
2024-09-19 15:37:16,080 INFO [train.py:1198] (1/2) Epoch 39, batch 1600, loss[loss=0.2409, ctc_loss=0.121, cr_loss=0.372, attn_decoder_loss=0.246, over 29681.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1138, cr_loss=0.3543, attn_decoder_loss=0.2396, over 5764066.68 frames. ], batch size: 85, lr: 2.83e-03, grad_scale: 16.0
2024-09-19 15:37:17,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=694200.0, ans=0.0
2024-09-19 15:37:19,694 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.53 vs. limit=12.0
2024-09-19 15:37:25,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=694200.0, ans=0.125
2024-09-19 15:38:01,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=694320.0, ans=0.2
2024-09-19 15:38:01,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=694320.0, ans=0.1
2024-09-19 15:38:07,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=694320.0, ans=0.0
2024-09-19 15:38:22,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=694360.0, ans=0.025
2024-09-19 15:38:28,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=694360.0, ans=0.125
2024-09-19 15:38:30,559 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.99 vs. limit=6.0
2024-09-19 15:38:31,551 INFO [train.py:1198] (1/2) Epoch 39, batch 1650, loss[loss=0.2503, ctc_loss=0.1142, cr_loss=0.3665, attn_decoder_loss=0.2573, over 29718.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1136, cr_loss=0.3539, attn_decoder_loss=0.2395, over 5760057.53 frames. ], batch size: 89, lr: 2.83e-03, grad_scale: 16.0
2024-09-19 15:38:43,911 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0
2024-09-19 15:38:44,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=694400.0, ans=0.0
2024-09-19 15:38:47,625 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 15:38:59,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=694440.0, ans=0.125
2024-09-19 15:39:07,905 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.75 vs. limit=12.0
2024-09-19 15:39:13,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.86 vs. limit=15.0
2024-09-19 15:39:17,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=694480.0, ans=0.0
2024-09-19 15:39:22,828 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.314e+01 8.471e+01 8.887e+01 9.578e+01 2.740e+02, threshold=1.777e+02, percent-clipped=2.0
2024-09-19 15:39:26,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=694520.0, ans=0.2
2024-09-19 15:39:51,167 INFO [train.py:1198] (1/2) Epoch 39, batch 1700, loss[loss=0.2009, ctc_loss=0.08864, cr_loss=0.2975, attn_decoder_loss=0.2067, over 29552.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1134, cr_loss=0.3536, attn_decoder_loss=0.2395, over 5781507.61 frames. ], batch size: 69, lr: 2.83e-03, grad_scale: 16.0
2024-09-19 15:39:58,081 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.82 vs. limit=22.5
2024-09-19 15:40:44,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=694720.0, ans=10.0
2024-09-19 15:40:53,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=694760.0, ans=0.1
2024-09-19 15:41:06,500 INFO [train.py:1198] (1/2) Epoch 39, batch 1750, loss[loss=0.2064, ctc_loss=0.09019, cr_loss=0.3019, attn_decoder_loss=0.2126, over 29355.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.113, cr_loss=0.3526, attn_decoder_loss=0.2388, over 5789972.63 frames. ], batch size: 67, lr: 2.83e-03, grad_scale: 16.0
2024-09-19 15:41:06,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=694800.0, ans=0.125
2024-09-19 15:41:07,397 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.15 vs. limit=15.0
2024-09-19 15:41:11,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=694800.0, ans=0.1
2024-09-19 15:41:17,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=694800.0, ans=0.125
2024-09-19 15:41:21,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=694840.0, ans=0.0
2024-09-19 15:41:41,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=694880.0, ans=0.125
2024-09-19 15:41:46,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=694880.0, ans=0.125
2024-09-19 15:41:53,563 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 8.741e+01 9.226e+01 9.687e+01 1.772e+02, threshold=1.845e+02, percent-clipped=0.0
2024-09-19 15:41:53,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=694920.0, ans=0.0
2024-09-19 15:41:53,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=694920.0, ans=0.0
2024-09-19 15:41:55,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=694920.0, ans=0.125
2024-09-19 15:41:55,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=694920.0, ans=0.1
2024-09-19 15:41:56,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=694920.0, ans=0.1
2024-09-19 15:41:57,521 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.30 vs. limit=10.0
2024-09-19 15:42:01,641 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.76 vs. limit=15.0
2024-09-19 15:42:10,895 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.19 vs. limit=22.5
limit=22.5 2024-09-19 15:42:16,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=694960.0, ans=0.2 2024-09-19 15:42:22,081 INFO [train.py:1198] (1/2) Epoch 39, batch 1800, loss[loss=0.2422, ctc_loss=0.1282, cr_loss=0.386, attn_decoder_loss=0.2463, over 29672.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1133, cr_loss=0.3531, attn_decoder_loss=0.2391, over 5792050.46 frames. ], batch size: 83, lr: 2.83e-03, grad_scale: 16.0 2024-09-19 15:42:32,184 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.97 vs. limit=15.0 2024-09-19 15:42:32,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=695000.0, ans=0.125 2024-09-19 15:42:41,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=695040.0, ans=0.125 2024-09-19 15:42:48,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=695040.0, ans=0.125 2024-09-19 15:42:56,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=695080.0, ans=0.0 2024-09-19 15:42:57,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=695080.0, ans=0.2 2024-09-19 15:43:00,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=695080.0, ans=0.0 2024-09-19 15:43:35,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=695160.0, ans=0.1 2024-09-19 15:43:41,997 INFO [train.py:1198] (1/2) Epoch 39, batch 1850, loss[loss=0.2403, ctc_loss=0.1073, cr_loss=0.3307, attn_decoder_loss=0.2478, over 29628.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1133, cr_loss=0.3532, attn_decoder_loss=0.2391, over 5797491.23 frames. ], batch size: 86, lr: 2.83e-03, grad_scale: 16.0 2024-09-19 15:43:51,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=695200.0, ans=0.125 2024-09-19 15:43:57,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=695240.0, ans=0.125 2024-09-19 15:44:01,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=695240.0, ans=0.0 2024-09-19 15:44:20,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=695280.0, ans=0.125 2024-09-19 15:44:21,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=695280.0, ans=0.125 2024-09-19 15:44:28,629 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.257e+01 8.566e+01 9.030e+01 9.513e+01 1.502e+02, threshold=1.806e+02, percent-clipped=0.0 2024-09-19 15:44:41,299 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.41 vs. limit=15.0 2024-09-19 15:44:46,203 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.63 vs. 
2024-09-19 15:44:56,815 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.21 vs. limit=22.5
2024-09-19 15:44:57,274 INFO [train.py:1198] (1/2) Epoch 39, batch 1900, loss[loss=0.2345, ctc_loss=0.1052, cr_loss=0.3386, attn_decoder_loss=0.2413, over 29713.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1135, cr_loss=0.3537, attn_decoder_loss=0.2395, over 5804986.63 frames. ], batch size: 89, lr: 2.83e-03, grad_scale: 16.0
2024-09-19 15:44:58,123 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.78 vs. limit=12.0
2024-09-19 15:45:05,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=695400.0, ans=0.125
2024-09-19 15:45:19,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=695440.0, ans=0.0
2024-09-19 15:45:28,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=695480.0, ans=0.0
2024-09-19 15:45:43,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=695520.0, ans=0.0
2024-09-19 15:46:13,429 INFO [train.py:1198] (1/2) Epoch 39, batch 1950, loss[loss=0.2323, ctc_loss=0.1056, cr_loss=0.3294, attn_decoder_loss=0.239, over 29460.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1145, cr_loss=0.3566, attn_decoder_loss=0.2409, over 5820183.65 frames. ], batch size: 78, lr: 2.83e-03, grad_scale: 16.0
2024-09-19 15:46:24,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=695600.0, ans=0.025
2024-09-19 15:46:26,238 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.29 vs. limit=22.5
2024-09-19 15:46:32,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=695640.0, ans=0.0
2024-09-19 15:46:43,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=695640.0, ans=0.1
2024-09-19 15:46:45,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=695680.0, ans=0.125
2024-09-19 15:47:04,138 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 8.858e+01 9.313e+01 9.741e+01 2.178e+02, threshold=1.863e+02, percent-clipped=1.0
2024-09-19 15:47:05,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=695720.0, ans=0.0
2024-09-19 15:47:25,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=695760.0, ans=0.0
2024-09-19 15:47:25,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=695760.0, ans=0.2
2024-09-19 15:47:32,836 INFO [train.py:1198] (1/2) Epoch 39, batch 2000, loss[loss=0.2055, ctc_loss=0.0959, cr_loss=0.3087, attn_decoder_loss=0.2108, over 29335.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1151, cr_loss=0.357, attn_decoder_loss=0.2412, over 5797738.69 frames. ], batch size: 67, lr: 2.83e-03, grad_scale: 32.0
2024-09-19 15:47:48,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=695840.0, ans=0.125
2024-09-19 15:47:49,148 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.47 vs. limit=15.0
2024-09-19 15:47:50,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=695840.0, ans=0.1
2024-09-19 15:48:44,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=695960.0, ans=0.0
2024-09-19 15:48:48,812 INFO [train.py:1198] (1/2) Epoch 39, batch 2050, loss[loss=0.2122, ctc_loss=0.0968, cr_loss=0.3178, attn_decoder_loss=0.218, over 29434.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1141, cr_loss=0.3548, attn_decoder_loss=0.2399, over 5789172.79 frames. ], batch size: 70, lr: 2.83e-03, grad_scale: 16.0
2024-09-19 15:48:52,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=696000.0, ans=0.025
2024-09-19 15:49:29,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=696080.0, ans=0.0
2024-09-19 15:49:32,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=696120.0, ans=0.1
2024-09-19 15:49:37,159 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.009e+01 8.542e+01 8.929e+01 9.648e+01 1.386e+02, threshold=1.786e+02, percent-clipped=0.0
2024-09-19 15:49:52,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=696160.0, ans=0.125
2024-09-19 15:50:02,136 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.01 vs. limit=12.0
2024-09-19 15:50:04,426 INFO [train.py:1198] (1/2) Epoch 39, batch 2100, loss[loss=0.2414, ctc_loss=0.1165, cr_loss=0.3603, attn_decoder_loss=0.2472, over 29755.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1138, cr_loss=0.3544, attn_decoder_loss=0.2396, over 5801458.05 frames. ], batch size: 81, lr: 2.83e-03, grad_scale: 16.0
2024-09-19 15:50:36,023 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.38 vs. limit=10.0
2024-09-19 15:50:39,101 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.69 vs. limit=15.0
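[Note: the per-batch loss[...] values decompose into CTC, consistency-regularization, and attention-decoder terms. They are consistent with a fixed weighted sum with weights of roughly 0.1, 0.02, and 0.9 respectively, e.g. 0.1*0.09019 + 0.02*0.3019 + 0.9*0.2126 = 0.2064 for batch 1750 above. The weights below are inferred from the logged numbers, not read from train.py.]

    def combined_loss(ctc_loss, cr_loss, attn_decoder_loss,
                      ctc_scale=0.1, cr_scale=0.02, attn_scale=0.9):
        # Weighted sum consistent with the logged per-batch numbers.
        return (ctc_scale * ctc_loss + cr_scale * cr_loss
                + attn_scale * attn_decoder_loss)

    print(round(combined_loss(0.09019, 0.3019, 0.2126), 4))  # 0.2064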
2024-09-19 15:50:40,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=696280.0, ans=0.0
2024-09-19 15:50:44,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=696280.0, ans=0.035
2024-09-19 15:50:44,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=696280.0, ans=0.0
2024-09-19 15:50:45,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=696280.0, ans=0.0
2024-09-19 15:50:50,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=696320.0, ans=0.07
2024-09-19 15:50:51,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.17 vs. limit=10.0
2024-09-19 15:50:59,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=696320.0, ans=0.125
2024-09-19 15:51:05,121 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0
2024-09-19 15:51:06,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=696320.0, ans=0.125
2024-09-19 15:51:23,877 INFO [train.py:1198] (1/2) Epoch 39, batch 2150, loss[loss=0.228, ctc_loss=0.1091, cr_loss=0.3456, attn_decoder_loss=0.2336, over 29445.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.113, cr_loss=0.3526, attn_decoder_loss=0.2389, over 5816008.36 frames. ], batch size: 78, lr: 2.83e-03, grad_scale: 16.0
2024-09-19 15:51:27,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=696400.0, ans=0.0
2024-09-19 15:51:28,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=696400.0, ans=0.025
2024-09-19 15:51:55,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=696480.0, ans=0.025
2024-09-19 15:52:10,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=696520.0, ans=0.125
2024-09-19 15:52:12,124 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 8.545e+01 9.048e+01 9.484e+01 1.799e+02, threshold=1.810e+02, percent-clipped=1.0
2024-09-19 15:52:17,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=696520.0, ans=0.2
2024-09-19 15:52:32,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=696560.0, ans=0.2
2024-09-19 15:52:34,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.44 vs. limit=15.0
2024-09-19 15:52:36,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=696560.0, ans=0.2
2024-09-19 15:52:39,538 INFO [train.py:1198] (1/2) Epoch 39, batch 2200, loss[loss=0.2401, ctc_loss=0.1098, cr_loss=0.3403, attn_decoder_loss=0.247, over 29617.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1135, cr_loss=0.3535, attn_decoder_loss=0.2392, over 5812251.86 frames. ], batch size: 86, lr: 2.83e-03, grad_scale: 16.0
2024-09-19 15:52:39,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=696600.0, ans=0.125
2024-09-19 15:52:54,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=696640.0, ans=0.125
2024-09-19 15:52:55,435 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.83 vs. limit=22.5
2024-09-19 15:52:57,060 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.69 vs. limit=15.0
2024-09-19 15:53:43,934 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.60 vs. limit=22.5
2024-09-19 15:53:54,168 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.67 vs. limit=12.0
2024-09-19 15:53:55,432 INFO [train.py:1198] (1/2) Epoch 39, batch 2250, loss[loss=0.2494, ctc_loss=0.1181, cr_loss=0.3814, attn_decoder_loss=0.2555, over 29710.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.113, cr_loss=0.353, attn_decoder_loss=0.2389, over 5811904.62 frames. ], batch size: 82, lr: 2.83e-03, grad_scale: 16.0
2024-09-19 15:54:06,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=696800.0, ans=0.2
2024-09-19 15:54:25,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=696840.0, ans=0.07
2024-09-19 15:54:28,715 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.11 vs. limit=15.0
2024-09-19 15:54:45,789 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.499e+01 9.039e+01 9.530e+01 1.426e+02, threshold=1.808e+02, percent-clipped=0.0
2024-09-19 15:55:04,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=696960.0, ans=0.125
2024-09-19 15:55:06,922 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=11.77 vs. limit=15.0
2024-09-19 15:55:15,228 INFO [train.py:1198] (1/2) Epoch 39, batch 2300, loss[loss=0.209, ctc_loss=0.09698, cr_loss=0.3205, attn_decoder_loss=0.2143, over 29337.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1124, cr_loss=0.3512, attn_decoder_loss=0.238, over 5798315.21 frames. ], batch size: 71, lr: 2.83e-03, grad_scale: 16.0
2024-09-19 15:55:33,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=697040.0, ans=0.025
2024-09-19 15:55:34,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=697040.0, ans=0.125
2024-09-19 15:55:50,987 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=4.97 vs. limit=15.0
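[Note: the Whitening lines compare a per-module statistic against a scheduled limit, and a constraint is presumably active only when metric exceeds limit. One plausible reading of the metric, as an eigenvalue-spread ratio of the channel covariance (1.0 for perfectly white activations, larger otherwise), is sketched below; this is a guess at the semantics, not the scaling.py implementation.]

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (frames, channels); split channels into groups and measure how
        # far the within-group covariance is from a multiple of the identity.
        n, c = x.shape
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / n           # (groups, c/g, c/g)
        eigs = torch.linalg.eigvalsh(cov)         # (groups, c/g)
        return float((eigs ** 2).mean() / eigs.mean() ** 2)

    x = torch.randn(200, 512)                     # near-white input -> ~1.0
    print(f"metric={whitening_metric(x):.2f} vs. limit=15.0")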
2024-09-19 15:56:08,502 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=15.0
2024-09-19 15:56:23,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=697160.0, ans=0.125
2024-09-19 15:56:30,680 INFO [train.py:1198] (1/2) Epoch 39, batch 2350, loss[loss=0.2414, ctc_loss=0.1183, cr_loss=0.3801, attn_decoder_loss=0.2466, over 29698.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1129, cr_loss=0.3521, attn_decoder_loss=0.2386, over 5802985.09 frames. ], batch size: 83, lr: 2.83e-03, grad_scale: 8.0
2024-09-19 15:56:47,577 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 15:56:56,421 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 15:56:56,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=697240.0, ans=0.09899494936611666
2024-09-19 15:57:07,056 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 15:57:16,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=697320.0, ans=0.025
2024-09-19 15:57:20,254 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.672e+01 9.121e+01 9.858e+01 6.738e+02, threshold=1.824e+02, percent-clipped=2.0
2024-09-19 15:57:30,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=697360.0, ans=0.015
2024-09-19 15:57:46,029 INFO [train.py:1198] (1/2) Epoch 39, batch 2400, loss[loss=0.2194, ctc_loss=0.0994, cr_loss=0.3317, attn_decoder_loss=0.2254, over 29543.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1136, cr_loss=0.3536, attn_decoder_loss=0.2392, over 5807307.53 frames. ], batch size: 76, lr: 2.83e-03, grad_scale: 16.0
2024-09-19 15:57:46,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=697400.0, ans=0.0
2024-09-19 15:58:14,353 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 15:58:20,767 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.05 vs. limit=15.0
2024-09-19 15:58:24,826 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 15:58:26,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=697480.0, ans=0.125
2024-09-19 15:58:27,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=697480.0, ans=0.05
2024-09-19 15:58:36,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=697520.0, ans=0.125
2024-09-19 15:58:40,855 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.23 vs. limit=22.5
2024-09-19 15:58:44,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=697520.0, ans=0.0
2024-09-19 15:59:01,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=697560.0, ans=0.125
2024-09-19 15:59:06,339 INFO [train.py:1198] (1/2) Epoch 39, batch 2450, loss[loss=0.2403, ctc_loss=0.1142, cr_loss=0.3588, attn_decoder_loss=0.2463, over 29683.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1141, cr_loss=0.3555, attn_decoder_loss=0.24, over 5783734.85 frames. ], batch size: 82, lr: 2.82e-03, grad_scale: 16.0
2024-09-19 15:59:24,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=697640.0, ans=0.2
2024-09-19 15:59:29,434 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.75 vs. limit=10.0
2024-09-19 15:59:29,611 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=5.24 vs. limit=15.0
2024-09-19 15:59:55,610 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.313e+01 8.647e+01 9.273e+01 9.890e+01 2.382e+02, threshold=1.855e+02, percent-clipped=2.0
2024-09-19 16:00:16,372 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.30 vs. limit=15.0
2024-09-19 16:00:21,282 INFO [train.py:1198] (1/2) Epoch 39, batch 2500, loss[loss=0.2568, ctc_loss=0.1302, cr_loss=0.3879, attn_decoder_loss=0.2623, over 29620.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1139, cr_loss=0.3552, attn_decoder_loss=0.2399, over 5794497.71 frames. ], batch size: 86, lr: 2.82e-03, grad_scale: 16.0
2024-09-19 16:00:24,912 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0
2024-09-19 16:00:26,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=697800.0, ans=0.125
2024-09-19 16:00:39,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=697840.0, ans=0.2
2024-09-19 16:00:47,729 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.43 vs. limit=15.0
2024-09-19 16:00:50,859 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.10 vs. limit=15.0
2024-09-19 16:01:06,308 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.30 vs. limit=6.0
2024-09-19 16:01:11,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=697920.0, ans=0.0
2024-09-19 16:01:17,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=697920.0, ans=0.125
2024-09-19 16:01:19,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=697920.0, ans=0.125
2024-09-19 16:01:37,121 INFO [train.py:1198] (1/2) Epoch 39, batch 2550, loss[loss=0.1988, ctc_loss=0.08845, cr_loss=0.3036, attn_decoder_loss=0.2043, over 29353.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1133, cr_loss=0.3538, attn_decoder_loss=0.2394, over 5797464.60 frames. ], batch size: 67, lr: 2.82e-03, grad_scale: 16.0
2024-09-19 16:02:01,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=698040.0, ans=0.1
2024-09-19 16:02:05,759 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.10 vs. limit=15.0
2024-09-19 16:02:06,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=698040.0, ans=0.2
2024-09-19 16:02:28,818 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.383e+01 8.876e+01 9.415e+01 4.021e+02, threshold=1.775e+02, percent-clipped=1.0
2024-09-19 16:02:56,977 INFO [train.py:1198] (1/2) Epoch 39, batch 2600, loss[loss=0.2257, ctc_loss=0.1105, cr_loss=0.3472, attn_decoder_loss=0.2308, over 29461.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1138, cr_loss=0.3545, attn_decoder_loss=0.24, over 5793140.24 frames. ], batch size: 78, lr: 2.82e-03, grad_scale: 16.0
2024-09-19 16:02:59,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.76 vs. limit=15.0
2024-09-19 16:03:13,320 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.19 vs. limit=15.0
2024-09-19 16:03:30,897 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.18 vs. limit=15.0
2024-09-19 16:03:56,852 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.22 vs. limit=15.0
2024-09-19 16:04:02,966 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.60 vs. limit=15.0
2024-09-19 16:04:12,645 INFO [train.py:1198] (1/2) Epoch 39, batch 2650, loss[loss=0.2482, ctc_loss=0.1291, cr_loss=0.3954, attn_decoder_loss=0.2527, over 29238.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1142, cr_loss=0.3555, attn_decoder_loss=0.2404, over 5800944.17 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 16.0
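[Note: the grad_scale values in the batch summaries (8.0/16.0/32.0 in this section) move the way a dynamic AMP loss scale does, shrinking after overflowing steps and growing after long stable runs. A schematic of the standard torch.cuda.amp pattern that produces such a value; model, optimizer, batch, and compute_loss are placeholders, not train.py's API.]

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

    def train_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # skipped internally on inf/nan gradients
        scaler.update()          # shrinks or grows the scale dynamically
        return loss.detach(), scaler.get_scale()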
2024-09-19 16:05:02,269 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 8.675e+01 8.983e+01 9.685e+01 2.002e+02, threshold=1.797e+02, percent-clipped=1.0
2024-09-19 16:05:17,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=698560.0, ans=0.2
2024-09-19 16:05:26,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=698600.0, ans=0.2
2024-09-19 16:05:27,840 INFO [train.py:1198] (1/2) Epoch 39, batch 2700, loss[loss=0.2404, ctc_loss=0.1089, cr_loss=0.331, attn_decoder_loss=0.2476, over 29512.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1142, cr_loss=0.3557, attn_decoder_loss=0.2406, over 5796201.14 frames. ], batch size: 87, lr: 2.82e-03, grad_scale: 16.0
2024-09-19 16:05:28,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=698600.0, ans=0.125
2024-09-19 16:05:28,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=698600.0, ans=0.125
2024-09-19 16:05:32,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=698600.0, ans=0.125
2024-09-19 16:05:49,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=698640.0, ans=0.125
2024-09-19 16:06:03,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=698680.0, ans=0.125
2024-09-19 16:06:06,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=698680.0, ans=0.125
2024-09-19 16:06:37,075 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.96 vs. limit=15.0
2024-09-19 16:06:45,671 INFO [train.py:1198] (1/2) Epoch 39, batch 2750, loss[loss=0.231, ctc_loss=0.1145, cr_loss=0.367, attn_decoder_loss=0.2358, over 29503.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1133, cr_loss=0.3532, attn_decoder_loss=0.2391, over 5794568.35 frames. ], batch size: 75, lr: 2.82e-03, grad_scale: 16.0
2024-09-19 16:06:45,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=698800.0, ans=0.125
2024-09-19 16:06:48,085 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.51 vs. limit=6.0
2024-09-19 16:07:00,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=698800.0, ans=0.125
2024-09-19 16:07:04,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=698840.0, ans=0.125
2024-09-19 16:07:09,679 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.02 vs. limit=15.0
2024-09-19 16:07:20,479 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.00 vs. limit=15.0
2024-09-19 16:07:38,785 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 8.394e+01 9.092e+01 9.647e+01 2.225e+02, threshold=1.818e+02, percent-clipped=1.0
2024-09-19 16:08:03,256 INFO [train.py:1198] (1/2) Epoch 39, batch 2800, loss[loss=0.2511, ctc_loss=0.1354, cr_loss=0.3685, attn_decoder_loss=0.2558, over 20333.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1137, cr_loss=0.3542, attn_decoder_loss=0.2394, over 5774757.65 frames. ], batch size: 210, lr: 2.82e-03, grad_scale: 16.0
2024-09-19 16:08:24,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=699040.0, ans=0.125
2024-09-19 16:08:53,685 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.91 vs. limit=10.0
2024-09-19 16:09:04,013 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=15.0
2024-09-19 16:09:18,776 INFO [train.py:1198] (1/2) Epoch 39, batch 2850, loss[loss=0.2289, ctc_loss=0.1202, cr_loss=0.3768, attn_decoder_loss=0.2326, over 29499.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1138, cr_loss=0.3545, attn_decoder_loss=0.2397, over 5760549.15 frames. ], batch size: 77, lr: 2.82e-03, grad_scale: 16.0
2024-09-19 16:09:30,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=699200.0, ans=0.125
2024-09-19 16:09:46,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=699240.0, ans=0.025
2024-09-19 16:09:58,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=699280.0, ans=0.125
2024-09-19 16:10:09,389 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 16:10:10,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=699320.0, ans=0.125
2024-09-19 16:10:13,569 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.429e+01 8.628e+01 9.119e+01 9.691e+01 3.191e+02, threshold=1.824e+02, percent-clipped=2.0
2024-09-19 16:10:32,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=699360.0, ans=0.0
2024-09-19 16:10:35,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=699400.0, ans=0.0
2024-09-19 16:10:36,337 INFO [train.py:1198] (1/2) Epoch 39, batch 2900, loss[loss=0.2294, ctc_loss=0.1069, cr_loss=0.3506, attn_decoder_loss=0.2353, over 29424.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1147, cr_loss=0.3566, attn_decoder_loss=0.2409, over 5786803.73 frames. ], batch size: 79, lr: 2.82e-03, grad_scale: 8.0
2024-09-19 16:10:58,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=699440.0, ans=0.125
2024-09-19 16:11:28,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=699520.0, ans=0.0
2024-09-19 16:11:43,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=699560.0, ans=0.025
2024-09-19 16:11:53,779 INFO [train.py:1198] (1/2) Epoch 39, batch 2950, loss[loss=0.2306, ctc_loss=0.1184, cr_loss=0.3758, attn_decoder_loss=0.2348, over 29528.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1135, cr_loss=0.3539, attn_decoder_loss=0.2394, over 5782045.11 frames. ], batch size: 75, lr: 2.82e-03, grad_scale: 8.0
2024-09-19 16:11:55,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=699600.0, ans=0.125
2024-09-19 16:12:24,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=699680.0, ans=0.125
2024-09-19 16:12:36,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=699680.0, ans=0.1
2024-09-19 16:12:46,697 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.521e+01 8.658e+01 9.205e+01 9.936e+01 3.321e+02, threshold=1.841e+02, percent-clipped=1.0
2024-09-19 16:12:56,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=699760.0, ans=0.0
2024-09-19 16:12:57,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=699760.0, ans=0.125
2024-09-19 16:13:00,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=699760.0, ans=0.0
2024-09-19 16:13:09,520 INFO [train.py:1198] (1/2) Epoch 39, batch 3000, loss[loss=0.2342, ctc_loss=0.1158, cr_loss=0.3638, attn_decoder_loss=0.2392, over 29752.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1129, cr_loss=0.3525, attn_decoder_loss=0.239, over 5784048.78 frames. ], batch size: 81, lr: 2.82e-03, grad_scale: 8.0
2024-09-19 16:13:09,520 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-19 16:13:16,250 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.9441, 5.3603, 5.7330, 5.8899], device='cuda:1')
2024-09-19 16:13:28,815 INFO [train.py:1230] (1/2) Epoch 39, validation: loss=0.2123, ctc_loss=0.03671, cr_loss=6.289e-15, attn_decoder_loss=0.2318, over 944034.00 frames.
2024-09-19 16:13:28,815 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB
2024-09-19 16:13:43,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=699840.0, ans=0.125
2024-09-19 16:14:31,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=699960.0, ans=0.125
2024-09-19 16:14:44,215 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.11 vs. limit=15.0
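[Note: at regular intervals the loop switches to a held-out pass ("Computing validation loss") and also reports peak device memory. A schematic of such a pass; the helper names (valid_loader, compute_loss) are placeholders, not train.py's API.]

    import torch

    @torch.no_grad()
    def run_validation(model, valid_loader, compute_loss, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0
        for batch in valid_loader:
            loss, num_frames = compute_loss(model, batch)
            tot_loss += float(loss) * num_frames
            tot_frames += num_frames
        model.train()
        # Peak allocation since startup, as in "Maximum memory allocated" lines
        max_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        return tot_loss / max(tot_frames, 1), max_mb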
2024-09-19 16:14:46,912 INFO [train.py:1198] (1/2) Epoch 39, batch 3050, loss[loss=0.2229, ctc_loss=0.09988, cr_loss=0.3242, attn_decoder_loss=0.2294, over 29529.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1134, cr_loss=0.3533, attn_decoder_loss=0.2396, over 5777083.02 frames. ], batch size: 76, lr: 2.82e-03, grad_scale: 8.0
2024-09-19 16:14:50,419 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 16:15:09,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=700040.0, ans=0.1
2024-09-19 16:15:39,617 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.454e+01 9.058e+01 9.630e+01 1.961e+02, threshold=1.812e+02, percent-clipped=1.0
2024-09-19 16:16:02,117 INFO [train.py:1198] (1/2) Epoch 39, batch 3100, loss[loss=0.2453, ctc_loss=0.1143, cr_loss=0.3696, attn_decoder_loss=0.2517, over 29220.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1131, cr_loss=0.3528, attn_decoder_loss=0.2393, over 5777074.55 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 8.0
2024-09-19 16:16:02,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=700200.0, ans=0.04949747468305833
2024-09-19 16:16:05,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=700200.0, ans=0.125
2024-09-19 16:16:21,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=700240.0, ans=0.125
2024-09-19 16:16:35,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=700280.0, ans=0.0
2024-09-19 16:16:42,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=700280.0, ans=0.0
2024-09-19 16:17:13,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=700360.0, ans=0.1
2024-09-19 16:17:13,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=700360.0, ans=0.09899494936611666
2024-09-19 16:17:19,629 INFO [train.py:1198] (1/2) Epoch 39, batch 3150, loss[loss=0.2431, ctc_loss=0.1237, cr_loss=0.3702, attn_decoder_loss=0.2481, over 28908.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1137, cr_loss=0.354, attn_decoder_loss=0.2395, over 5783023.93 frames. ], batch size: 104, lr: 2.82e-03, grad_scale: 8.0
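[Note: many of the scheduled values are *_skip_rate entries (attention_skip_rate, conv_skip_rate, ff2_skip_rate, bypass.skip_rate). Reading them as probabilities of stochastically bypassing a sub-module during training, the residual pattern would look like the sketch below; this reading is an assumption based on the names, not the actual zipformer code.]

    import torch

    def maybe_skip(submodule, x, skip_rate: float, training: bool):
        # With probability skip_rate, bypass the sub-module for this batch;
        # at inference (training=False) the sub-module always runs.
        if training and float(torch.rand(())) < skip_rate:
            return x
        return x + submodule(x)  # usual residual path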
2024-09-19 16:17:38,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=700440.0, ans=0.0
2024-09-19 16:17:53,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=700480.0, ans=0.125
2024-09-19 16:18:12,226 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.334e+01 8.732e+01 9.135e+01 9.638e+01 1.512e+02, threshold=1.827e+02, percent-clipped=0.0
2024-09-19 16:18:26,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=700560.0, ans=0.125
2024-09-19 16:18:26,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=700560.0, ans=0.1
2024-09-19 16:18:32,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=700560.0, ans=0.1
2024-09-19 16:18:34,851 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.39 vs. limit=10.0
2024-09-19 16:18:36,811 INFO [train.py:1198] (1/2) Epoch 39, batch 3200, loss[loss=0.2266, ctc_loss=0.1059, cr_loss=0.3457, attn_decoder_loss=0.2323, over 29430.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1134, cr_loss=0.3531, attn_decoder_loss=0.2392, over 5793537.46 frames. ], batch size: 79, lr: 2.82e-03, grad_scale: 16.0
2024-09-19 16:18:47,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=700600.0, ans=0.5
2024-09-19 16:18:54,528 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.53 vs. limit=22.5
2024-09-19 16:19:07,685 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 16:19:13,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=700680.0, ans=0.125
2024-09-19 16:19:53,012 INFO [train.py:1198] (1/2) Epoch 39, batch 3250, loss[loss=0.2428, ctc_loss=0.1208, cr_loss=0.3737, attn_decoder_loss=0.248, over 29715.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1137, cr_loss=0.3543, attn_decoder_loss=0.2397, over 5799840.01 frames. ], batch size: 84, lr: 2.82e-03, grad_scale: 8.0
2024-09-19 16:19:56,820 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.82 vs. limit=22.5
2024-09-19 16:19:57,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=700800.0, ans=0.125
2024-09-19 16:20:25,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=700880.0, ans=0.1
2024-09-19 16:20:28,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=700880.0, ans=0.1
2024-09-19 16:20:28,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=700880.0, ans=0.125
2024-09-19 16:20:34,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=700880.0, ans=0.125
2024-09-19 16:20:46,556 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.633e+01 8.604e+01 9.197e+01 9.698e+01 1.830e+02, threshold=1.839e+02, percent-clipped=1.0
2024-09-19 16:20:55,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=700960.0, ans=0.1
2024-09-19 16:20:58,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=700960.0, ans=0.0
2024-09-19 16:21:09,875 INFO [train.py:1198] (1/2) Epoch 39, batch 3300, loss[loss=0.239, ctc_loss=0.1117, cr_loss=0.3536, attn_decoder_loss=0.2453, over 28354.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1132, cr_loss=0.3532, attn_decoder_loss=0.2389, over 5796249.38 frames. ], batch size: 111, lr: 2.82e-03, grad_scale: 8.0
2024-09-19 16:21:52,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=701080.0, ans=0.125
2024-09-19 16:21:58,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=701120.0, ans=0.125
2024-09-19 16:22:01,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=701120.0, ans=0.125
2024-09-19 16:22:19,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=701160.0, ans=0.125
2024-09-19 16:22:27,048 INFO [train.py:1198] (1/2) Epoch 39, batch 3350, loss[loss=0.2381, ctc_loss=0.1094, cr_loss=0.3463, attn_decoder_loss=0.2447, over 28828.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1139, cr_loss=0.3545, attn_decoder_loss=0.2397, over 5772877.34 frames. ], batch size: 104, lr: 2.82e-03, grad_scale: 8.0
2024-09-19 16:22:41,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.17 vs. limit=15.0
2024-09-19 16:22:51,664 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 16:22:54,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=701240.0, ans=0.125
2024-09-19 16:23:21,140 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.630e+01 9.121e+01 9.700e+01 6.720e+02, threshold=1.824e+02, percent-clipped=1.0
2024-09-19 16:23:36,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=701360.0, ans=0.1
2024-09-19 16:23:36,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=701360.0, ans=0.125
2024-09-19 16:23:42,603 INFO [train.py:1198] (1/2) Epoch 39, batch 3400, loss[loss=0.1969, ctc_loss=0.08936, cr_loss=0.2975, attn_decoder_loss=0.2022, over 29352.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1142, cr_loss=0.3553, attn_decoder_loss=0.2398, over 5767222.58 frames. ], batch size: 67, lr: 2.82e-03, grad_scale: 8.0
2024-09-19 16:24:00,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=701440.0, ans=0.125
2024-09-19 16:24:03,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=701440.0, ans=0.0
2024-09-19 16:24:07,816 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0
2024-09-19 16:24:19,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=701480.0, ans=0.0
2024-09-19 16:24:19,354 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.39 vs. limit=15.0
2024-09-19 16:24:43,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=701560.0, ans=0.125
2024-09-19 16:24:46,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=701560.0, ans=0.0
2024-09-19 16:24:46,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=701560.0, ans=0.125
2024-09-19 16:25:00,307 INFO [train.py:1198] (1/2) Epoch 39, batch 3450, loss[loss=0.2415, ctc_loss=0.1143, cr_loss=0.3515, attn_decoder_loss=0.2478, over 28307.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.114, cr_loss=0.3549, attn_decoder_loss=0.2398, over 5774792.97 frames. ], batch size: 111, lr: 2.82e-03, grad_scale: 8.0
2024-09-19 16:25:08,858 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.77 vs. limit=15.0
2024-09-19 16:25:51,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=701720.0, ans=0.0
2024-09-19 16:25:54,476 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.388e+01 8.636e+01 9.201e+01 9.668e+01 2.196e+02, threshold=1.840e+02, percent-clipped=2.0
2024-09-19 16:25:58,275 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.64 vs. limit=12.0
2024-09-19 16:26:00,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=701760.0, ans=0.125
2024-09-19 16:26:17,576 INFO [train.py:1198] (1/2) Epoch 39, batch 3500, loss[loss=0.2024, ctc_loss=0.08832, cr_loss=0.2847, attn_decoder_loss=0.2087, over 29337.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1139, cr_loss=0.3544, attn_decoder_loss=0.2394, over 5776439.66 frames. ], batch size: 71, lr: 2.82e-03, grad_scale: 8.0
2024-09-19 16:26:22,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=701800.0, ans=0.125
2024-09-19 16:26:36,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=701840.0, ans=0.2
2024-09-19 16:26:49,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=701880.0, ans=0.125
2024-09-19 16:26:49,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=701880.0, ans=0.125
2024-09-19 16:27:24,012 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.67 vs. limit=22.5
2024-09-19 16:27:31,755 INFO [train.py:1198] (1/2) Epoch 39, batch 3550, loss[loss=0.2422, ctc_loss=0.1154, cr_loss=0.3681, attn_decoder_loss=0.2481, over 29722.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1139, cr_loss=0.3546, attn_decoder_loss=0.2395, over 5783376.57 frames. ], batch size: 89, lr: 2.82e-03, grad_scale: 8.0
2024-09-19 16:27:48,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=702040.0, ans=10.0
2024-09-19 16:27:55,590 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-19 16:27:56,293 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0
2024-09-19 16:27:56,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=702040.0, ans=0.0
2024-09-19 16:28:10,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=702080.0, ans=0.125
2024-09-19 16:28:24,646 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 8.449e+01 9.039e+01 9.569e+01 2.236e+02, threshold=1.808e+02, percent-clipped=1.0
2024-09-19 16:28:24,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=702120.0, ans=0.0
2024-09-19 16:28:24,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=702120.0, ans=0.1
2024-09-19 16:28:27,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=702120.0, ans=0.125
2024-09-19 16:28:45,267 INFO [train.py:1198] (1/2) Epoch 39, batch 3600, loss[loss=0.2304, ctc_loss=0.108, cr_loss=0.3297, attn_decoder_loss=0.2366, over 29496.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.114, cr_loss=0.3544, attn_decoder_loss=0.2397, over 5791925.81 frames. ], batch size: 77, lr: 2.82e-03, grad_scale: 16.0
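[Note: the WithLoss lines (loss-sum=0.000e+00) and the attn_weights_entropy tensor logged during validation are attention-weight diagnostics. The entropy can be reproduced as below for a weight tensor whose last dimension sums to 1; the (num_heads, tgt_len, src_len) shape convention is an assumption.]

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, tgt_len, src_len); rows on the last dim sum to 1.
        ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, tgt_len)
        return ent.mean(dim=-1)                           # one value per head

    attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
    print(attn_weights_entropy(attn))  # 4 per-head entropies, cf. the log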
2024-09-19 16:29:05,239 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 16:29:21,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=702280.0, ans=0.125
2024-09-19 16:29:30,891 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.68 vs. limit=15.0
2024-09-19 16:29:50,615 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0
2024-09-19 16:29:52,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=702360.0, ans=0.025
2024-09-19 16:30:01,925 INFO [train.py:1198] (1/2) Epoch 39, batch 3650, loss[loss=0.2439, ctc_loss=0.1256, cr_loss=0.3776, attn_decoder_loss=0.2487, over 29501.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1134, cr_loss=0.3533, attn_decoder_loss=0.2391, over 5793000.93 frames. ], batch size: 90, lr: 2.82e-03, grad_scale: 16.0
2024-09-19 16:30:02,907 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.33 vs. limit=15.0
2024-09-19 16:30:28,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=702440.0, ans=0.0
2024-09-19 16:30:33,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=702480.0, ans=0.0
2024-09-19 16:30:48,515 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 16:30:53,512 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.54 vs. limit=15.0
2024-09-19 16:30:55,336 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.559e+01 9.136e+01 9.465e+01 1.942e+02, threshold=1.827e+02, percent-clipped=1.0
2024-09-19 16:31:04,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=702560.0, ans=0.0
2024-09-19 16:31:13,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=702560.0, ans=0.125
2024-09-19 16:31:16,132 INFO [train.py:1198] (1/2) Epoch 39, batch 3700, loss[loss=0.2457, ctc_loss=0.127, cr_loss=0.3957, attn_decoder_loss=0.2501, over 29714.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1131, cr_loss=0.3525, attn_decoder_loss=0.239, over 5803048.56 frames. ], batch size: 84, lr: 2.81e-03, grad_scale: 16.0
2024-09-19 16:31:32,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=702640.0, ans=0.1
2024-09-19 16:31:34,961 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.24 vs. limit=10.0
2024-09-19 16:31:37,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=702640.0, ans=0.2
2024-09-19 16:31:56,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=702680.0, ans=0.125
2024-09-19 16:32:03,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=702720.0, ans=0.125
2024-09-19 16:32:08,938 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.53 vs. limit=5.0
2024-09-19 16:32:13,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=702760.0, ans=0.0
2024-09-19 16:32:19,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=702760.0, ans=0.125
2024-09-19 16:32:20,519 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.88 vs. limit=12.0
2024-09-19 16:32:23,198 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.76 vs. limit=12.0
2024-09-19 16:32:31,622 INFO [train.py:1198] (1/2) Epoch 39, batch 3750, loss[loss=0.2093, ctc_loss=0.1016, cr_loss=0.347, attn_decoder_loss=0.2135, over 29340.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1132, cr_loss=0.3528, attn_decoder_loss=0.2388, over 5807512.50 frames. ], batch size: 67, lr: 2.81e-03, grad_scale: 16.0
2024-09-19 16:32:36,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=702800.0, ans=0.0
2024-09-19 16:32:52,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=702840.0, ans=0.125
2024-09-19 16:33:06,450 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.57 vs. limit=15.0
2024-09-19 16:33:19,865 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.72 vs. limit=15.0
2024-09-19 16:33:23,077 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.25 vs. limit=15.0
2024-09-19 16:33:26,584 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.605e+01 8.464e+01 8.930e+01 9.588e+01 2.704e+02, threshold=1.786e+02, percent-clipped=2.0
2024-09-19 16:33:38,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=702960.0, ans=0.125
2024-09-19 16:33:38,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=702960.0, ans=0.1
2024-09-19 16:33:45,847 INFO [train.py:1198] (1/2) Epoch 39, batch 3800, loss[loss=0.2338, ctc_loss=0.1079, cr_loss=0.3466, attn_decoder_loss=0.2401, over 29627.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1133, cr_loss=0.3525, attn_decoder_loss=0.2386, over 5798360.21 frames. ], batch size: 86, lr: 2.81e-03, grad_scale: 8.0
2024-09-19 16:33:56,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=703000.0, ans=0.2
2024-09-19 16:34:06,204 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.63 vs. limit=12.0
2024-09-19 16:34:17,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=703080.0, ans=0.0
2024-09-19 16:34:36,081 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.44 vs. limit=12.0
2024-09-19 16:34:57,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=703160.0, ans=0.0
2024-09-19 16:35:00,348 INFO [train.py:1198] (1/2) Epoch 39, batch 3850, loss[loss=0.2464, ctc_loss=0.1249, cr_loss=0.3723, attn_decoder_loss=0.2516, over 29235.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1129, cr_loss=0.3525, attn_decoder_loss=0.2384, over 5812949.82 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 8.0
2024-09-19 16:35:18,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=703240.0, ans=0.0
2024-09-19 16:35:24,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=703240.0, ans=0.125
2024-09-19 16:35:33,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.13 vs. limit=6.0
2024-09-19 16:35:56,712 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.496e+01 8.648e+01 9.079e+01 9.833e+01 2.007e+02, threshold=1.816e+02, percent-clipped=1.0
2024-09-19 16:36:15,978 INFO [train.py:1198] (1/2) Epoch 39, batch 3900, loss[loss=0.2441, ctc_loss=0.1203, cr_loss=0.3717, attn_decoder_loss=0.2496, over 29622.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1131, cr_loss=0.3527, attn_decoder_loss=0.2388, over 5816832.52 frames. ], batch size: 86, lr: 2.81e-03, grad_scale: 8.0
2024-09-19 16:36:32,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=703440.0, ans=0.0
2024-09-19 16:36:38,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=703440.0, ans=0.025
2024-09-19 16:36:50,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=703480.0, ans=0.0
2024-09-19 16:37:07,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=703520.0, ans=0.04949747468305833
2024-09-19 16:37:09,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=703520.0, ans=0.125
2024-09-19 16:37:29,843 INFO [train.py:1198] (1/2) Epoch 39, batch 3950, loss[loss=0.2554, ctc_loss=0.1324, cr_loss=0.4045, attn_decoder_loss=0.2601, over 29458.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.113, cr_loss=0.3526, attn_decoder_loss=0.2388, over 5835848.30 frames. ], batch size: 97, lr: 2.81e-03, grad_scale: 8.0
], batch size: 97, lr: 2.81e-03, grad_scale: 8.0 2024-09-19 16:37:35,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=703600.0, ans=0.125 2024-09-19 16:37:37,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=703600.0, ans=0.0 2024-09-19 16:37:55,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=703640.0, ans=0.0 2024-09-19 16:38:06,096 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.93 vs. limit=12.0 2024-09-19 16:38:11,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=703680.0, ans=0.035 2024-09-19 16:38:25,886 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.999e+01 8.633e+01 9.078e+01 9.598e+01 1.411e+02, threshold=1.816e+02, percent-clipped=0.0 2024-09-19 16:38:29,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=703760.0, ans=0.125 2024-09-19 16:38:36,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=703760.0, ans=0.0 2024-09-19 16:38:42,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=703760.0, ans=0.2 2024-09-19 16:38:44,845 INFO [train.py:1198] (1/2) Epoch 39, batch 4000, loss[loss=0.2248, ctc_loss=0.1085, cr_loss=0.3441, attn_decoder_loss=0.2301, over 29497.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1133, cr_loss=0.3529, attn_decoder_loss=0.239, over 5812337.42 frames. ], batch size: 74, lr: 2.81e-03, grad_scale: 16.0 2024-09-19 16:38:48,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.87 vs. limit=15.0 2024-09-19 16:39:30,933 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 16:39:36,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.max_abs, batch_count=703920.0, ans=10.0 2024-09-19 16:39:44,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=703960.0, ans=0.125 2024-09-19 16:39:53,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=703960.0, ans=0.125 2024-09-19 16:40:06,232 INFO [train.py:1198] (1/2) Epoch 39, batch 4050, loss[loss=0.2515, ctc_loss=0.1411, cr_loss=0.3843, attn_decoder_loss=0.2552, over 20326.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1136, cr_loss=0.3532, attn_decoder_loss=0.239, over 5796603.51 frames. ], batch size: 210, lr: 2.81e-03, grad_scale: 16.0 2024-09-19 16:40:15,869 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.54 vs. 
limit=15.0 2024-09-19 16:40:16,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=704000.0, ans=0.125 2024-09-19 16:40:18,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=704000.0, ans=0.1 2024-09-19 16:40:28,133 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 16:40:29,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=704040.0, ans=0.125 2024-09-19 16:40:47,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=704080.0, ans=0.125 2024-09-19 16:40:47,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=704080.0, ans=0.125 2024-09-19 16:41:01,438 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.594e+01 8.633e+01 9.112e+01 9.845e+01 1.931e+02, threshold=1.822e+02, percent-clipped=1.0 2024-09-19 16:41:01,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=704120.0, ans=0.0 2024-09-19 16:41:05,646 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.58 vs. limit=5.0 2024-09-19 16:41:08,152 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.28 vs. limit=15.0 2024-09-19 16:41:20,592 INFO [train.py:1198] (1/2) Epoch 39, batch 4100, loss[loss=0.2435, ctc_loss=0.1237, cr_loss=0.3597, attn_decoder_loss=0.2489, over 29517.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.114, cr_loss=0.3541, attn_decoder_loss=0.2394, over 5791371.30 frames. ], batch size: 90, lr: 2.81e-03, grad_scale: 16.0 2024-09-19 16:41:30,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=704200.0, ans=0.125 2024-09-19 16:41:31,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=704200.0, ans=0.0 2024-09-19 16:41:44,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=704240.0, ans=0.125 2024-09-19 16:42:07,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=704320.0, ans=0.125 2024-09-19 16:42:29,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=704360.0, ans=0.125 2024-09-19 16:42:35,215 INFO [train.py:1198] (1/2) Epoch 39, batch 4150, loss[loss=0.2313, ctc_loss=0.115, cr_loss=0.3457, attn_decoder_loss=0.2365, over 29484.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1138, cr_loss=0.3539, attn_decoder_loss=0.2393, over 5797243.27 frames. 
], batch size: 77, lr: 2.81e-03, grad_scale: 16.0 2024-09-19 16:42:53,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=704440.0, ans=0.2 2024-09-19 16:43:01,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=704440.0, ans=0.125 2024-09-19 16:43:04,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=704480.0, ans=0.1 2024-09-19 16:43:22,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=704520.0, ans=0.2 2024-09-19 16:43:29,356 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.507e+01 9.056e+01 9.500e+01 2.477e+02, threshold=1.811e+02, percent-clipped=1.0 2024-09-19 16:43:31,252 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 16:43:32,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=704560.0, ans=0.09899494936611666 2024-09-19 16:43:42,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=704560.0, ans=0.05 2024-09-19 16:43:48,513 INFO [train.py:1198] (1/2) Epoch 39, batch 4200, loss[loss=0.2599, ctc_loss=0.134, cr_loss=0.4147, attn_decoder_loss=0.2647, over 29501.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1137, cr_loss=0.3536, attn_decoder_loss=0.2395, over 5799543.84 frames. ], batch size: 90, lr: 2.81e-03, grad_scale: 16.0 2024-09-19 16:43:59,697 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.94 vs. limit=12.0 2024-09-19 16:44:04,315 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.61 vs. limit=22.5 2024-09-19 16:44:09,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=704640.0, ans=0.125 2024-09-19 16:44:20,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=704680.0, ans=0.0 2024-09-19 16:44:35,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=704720.0, ans=0.125 2024-09-19 16:44:47,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=704760.0, ans=0.125 2024-09-19 16:45:03,371 INFO [train.py:1198] (1/2) Epoch 39, batch 4250, loss[loss=0.2214, ctc_loss=0.1007, cr_loss=0.3284, attn_decoder_loss=0.2275, over 29505.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1132, cr_loss=0.3528, attn_decoder_loss=0.2393, over 5805570.47 frames. 
], batch size: 74, lr: 2.81e-03, grad_scale: 16.0 2024-09-19 16:45:06,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=704800.0, ans=0.025 2024-09-19 16:45:16,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=704840.0, ans=0.125 2024-09-19 16:45:28,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=704840.0, ans=0.0 2024-09-19 16:45:38,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=704880.0, ans=0.125 2024-09-19 16:45:53,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=704920.0, ans=0.02 2024-09-19 16:45:55,738 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=4.78 vs. limit=12.0 2024-09-19 16:45:56,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=704920.0, ans=0.09899494936611666 2024-09-19 16:45:57,876 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 8.522e+01 9.039e+01 9.490e+01 2.336e+02, threshold=1.808e+02, percent-clipped=1.0 2024-09-19 16:46:05,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=704960.0, ans=0.2 2024-09-19 16:46:17,663 INFO [train.py:1198] (1/2) Epoch 39, batch 4300, loss[loss=0.2379, ctc_loss=0.105, cr_loss=0.3422, attn_decoder_loss=0.245, over 29545.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1132, cr_loss=0.3526, attn_decoder_loss=0.2396, over 5795464.63 frames. ], batch size: 87, lr: 2.81e-03, grad_scale: 16.0 2024-09-19 16:46:31,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=705040.0, ans=0.0 2024-09-19 16:46:49,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=705080.0, ans=0.1 2024-09-19 16:46:59,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=705080.0, ans=0.125 2024-09-19 16:47:16,411 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.38 vs. limit=15.0 2024-09-19 16:47:26,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=705160.0, ans=0.1 2024-09-19 16:47:26,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=705160.0, ans=0.125 2024-09-19 16:47:32,376 INFO [train.py:1198] (1/2) Epoch 39, batch 4350, loss[loss=0.2437, ctc_loss=0.1205, cr_loss=0.345, attn_decoder_loss=0.2497, over 29438.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1157, cr_loss=0.3582, attn_decoder_loss=0.2427, over 5797539.57 frames. 
], batch size: 97, lr: 2.81e-03, grad_scale: 8.0 2024-09-19 16:48:04,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=705280.0, ans=0.125 2024-09-19 16:48:06,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=705280.0, ans=0.125 2024-09-19 16:48:17,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=705320.0, ans=0.125 2024-09-19 16:48:27,651 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.048e+01 8.862e+01 9.354e+01 9.777e+01 1.379e+02, threshold=1.871e+02, percent-clipped=0.0 2024-09-19 16:48:41,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=705360.0, ans=0.2 2024-09-19 16:48:45,016 INFO [train.py:1198] (1/2) Epoch 39, batch 4400, loss[loss=0.2451, ctc_loss=0.1222, cr_loss=0.3787, attn_decoder_loss=0.2504, over 27281.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1168, cr_loss=0.3603, attn_decoder_loss=0.2446, over 5766311.55 frames. ], batch size: 124, lr: 2.81e-03, grad_scale: 16.0 2024-09-19 16:49:20,059 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.32 vs. limit=15.0 2024-09-19 16:49:31,959 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.18 vs. limit=12.0 2024-09-19 16:49:34,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=705520.0, ans=0.125 2024-09-19 16:49:38,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=705520.0, ans=0.025 2024-09-19 16:49:54,448 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.31 vs. limit=15.0 2024-09-19 16:50:00,204 INFO [train.py:1198] (1/2) Epoch 39, batch 4450, loss[loss=0.2559, ctc_loss=0.1444, cr_loss=0.3996, attn_decoder_loss=0.2594, over 19523.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1203, cr_loss=0.3658, attn_decoder_loss=0.2466, over 5571792.21 frames. ], batch size: 209, lr: 2.81e-03, grad_scale: 8.0 2024-09-19 16:50:20,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=705640.0, ans=0.0 2024-09-19 16:50:24,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=705640.0, ans=0.125 2024-09-19 16:50:25,138 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.50 vs. limit=22.5 2024-09-19 16:50:30,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=705680.0, ans=0.0 2024-09-19 16:50:42,036 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.39 vs. 
limit=15.0 2024-09-19 16:50:45,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=705720.0, ans=0.125 2024-09-19 16:50:47,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=705720.0, ans=0.0 2024-09-19 16:50:48,736 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 16:50:58,886 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.301e+01 9.418e+01 1.029e+02 1.185e+02 3.823e+02, threshold=2.058e+02, percent-clipped=1.0 2024-09-19 16:51:15,116 INFO [train.py:1198] (1/2) Epoch 39, batch 4500, loss[loss=0.2541, ctc_loss=0.1384, cr_loss=0.3787, attn_decoder_loss=0.2586, over 20059.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1234, cr_loss=0.3684, attn_decoder_loss=0.2486, over 5233921.96 frames. ], batch size: 209, lr: 2.81e-03, grad_scale: 8.0 2024-09-19 16:51:24,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=705800.0, ans=0.0 2024-09-19 16:51:39,257 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 16:52:42,177 INFO [train.py:1198] (1/2) Epoch 40, batch 0, loss[loss=0.2158, ctc_loss=0.09235, cr_loss=0.3223, attn_decoder_loss=0.2223, over 29618.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.09235, cr_loss=0.3223, attn_decoder_loss=0.2223, over 29618.00 frames. ], batch size: 73, lr: 2.77e-03, grad_scale: 16.0 2024-09-19 16:52:42,177 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 16:53:00,472 INFO [train.py:1230] (1/2) Epoch 40, validation: loss=0.2128, ctc_loss=0.03605, cr_loss=6.84e-15, attn_decoder_loss=0.2324, over 944034.00 frames. 2024-09-19 16:53:00,473 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-19 16:53:27,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=705940.0, ans=0.2 2024-09-19 16:53:35,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=705980.0, ans=0.1 2024-09-19 16:53:38,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=705980.0, ans=0.07 2024-09-19 16:53:51,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=706020.0, ans=0.2 2024-09-19 16:53:56,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=706020.0, ans=0.1 2024-09-19 16:53:57,342 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.75 vs. limit=15.0 2024-09-19 16:54:12,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=706060.0, ans=0.0 2024-09-19 16:54:17,771 INFO [train.py:1198] (1/2) Epoch 40, batch 50, loss[loss=0.2125, ctc_loss=0.102, cr_loss=0.3246, attn_decoder_loss=0.2175, over 29429.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1153, cr_loss=0.3574, attn_decoder_loss=0.2409, over 1267482.11 frames. 
], batch size: 70, lr: 2.77e-03, grad_scale: 8.0 2024-09-19 16:54:24,957 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.52 vs. limit=10.0 2024-09-19 16:54:42,491 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.552e+01 8.881e+01 9.876e+01 1.118e+02 1.337e+02, threshold=1.975e+02, percent-clipped=0.0 2024-09-19 16:55:04,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=706220.0, ans=0.2 2024-09-19 16:55:06,578 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.21 vs. limit=22.5 2024-09-19 16:55:07,869 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.18 vs. limit=15.0 2024-09-19 16:55:16,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=706220.0, ans=0.2 2024-09-19 16:55:23,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=706260.0, ans=0.125 2024-09-19 16:55:26,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=706260.0, ans=0.125 2024-09-19 16:55:32,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=706260.0, ans=0.95 2024-09-19 16:55:35,520 INFO [train.py:1198] (1/2) Epoch 40, batch 100, loss[loss=0.2258, ctc_loss=0.1131, cr_loss=0.3485, attn_decoder_loss=0.2306, over 29543.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1166, cr_loss=0.3609, attn_decoder_loss=0.2425, over 2252855.82 frames. ], batch size: 76, lr: 2.77e-03, grad_scale: 8.0 2024-09-19 16:55:56,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=706340.0, ans=0.125 2024-09-19 16:55:59,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=706340.0, ans=0.125 2024-09-19 16:55:59,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=706340.0, ans=0.1 2024-09-19 16:56:22,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=706420.0, ans=0.0 2024-09-19 16:56:25,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=706420.0, ans=0.125 2024-09-19 16:56:28,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=706420.0, ans=0.125 2024-09-19 16:56:30,645 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.83 vs. 
limit=22.5 2024-09-19 16:56:34,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=706460.0, ans=0.035 2024-09-19 16:56:37,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=706460.0, ans=0.0 2024-09-19 16:56:40,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=15.0 2024-09-19 16:56:50,194 INFO [train.py:1198] (1/2) Epoch 40, batch 150, loss[loss=0.2078, ctc_loss=0.09043, cr_loss=0.3031, attn_decoder_loss=0.2141, over 29457.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1143, cr_loss=0.356, attn_decoder_loss=0.2404, over 3047184.18 frames. ], batch size: 70, lr: 2.77e-03, grad_scale: 8.0 2024-09-19 16:57:02,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=706500.0, ans=0.125 2024-09-19 16:57:11,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=706540.0, ans=0.125 2024-09-19 16:57:12,855 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.727e+01 9.012e+01 9.533e+01 1.739e+02, threshold=1.802e+02, percent-clipped=0.0 2024-09-19 16:57:35,976 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.69 vs. limit=15.0 2024-09-19 16:57:36,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=706620.0, ans=0.025 2024-09-19 16:57:44,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=706620.0, ans=0.0 2024-09-19 16:57:55,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=706660.0, ans=0.1 2024-09-19 16:58:05,147 INFO [train.py:1198] (1/2) Epoch 40, batch 200, loss[loss=0.2465, ctc_loss=0.1245, cr_loss=0.3847, attn_decoder_loss=0.2515, over 27156.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1134, cr_loss=0.3547, attn_decoder_loss=0.2394, over 3658886.09 frames. ], batch size: 124, lr: 2.77e-03, grad_scale: 8.0 2024-09-19 16:58:05,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=706700.0, ans=0.2 2024-09-19 16:58:08,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=706700.0, ans=0.0 2024-09-19 16:58:23,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=706740.0, ans=0.2 2024-09-19 16:58:45,697 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.77 vs. 
limit=15.0 2024-09-19 16:59:04,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=706820.0, ans=0.0 2024-09-19 16:59:13,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=706860.0, ans=0.2 2024-09-19 16:59:21,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=706860.0, ans=0.0 2024-09-19 16:59:25,387 INFO [train.py:1198] (1/2) Epoch 40, batch 250, loss[loss=0.2434, ctc_loss=0.1168, cr_loss=0.3588, attn_decoder_loss=0.2495, over 29261.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1129, cr_loss=0.3534, attn_decoder_loss=0.2391, over 4141166.67 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 8.0 2024-09-19 16:59:29,359 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.28 vs. limit=10.0 2024-09-19 16:59:31,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=706900.0, ans=0.2 2024-09-19 16:59:40,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=706940.0, ans=0.1 2024-09-19 16:59:47,894 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.510e+01 9.023e+01 9.427e+01 1.559e+02, threshold=1.805e+02, percent-clipped=0.0 2024-09-19 16:59:52,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=706940.0, ans=0.0 2024-09-19 17:00:03,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=706980.0, ans=0.0 2024-09-19 17:00:18,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=707020.0, ans=0.125 2024-09-19 17:00:40,604 INFO [train.py:1198] (1/2) Epoch 40, batch 300, loss[loss=0.2499, ctc_loss=0.1211, cr_loss=0.3833, attn_decoder_loss=0.2557, over 29540.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1126, cr_loss=0.3523, attn_decoder_loss=0.2386, over 4508875.05 frames. ], batch size: 92, lr: 2.77e-03, grad_scale: 8.0 2024-09-19 17:00:58,015 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.44 vs. 
limit=15.0 2024-09-19 17:00:59,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=707140.0, ans=0.5 2024-09-19 17:01:15,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=707180.0, ans=0.0 2024-09-19 17:01:17,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=707180.0, ans=0.0 2024-09-19 17:01:25,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=707220.0, ans=0.2 2024-09-19 17:01:28,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=707220.0, ans=0.2 2024-09-19 17:01:38,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=707220.0, ans=0.125 2024-09-19 17:01:41,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=707260.0, ans=0.125 2024-09-19 17:01:55,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=707300.0, ans=0.0 2024-09-19 17:01:56,854 INFO [train.py:1198] (1/2) Epoch 40, batch 350, loss[loss=0.2093, ctc_loss=0.09134, cr_loss=0.3015, attn_decoder_loss=0.2157, over 29351.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1131, cr_loss=0.3529, attn_decoder_loss=0.2391, over 4794398.47 frames. ], batch size: 71, lr: 2.77e-03, grad_scale: 8.0 2024-09-19 17:02:21,765 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 8.445e+01 8.881e+01 9.307e+01 1.282e+02, threshold=1.776e+02, percent-clipped=0.0 2024-09-19 17:02:26,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=707340.0, ans=0.1 2024-09-19 17:02:43,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=707420.0, ans=0.2 2024-09-19 17:02:50,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=707420.0, ans=0.125 2024-09-19 17:02:56,035 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.35 vs. limit=15.0 2024-09-19 17:02:56,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=707420.0, ans=0.125 2024-09-19 17:02:58,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=707460.0, ans=0.125 2024-09-19 17:03:04,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=707460.0, ans=0.2 2024-09-19 17:03:13,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=707500.0, ans=0.125 2024-09-19 17:03:14,404 INFO [train.py:1198] (1/2) Epoch 40, batch 400, loss[loss=0.2523, ctc_loss=0.1315, cr_loss=0.3981, attn_decoder_loss=0.2569, over 29689.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.113, cr_loss=0.3528, attn_decoder_loss=0.239, over 5023608.94 frames. 
], batch size: 82, lr: 2.77e-03, grad_scale: 16.0 2024-09-19 17:03:48,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=707580.0, ans=0.125 2024-09-19 17:03:56,329 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.38 vs. limit=10.0 2024-09-19 17:03:58,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=707620.0, ans=0.1 2024-09-19 17:04:04,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=707620.0, ans=0.025 2024-09-19 17:04:30,150 INFO [train.py:1198] (1/2) Epoch 40, batch 450, loss[loss=0.2469, ctc_loss=0.1245, cr_loss=0.3762, attn_decoder_loss=0.2521, over 29699.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1131, cr_loss=0.3527, attn_decoder_loss=0.2391, over 5186687.82 frames. ], batch size: 83, lr: 2.77e-03, grad_scale: 16.0 2024-09-19 17:04:52,851 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.308e+01 8.522e+01 8.945e+01 9.353e+01 2.975e+02, threshold=1.789e+02, percent-clipped=1.0 2024-09-19 17:05:02,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=707780.0, ans=0.125 2024-09-19 17:05:03,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=707780.0, ans=0.1 2024-09-19 17:05:45,818 INFO [train.py:1198] (1/2) Epoch 40, batch 500, loss[loss=0.2568, ctc_loss=0.1301, cr_loss=0.3898, attn_decoder_loss=0.2622, over 29408.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1123, cr_loss=0.3515, attn_decoder_loss=0.2383, over 5330296.94 frames. ], batch size: 94, lr: 2.77e-03, grad_scale: 16.0 2024-09-19 17:06:03,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=707940.0, ans=0.125 2024-09-19 17:06:14,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=707940.0, ans=0.125 2024-09-19 17:06:16,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=707940.0, ans=0.025 2024-09-19 17:06:47,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=708020.0, ans=0.125 2024-09-19 17:07:06,550 INFO [train.py:1198] (1/2) Epoch 40, batch 550, loss[loss=0.2486, ctc_loss=0.1222, cr_loss=0.355, attn_decoder_loss=0.2548, over 28813.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1123, cr_loss=0.3514, attn_decoder_loss=0.2384, over 5422224.83 frames. 
], batch size: 104, lr: 2.77e-03, grad_scale: 8.0 2024-09-19 17:07:17,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=708100.0, ans=0.0 2024-09-19 17:07:22,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=708140.0, ans=0.2 2024-09-19 17:07:26,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=708140.0, ans=0.1 2024-09-19 17:07:30,881 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.605e+01 8.557e+01 8.930e+01 9.623e+01 2.134e+02, threshold=1.786e+02, percent-clipped=1.0 2024-09-19 17:07:58,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=708220.0, ans=0.125 2024-09-19 17:08:08,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=708260.0, ans=0.125 2024-09-19 17:08:11,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=708260.0, ans=0.0 2024-09-19 17:08:14,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=708260.0, ans=0.0 2024-09-19 17:08:17,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=708260.0, ans=0.125 2024-09-19 17:08:22,290 INFO [train.py:1198] (1/2) Epoch 40, batch 600, loss[loss=0.242, ctc_loss=0.1197, cr_loss=0.3726, attn_decoder_loss=0.2474, over 29310.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1124, cr_loss=0.3522, attn_decoder_loss=0.2385, over 5507515.68 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 8.0 2024-09-19 17:08:27,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=708300.0, ans=0.07 2024-09-19 17:08:33,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=708300.0, ans=0.0 2024-09-19 17:08:41,180 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.91 vs. limit=6.0 2024-09-19 17:08:58,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=708380.0, ans=0.125 2024-09-19 17:09:13,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=708420.0, ans=0.125 2024-09-19 17:09:13,987 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.16 vs. limit=15.0 2024-09-19 17:09:25,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=708460.0, ans=0.2 2024-09-19 17:09:37,339 INFO [train.py:1198] (1/2) Epoch 40, batch 650, loss[loss=0.2311, ctc_loss=0.1136, cr_loss=0.3682, attn_decoder_loss=0.236, over 29776.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1119, cr_loss=0.3514, attn_decoder_loss=0.238, over 5585257.19 frames. 
], batch size: 81, lr: 2.77e-03, grad_scale: 8.0 2024-09-19 17:10:03,863 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 8.522e+01 8.894e+01 9.367e+01 2.518e+02, threshold=1.779e+02, percent-clipped=2.0 2024-09-19 17:10:50,708 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=12.0 2024-09-19 17:10:57,354 INFO [train.py:1198] (1/2) Epoch 40, batch 700, loss[loss=0.2137, ctc_loss=0.0987, cr_loss=0.3082, attn_decoder_loss=0.2196, over 29565.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1124, cr_loss=0.3526, attn_decoder_loss=0.2385, over 5635604.78 frames. ], batch size: 76, lr: 2.77e-03, grad_scale: 8.0 2024-09-19 17:11:05,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=708700.0, ans=0.0 2024-09-19 17:11:09,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=708700.0, ans=0.2 2024-09-19 17:11:49,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=708820.0, ans=0.0 2024-09-19 17:12:07,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=708860.0, ans=0.1 2024-09-19 17:12:13,436 INFO [train.py:1198] (1/2) Epoch 40, batch 750, loss[loss=0.2354, ctc_loss=0.1099, cr_loss=0.3393, attn_decoder_loss=0.2418, over 29710.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1126, cr_loss=0.353, attn_decoder_loss=0.2385, over 5674893.38 frames. ], batch size: 82, lr: 2.77e-03, grad_scale: 8.0 2024-09-19 17:12:15,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=708900.0, ans=0.0 2024-09-19 17:12:25,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=708900.0, ans=10.0 2024-09-19 17:12:28,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=708940.0, ans=0.2 2024-09-19 17:12:37,368 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.374e+01 9.046e+01 9.655e+01 1.904e+02, threshold=1.809e+02, percent-clipped=1.0 2024-09-19 17:12:59,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=709020.0, ans=0.125 2024-09-19 17:13:13,531 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0 2024-09-19 17:13:21,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=709060.0, ans=0.125 2024-09-19 17:13:23,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=709060.0, ans=0.125 2024-09-19 17:13:26,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=709060.0, ans=0.125 2024-09-19 17:13:27,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=709100.0, ans=0.0 2024-09-19 17:13:28,941 INFO [train.py:1198] (1/2) Epoch 40, batch 800, loss[loss=0.2105, ctc_loss=0.09718, cr_loss=0.313, attn_decoder_loss=0.2162, over 29614.00 frames. 
], tot_loss[loss=0.2327, ctc_loss=0.1124, cr_loss=0.3525, attn_decoder_loss=0.2383, over 5707538.68 frames. ], batch size: 73, lr: 2.77e-03, grad_scale: 16.0 2024-09-19 17:13:42,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=709140.0, ans=0.2 2024-09-19 17:14:04,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=709180.0, ans=0.1 2024-09-19 17:14:22,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=709220.0, ans=0.2 2024-09-19 17:14:29,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=709220.0, ans=0.2 2024-09-19 17:14:29,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=709220.0, ans=0.0 2024-09-19 17:14:48,728 INFO [train.py:1198] (1/2) Epoch 40, batch 850, loss[loss=0.2396, ctc_loss=0.1121, cr_loss=0.3581, attn_decoder_loss=0.2458, over 29694.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1121, cr_loss=0.3516, attn_decoder_loss=0.2382, over 5737430.86 frames. ], batch size: 89, lr: 2.77e-03, grad_scale: 16.0 2024-09-19 17:14:50,847 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.85 vs. limit=22.5 2024-09-19 17:14:57,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=709300.0, ans=0.125 2024-09-19 17:15:05,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=709340.0, ans=0.05 2024-09-19 17:15:12,634 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.469e+01 8.929e+01 9.566e+01 2.198e+02, threshold=1.786e+02, percent-clipped=1.0 2024-09-19 17:16:01,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=709460.0, ans=0.125 2024-09-19 17:16:03,933 INFO [train.py:1198] (1/2) Epoch 40, batch 900, loss[loss=0.208, ctc_loss=0.09007, cr_loss=0.2932, attn_decoder_loss=0.2146, over 29575.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1125, cr_loss=0.3526, attn_decoder_loss=0.2384, over 5741826.40 frames. ], batch size: 73, lr: 2.77e-03, grad_scale: 16.0 2024-09-19 17:16:53,322 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.37 vs. limit=12.0 2024-09-19 17:17:00,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=709620.0, ans=0.0 2024-09-19 17:17:04,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=709660.0, ans=0.125 2024-09-19 17:17:19,238 INFO [train.py:1198] (1/2) Epoch 40, batch 950, loss[loss=0.221, ctc_loss=0.09635, cr_loss=0.3217, attn_decoder_loss=0.2278, over 29504.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1125, cr_loss=0.3527, attn_decoder_loss=0.2384, over 5743174.66 frames. 
], batch size: 74, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:17:28,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=709700.0, ans=10.0 2024-09-19 17:17:32,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=709740.0, ans=0.125 2024-09-19 17:17:45,396 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.548e+01 9.083e+01 9.830e+01 2.215e+02, threshold=1.817e+02, percent-clipped=1.0 2024-09-19 17:18:19,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=709820.0, ans=0.025 2024-09-19 17:18:25,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_abs, batch_count=709860.0, ans=0.5 2024-09-19 17:18:29,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=709860.0, ans=0.0 2024-09-19 17:18:39,001 INFO [train.py:1198] (1/2) Epoch 40, batch 1000, loss[loss=0.2307, ctc_loss=0.1222, cr_loss=0.3691, attn_decoder_loss=0.2345, over 29495.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1132, cr_loss=0.3541, attn_decoder_loss=0.2392, over 5735692.17 frames. ], batch size: 77, lr: 2.76e-03, grad_scale: 8.0 2024-09-19 17:18:39,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=709900.0, ans=0.125 2024-09-19 17:18:49,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=709900.0, ans=0.125 2024-09-19 17:18:51,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=709900.0, ans=0.125 2024-09-19 17:18:51,837 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.65 vs. limit=10.0 2024-09-19 17:19:08,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=709980.0, ans=0.125 2024-09-19 17:19:20,108 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 17:19:32,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=710020.0, ans=0.125 2024-09-19 17:19:40,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=710060.0, ans=0.1 2024-09-19 17:19:49,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=710060.0, ans=0.125 2024-09-19 17:19:51,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=710060.0, ans=0.125 2024-09-19 17:19:51,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=710060.0, ans=0.0 2024-09-19 17:19:51,978 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.17 vs. 
limit=15.0 2024-09-19 17:19:52,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=710100.0, ans=0.125 2024-09-19 17:19:54,090 INFO [train.py:1198] (1/2) Epoch 40, batch 1050, loss[loss=0.2397, ctc_loss=0.12, cr_loss=0.3826, attn_decoder_loss=0.2445, over 29666.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1131, cr_loss=0.354, attn_decoder_loss=0.2387, over 5743505.14 frames. ], batch size: 85, lr: 2.76e-03, grad_scale: 8.0 2024-09-19 17:20:15,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=710140.0, ans=0.125 2024-09-19 17:20:20,063 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 8.590e+01 9.048e+01 9.519e+01 1.628e+02, threshold=1.810e+02, percent-clipped=0.0 2024-09-19 17:20:21,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=710140.0, ans=0.0 2024-09-19 17:20:32,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=710180.0, ans=0.0 2024-09-19 17:21:09,833 INFO [train.py:1198] (1/2) Epoch 40, batch 1100, loss[loss=0.231, ctc_loss=0.1112, cr_loss=0.3427, attn_decoder_loss=0.2367, over 29433.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1127, cr_loss=0.3531, attn_decoder_loss=0.2387, over 5755235.93 frames. ], batch size: 78, lr: 2.76e-03, grad_scale: 8.0 2024-09-19 17:21:20,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=710300.0, ans=0.1 2024-09-19 17:21:26,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=710340.0, ans=0.1 2024-09-19 17:22:10,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=710420.0, ans=0.125 2024-09-19 17:22:24,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=710460.0, ans=0.0 2024-09-19 17:22:30,002 INFO [train.py:1198] (1/2) Epoch 40, batch 1150, loss[loss=0.2264, ctc_loss=0.1099, cr_loss=0.3528, attn_decoder_loss=0.2315, over 29426.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1128, cr_loss=0.3527, attn_decoder_loss=0.2386, over 5753605.23 frames. 
], batch size: 78, lr: 2.76e-03, grad_scale: 8.0 2024-09-19 17:22:42,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=710500.0, ans=0.125 2024-09-19 17:22:51,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=710540.0, ans=0.0 2024-09-19 17:22:55,728 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 8.424e+01 8.898e+01 9.617e+01 1.555e+02, threshold=1.780e+02, percent-clipped=0.0 2024-09-19 17:23:17,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=710620.0, ans=0.0 2024-09-19 17:23:17,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=710620.0, ans=0.125 2024-09-19 17:23:26,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=710620.0, ans=0.125 2024-09-19 17:23:27,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=710620.0, ans=0.0 2024-09-19 17:23:35,105 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 17:23:37,146 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.82 vs. limit=10.0 2024-09-19 17:23:45,222 INFO [train.py:1198] (1/2) Epoch 40, batch 1200, loss[loss=0.2392, ctc_loss=0.1075, cr_loss=0.3473, attn_decoder_loss=0.2461, over 29657.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.113, cr_loss=0.3533, attn_decoder_loss=0.2393, over 5746527.51 frames. ], batch size: 85, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:24:26,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=710780.0, ans=0.0 2024-09-19 17:24:36,298 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. limit=6.0 2024-09-19 17:24:42,036 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.41 vs. limit=15.0 2024-09-19 17:24:42,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=710820.0, ans=15.0 2024-09-19 17:24:47,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=710860.0, ans=0.0 2024-09-19 17:25:01,045 INFO [train.py:1198] (1/2) Epoch 40, batch 1250, loss[loss=0.2471, ctc_loss=0.1201, cr_loss=0.364, attn_decoder_loss=0.2531, over 29511.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1133, cr_loss=0.3544, attn_decoder_loss=0.2398, over 5774400.44 frames. 
], batch size: 92, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:25:01,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=710900.0, ans=0.125 2024-09-19 17:25:07,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=710900.0, ans=0.1 2024-09-19 17:25:08,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=710900.0, ans=0.0 2024-09-19 17:25:21,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=710940.0, ans=0.1 2024-09-19 17:25:29,035 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.896e+01 8.708e+01 9.133e+01 9.581e+01 1.854e+02, threshold=1.827e+02, percent-clipped=1.0 2024-09-19 17:25:50,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=711020.0, ans=0.125 2024-09-19 17:26:13,566 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.76 vs. limit=15.0 2024-09-19 17:26:21,603 INFO [train.py:1198] (1/2) Epoch 40, batch 1300, loss[loss=0.2437, ctc_loss=0.1098, cr_loss=0.3567, attn_decoder_loss=0.2507, over 28294.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1132, cr_loss=0.3544, attn_decoder_loss=0.2392, over 5778883.75 frames. ], batch size: 111, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:26:25,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=711100.0, ans=0.2 2024-09-19 17:26:28,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=711100.0, ans=15.0 2024-09-19 17:26:55,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=711180.0, ans=0.0 2024-09-19 17:26:57,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=711180.0, ans=0.0 2024-09-19 17:26:59,299 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.50 vs. limit=15.0 2024-09-19 17:27:11,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=711220.0, ans=0.2 2024-09-19 17:27:37,470 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.77 vs. limit=22.5 2024-09-19 17:27:38,126 INFO [train.py:1198] (1/2) Epoch 40, batch 1350, loss[loss=0.235, ctc_loss=0.1084, cr_loss=0.3301, attn_decoder_loss=0.2417, over 29761.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1126, cr_loss=0.3529, attn_decoder_loss=0.239, over 5796107.60 frames. ], batch size: 81, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:27:45,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=711300.0, ans=0.125 2024-09-19 17:27:46,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.96 vs. 
limit=22.5 2024-09-19 17:27:51,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=711340.0, ans=0.125 2024-09-19 17:27:58,642 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.27 vs. limit=22.5 2024-09-19 17:28:03,655 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 8.275e+01 9.002e+01 9.355e+01 2.084e+02, threshold=1.800e+02, percent-clipped=1.0 2024-09-19 17:28:12,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=711380.0, ans=0.0 2024-09-19 17:28:15,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=711380.0, ans=0.0 2024-09-19 17:28:26,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=711420.0, ans=0.125 2024-09-19 17:28:49,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=711460.0, ans=0.125 2024-09-19 17:28:53,210 INFO [train.py:1198] (1/2) Epoch 40, batch 1400, loss[loss=0.2129, ctc_loss=0.1012, cr_loss=0.3228, attn_decoder_loss=0.2182, over 29575.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1123, cr_loss=0.3522, attn_decoder_loss=0.2386, over 5806872.26 frames. ], batch size: 69, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:28:56,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=711500.0, ans=0.0 2024-09-19 17:29:02,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=711500.0, ans=0.025 2024-09-19 17:29:11,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=711540.0, ans=0.125 2024-09-19 17:29:23,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=711540.0, ans=0.125 2024-09-19 17:29:27,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=711580.0, ans=0.125 2024-09-19 17:29:35,645 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.28 vs. limit=6.0 2024-09-19 17:29:56,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=711660.0, ans=0.2 2024-09-19 17:30:04,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=711660.0, ans=0.0 2024-09-19 17:30:13,142 INFO [train.py:1198] (1/2) Epoch 40, batch 1450, loss[loss=0.23, ctc_loss=0.1054, cr_loss=0.3425, attn_decoder_loss=0.2362, over 29402.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1128, cr_loss=0.3537, attn_decoder_loss=0.2392, over 5803082.13 frames. 
], batch size: 94, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:30:38,766 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 8.710e+01 9.115e+01 9.620e+01 3.738e+02, threshold=1.823e+02, percent-clipped=1.0 2024-09-19 17:30:46,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=711780.0, ans=0.2 2024-09-19 17:30:49,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=711780.0, ans=0.0 2024-09-19 17:31:28,344 INFO [train.py:1198] (1/2) Epoch 40, batch 1500, loss[loss=0.2439, ctc_loss=0.1219, cr_loss=0.3496, attn_decoder_loss=0.2497, over 29622.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.113, cr_loss=0.3537, attn_decoder_loss=0.2396, over 5804829.78 frames. ], batch size: 86, lr: 2.76e-03, grad_scale: 8.0 2024-09-19 17:31:43,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=711940.0, ans=0.125 2024-09-19 17:31:48,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=711940.0, ans=0.125 2024-09-19 17:32:00,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer_ff2.min_abs, batch_count=711980.0, ans=0.1 2024-09-19 17:32:15,014 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.65 vs. limit=10.0 2024-09-19 17:32:17,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=712020.0, ans=0.125 2024-09-19 17:32:17,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=712020.0, ans=0.125 2024-09-19 17:32:28,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=712060.0, ans=0.125 2024-09-19 17:32:32,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=712060.0, ans=0.125 2024-09-19 17:32:42,572 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.17 vs. limit=22.5 2024-09-19 17:32:44,589 INFO [train.py:1198] (1/2) Epoch 40, batch 1550, loss[loss=0.2592, ctc_loss=0.1383, cr_loss=0.413, attn_decoder_loss=0.2634, over 29491.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1134, cr_loss=0.3546, attn_decoder_loss=0.2394, over 5780688.47 frames. ], batch size: 90, lr: 2.76e-03, grad_scale: 8.0 2024-09-19 17:32:46,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=712100.0, ans=0.025 2024-09-19 17:32:57,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.02 vs. 
limit=15.0 2024-09-19 17:33:08,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=712140.0, ans=0.0 2024-09-19 17:33:14,041 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.700e+01 8.677e+01 9.047e+01 9.758e+01 3.580e+02, threshold=1.809e+02, percent-clipped=1.0 2024-09-19 17:33:27,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=712180.0, ans=0.125 2024-09-19 17:33:46,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=712260.0, ans=0.025 2024-09-19 17:33:55,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=712260.0, ans=0.125 2024-09-19 17:33:57,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=712260.0, ans=0.0 2024-09-19 17:34:04,515 INFO [train.py:1198] (1/2) Epoch 40, batch 1600, loss[loss=0.2407, ctc_loss=0.1142, cr_loss=0.3565, attn_decoder_loss=0.2468, over 29679.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1134, cr_loss=0.354, attn_decoder_loss=0.2393, over 5763728.92 frames. ], batch size: 85, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:34:06,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=712300.0, ans=0.1 2024-09-19 17:34:07,195 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.52 vs. limit=15.0 2024-09-19 17:34:29,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=712340.0, ans=0.1 2024-09-19 17:34:49,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=712420.0, ans=0.1 2024-09-19 17:35:06,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.51 vs. limit=15.0 2024-09-19 17:35:13,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=712460.0, ans=0.125 2024-09-19 17:35:20,309 INFO [train.py:1198] (1/2) Epoch 40, batch 1650, loss[loss=0.2513, ctc_loss=0.1194, cr_loss=0.3703, attn_decoder_loss=0.2577, over 29707.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1128, cr_loss=0.3529, attn_decoder_loss=0.2389, over 5757142.70 frames. 
], batch size: 89, lr: 2.76e-03, grad_scale: 8.0 2024-09-19 17:35:48,737 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.111e+01 8.375e+01 9.140e+01 9.741e+01 3.230e+02, threshold=1.828e+02, percent-clipped=2.0 2024-09-19 17:35:49,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=712580.0, ans=0.1 2024-09-19 17:35:55,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=712580.0, ans=0.125 2024-09-19 17:35:59,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=712580.0, ans=0.1 2024-09-19 17:36:25,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=712660.0, ans=0.125 2024-09-19 17:36:35,509 INFO [train.py:1198] (1/2) Epoch 40, batch 1700, loss[loss=0.2103, ctc_loss=0.09984, cr_loss=0.3265, attn_decoder_loss=0.2154, over 29596.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1126, cr_loss=0.3526, attn_decoder_loss=0.2388, over 5778179.56 frames. ], batch size: 69, lr: 2.76e-03, grad_scale: 8.0 2024-09-19 17:36:43,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=712700.0, ans=0.125 2024-09-19 17:37:08,616 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.04 vs. limit=15.0 2024-09-19 17:37:31,968 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.64 vs. limit=10.0 2024-09-19 17:37:33,365 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.88 vs. limit=10.0 2024-09-19 17:37:53,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=712860.0, ans=0.025 2024-09-19 17:37:53,531 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.80 vs. limit=12.0 2024-09-19 17:37:55,711 INFO [train.py:1198] (1/2) Epoch 40, batch 1750, loss[loss=0.206, ctc_loss=0.09212, cr_loss=0.3093, attn_decoder_loss=0.2118, over 29345.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1122, cr_loss=0.3513, attn_decoder_loss=0.2383, over 5785729.38 frames. ], batch size: 67, lr: 2.76e-03, grad_scale: 8.0 2024-09-19 17:38:00,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=712900.0, ans=0.125 2024-09-19 17:38:06,143 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.48 vs. 
limit=15.0 2024-09-19 17:38:20,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=712940.0, ans=0.125 2024-09-19 17:38:24,561 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.489e+01 8.436e+01 8.990e+01 9.570e+01 1.574e+02, threshold=1.798e+02, percent-clipped=0.0 2024-09-19 17:38:33,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=712980.0, ans=0.0 2024-09-19 17:38:35,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=712980.0, ans=0.125 2024-09-19 17:38:39,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=713020.0, ans=0.125 2024-09-19 17:38:53,746 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.06 vs. limit=15.0 2024-09-19 17:38:54,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.28 vs. limit=15.0 2024-09-19 17:39:10,862 INFO [train.py:1198] (1/2) Epoch 40, batch 1800, loss[loss=0.2516, ctc_loss=0.1264, cr_loss=0.392, attn_decoder_loss=0.2568, over 29690.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1126, cr_loss=0.3529, attn_decoder_loss=0.2387, over 5788913.42 frames. ], batch size: 83, lr: 2.76e-03, grad_scale: 8.0 2024-09-19 17:39:37,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=713140.0, ans=0.125 2024-09-19 17:39:41,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=713180.0, ans=0.125 2024-09-19 17:39:41,949 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.12 vs. limit=22.5 2024-09-19 17:39:47,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=713180.0, ans=0.0 2024-09-19 17:39:52,411 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=5.04 vs. limit=15.0 2024-09-19 17:39:54,978 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 17:40:06,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=713220.0, ans=0.2 2024-09-19 17:40:06,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=713220.0, ans=0.125 2024-09-19 17:40:12,032 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.15 vs. limit=15.0 2024-09-19 17:40:26,411 INFO [train.py:1198] (1/2) Epoch 40, batch 1850, loss[loss=0.2396, ctc_loss=0.108, cr_loss=0.3351, attn_decoder_loss=0.2468, over 29623.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1124, cr_loss=0.3525, attn_decoder_loss=0.2384, over 5795752.21 frames. 
], batch size: 86, lr: 2.76e-03, grad_scale: 8.0 2024-09-19 17:40:57,146 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.317e+01 8.615e+01 9.088e+01 9.758e+01 2.205e+02, threshold=1.818e+02, percent-clipped=1.0 2024-09-19 17:41:14,453 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.02 vs. limit=15.0 2024-09-19 17:41:27,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=713460.0, ans=0.0 2024-09-19 17:41:43,532 INFO [train.py:1198] (1/2) Epoch 40, batch 1900, loss[loss=0.2461, ctc_loss=0.1228, cr_loss=0.372, attn_decoder_loss=0.2516, over 29692.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1125, cr_loss=0.3526, attn_decoder_loss=0.239, over 5803091.11 frames. ], batch size: 89, lr: 2.76e-03, grad_scale: 8.0 2024-09-19 17:42:13,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=713540.0, ans=0.125 2024-09-19 17:42:40,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=713620.0, ans=0.125 2024-09-19 17:42:42,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=713620.0, ans=0.125 2024-09-19 17:42:55,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=713660.0, ans=0.125 2024-09-19 17:43:01,613 INFO [train.py:1198] (1/2) Epoch 40, batch 1950, loss[loss=0.2234, ctc_loss=0.1043, cr_loss=0.3423, attn_decoder_loss=0.2291, over 29443.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1131, cr_loss=0.3539, attn_decoder_loss=0.24, over 5818129.96 frames. ], batch size: 78, lr: 2.76e-03, grad_scale: 8.0 2024-09-19 17:43:06,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=713700.0, ans=0.125 2024-09-19 17:43:18,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=713740.0, ans=0.125 2024-09-19 17:43:24,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=713740.0, ans=0.125 2024-09-19 17:43:30,166 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.544e+01 8.694e+01 9.094e+01 9.637e+01 1.422e+02, threshold=1.819e+02, percent-clipped=0.0 2024-09-19 17:43:33,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=713780.0, ans=0.125 2024-09-19 17:44:08,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.13 vs. limit=15.0 2024-09-19 17:44:13,405 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.16 vs. limit=10.0 2024-09-19 17:44:16,979 INFO [train.py:1198] (1/2) Epoch 40, batch 2000, loss[loss=0.2117, ctc_loss=0.1009, cr_loss=0.3134, attn_decoder_loss=0.2171, over 29385.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1133, cr_loss=0.3546, attn_decoder_loss=0.2404, over 5796058.72 frames. 
], batch size: 67, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:44:17,421 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 17:44:48,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=713980.0, ans=0.1 2024-09-19 17:45:28,331 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.86 vs. limit=10.0 2024-09-19 17:45:34,776 INFO [train.py:1198] (1/2) Epoch 40, batch 2050, loss[loss=0.2117, ctc_loss=0.09592, cr_loss=0.3183, attn_decoder_loss=0.2174, over 29445.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1129, cr_loss=0.3537, attn_decoder_loss=0.2396, over 5788358.10 frames. ], batch size: 70, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:45:36,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=714100.0, ans=0.2 2024-09-19 17:45:49,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=714100.0, ans=0.1 2024-09-19 17:45:51,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=714140.0, ans=0.1 2024-09-19 17:45:55,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=714140.0, ans=0.07 2024-09-19 17:46:05,844 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 8.392e+01 8.898e+01 9.558e+01 3.245e+02, threshold=1.780e+02, percent-clipped=2.0 2024-09-19 17:46:07,016 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.57 vs. limit=15.0 2024-09-19 17:46:21,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=714220.0, ans=0.025 2024-09-19 17:46:24,433 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 17:46:34,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=714220.0, ans=0.125 2024-09-19 17:46:52,700 INFO [train.py:1198] (1/2) Epoch 40, batch 2100, loss[loss=0.2304, ctc_loss=0.1045, cr_loss=0.3452, attn_decoder_loss=0.2367, over 29777.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1124, cr_loss=0.3526, attn_decoder_loss=0.239, over 5799740.25 frames. ], batch size: 81, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:46:55,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=714300.0, ans=0.125 2024-09-19 17:47:04,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=714300.0, ans=0.0 2024-09-19 17:47:52,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=714460.0, ans=10.0 2024-09-19 17:48:07,619 INFO [train.py:1198] (1/2) Epoch 40, batch 2150, loss[loss=0.2294, ctc_loss=0.1101, cr_loss=0.3387, attn_decoder_loss=0.2351, over 29465.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1119, cr_loss=0.3516, attn_decoder_loss=0.2385, over 5814879.38 frames. 
], batch size: 78, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:48:12,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=714500.0, ans=0.05 2024-09-19 17:48:17,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=714500.0, ans=0.0 2024-09-19 17:48:38,287 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.441e+01 8.587e+01 9.010e+01 9.804e+01 2.260e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-19 17:48:38,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.min_positive, batch_count=714580.0, ans=0.025 2024-09-19 17:48:46,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=714580.0, ans=0.125 2024-09-19 17:48:55,608 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.72 vs. limit=22.5 2024-09-19 17:49:05,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=714620.0, ans=0.125 2024-09-19 17:49:08,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=714660.0, ans=0.125 2024-09-19 17:49:19,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=714660.0, ans=0.0 2024-09-19 17:49:24,962 INFO [train.py:1198] (1/2) Epoch 40, batch 2200, loss[loss=0.2406, ctc_loss=0.1118, cr_loss=0.3603, attn_decoder_loss=0.2469, over 29620.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1123, cr_loss=0.3523, attn_decoder_loss=0.2387, over 5811124.49 frames. ], batch size: 86, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:49:26,067 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.19 vs. limit=22.5 2024-09-19 17:49:51,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=714740.0, ans=0.125 2024-09-19 17:49:54,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=714740.0, ans=0.04949747468305833 2024-09-19 17:50:15,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=714820.0, ans=0.125 2024-09-19 17:50:26,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=714860.0, ans=0.125 2024-09-19 17:50:42,694 INFO [train.py:1198] (1/2) Epoch 40, batch 2250, loss[loss=0.2396, ctc_loss=0.1131, cr_loss=0.3618, attn_decoder_loss=0.2457, over 29723.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1119, cr_loss=0.3513, attn_decoder_loss=0.2383, over 5810711.74 frames. 
], batch size: 82, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 17:51:12,518 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 8.488e+01 9.052e+01 9.511e+01 5.082e+02, threshold=1.810e+02, percent-clipped=1.0 2024-09-19 17:51:14,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=714980.0, ans=0.1 2024-09-19 17:51:21,089 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.20 vs. limit=15.0 2024-09-19 17:51:21,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=714980.0, ans=0.0 2024-09-19 17:51:41,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=715060.0, ans=0.025 2024-09-19 17:51:57,722 INFO [train.py:1198] (1/2) Epoch 40, batch 2300, loss[loss=0.2098, ctc_loss=0.09598, cr_loss=0.309, attn_decoder_loss=0.2156, over 29367.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1115, cr_loss=0.3504, attn_decoder_loss=0.2373, over 5798173.82 frames. ], batch size: 71, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 17:52:17,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=715140.0, ans=0.2 2024-09-19 17:52:45,623 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 17:53:02,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=715260.0, ans=0.125 2024-09-19 17:53:15,290 INFO [train.py:1198] (1/2) Epoch 40, batch 2350, loss[loss=0.2362, ctc_loss=0.1118, cr_loss=0.3512, attn_decoder_loss=0.2422, over 29685.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1119, cr_loss=0.351, attn_decoder_loss=0.2377, over 5804318.13 frames. ], batch size: 83, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 17:53:17,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=715300.0, ans=0.125 2024-09-19 17:53:21,955 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.21 vs. limit=15.0 2024-09-19 17:53:44,633 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 17:53:47,273 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 8.558e+01 9.025e+01 9.597e+01 1.404e+02, threshold=1.805e+02, percent-clipped=0.0 2024-09-19 17:53:55,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=715380.0, ans=10.0 2024-09-19 17:54:01,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=715420.0, ans=0.0 2024-09-19 17:54:30,661 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.76 vs. limit=6.0 2024-09-19 17:54:32,767 INFO [train.py:1198] (1/2) Epoch 40, batch 2400, loss[loss=0.2292, ctc_loss=0.1111, cr_loss=0.3557, attn_decoder_loss=0.2344, over 29528.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1126, cr_loss=0.3528, attn_decoder_loss=0.2383, over 5808766.01 frames. 
], batch size: 76, lr: 2.75e-03, grad_scale: 16.0 2024-09-19 17:54:45,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=715500.0, ans=0.0 2024-09-19 17:54:54,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=715540.0, ans=0.125 2024-09-19 17:55:12,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=715580.0, ans=0.1 2024-09-19 17:55:17,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=715620.0, ans=0.0 2024-09-19 17:55:43,136 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.77 vs. limit=10.0 2024-09-19 17:55:48,049 INFO [train.py:1198] (1/2) Epoch 40, batch 2450, loss[loss=0.239, ctc_loss=0.117, cr_loss=0.3561, attn_decoder_loss=0.2446, over 29733.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1132, cr_loss=0.354, attn_decoder_loss=0.2392, over 5786189.02 frames. ], batch size: 82, lr: 2.75e-03, grad_scale: 16.0 2024-09-19 17:56:01,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=715740.0, ans=0.125 2024-09-19 17:56:18,062 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.28 vs. limit=15.0 2024-09-19 17:56:20,254 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.714e+01 9.274e+01 9.862e+01 1.579e+02, threshold=1.855e+02, percent-clipped=0.0 2024-09-19 17:56:35,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=715820.0, ans=0.0 2024-09-19 17:56:39,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=715820.0, ans=0.125 2024-09-19 17:57:02,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=715860.0, ans=0.1 2024-09-19 17:57:05,503 INFO [train.py:1198] (1/2) Epoch 40, batch 2500, loss[loss=0.245, ctc_loss=0.1222, cr_loss=0.3765, attn_decoder_loss=0.2503, over 29625.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.113, cr_loss=0.3533, attn_decoder_loss=0.2391, over 5795996.88 frames. 
], batch size: 86, lr: 2.75e-03, grad_scale: 16.0 2024-09-19 17:57:05,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=715900.0, ans=0.2 2024-09-19 17:57:10,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=715900.0, ans=0.1 2024-09-19 17:57:16,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=715900.0, ans=0.0 2024-09-19 17:57:16,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=715900.0, ans=0.125 2024-09-19 17:57:35,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=715940.0, ans=0.125 2024-09-19 17:57:46,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=715980.0, ans=0.125 2024-09-19 17:58:24,137 INFO [train.py:1198] (1/2) Epoch 40, batch 2550, loss[loss=0.2058, ctc_loss=0.09378, cr_loss=0.3031, attn_decoder_loss=0.2115, over 29313.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1129, cr_loss=0.3534, attn_decoder_loss=0.2391, over 5800076.74 frames. ], batch size: 67, lr: 2.75e-03, grad_scale: 16.0 2024-09-19 17:58:28,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=716100.0, ans=0.125 2024-09-19 17:58:45,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=716140.0, ans=0.1 2024-09-19 17:58:53,983 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.331e+01 8.529e+01 8.996e+01 9.557e+01 1.715e+02, threshold=1.799e+02, percent-clipped=0.0 2024-09-19 17:59:20,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=716220.0, ans=0.125 2024-09-19 17:59:26,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=716260.0, ans=0.025 2024-09-19 17:59:39,657 INFO [train.py:1198] (1/2) Epoch 40, batch 2600, loss[loss=0.2211, ctc_loss=0.09925, cr_loss=0.318, attn_decoder_loss=0.2275, over 29466.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1125, cr_loss=0.3519, attn_decoder_loss=0.2393, over 5795983.51 frames. ], batch size: 78, lr: 2.75e-03, grad_scale: 16.0 2024-09-19 17:59:59,258 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.26 vs. limit=10.0 2024-09-19 18:00:08,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=716340.0, ans=0.125 2024-09-19 18:00:12,664 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.53 vs. limit=10.0 2024-09-19 18:00:32,926 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:00:49,702 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.54 vs. 
limit=15.0 2024-09-19 18:00:50,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=716460.0, ans=0.0 2024-09-19 18:00:53,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=716460.0, ans=0.125 2024-09-19 18:00:56,053 INFO [train.py:1198] (1/2) Epoch 40, batch 2650, loss[loss=0.2565, ctc_loss=0.1349, cr_loss=0.3994, attn_decoder_loss=0.2612, over 29299.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1124, cr_loss=0.3519, attn_decoder_loss=0.2394, over 5802874.28 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 16.0 2024-09-19 18:01:02,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=716500.0, ans=0.025 2024-09-19 18:01:02,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=716500.0, ans=10.0 2024-09-19 18:01:07,482 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.55 vs. limit=15.0 2024-09-19 18:01:28,225 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.306e+01 8.493e+01 9.009e+01 9.595e+01 1.150e+02, threshold=1.802e+02, percent-clipped=0.0 2024-09-19 18:01:42,354 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:02:00,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=716660.0, ans=0.0 2024-09-19 18:02:13,718 INFO [train.py:1198] (1/2) Epoch 40, batch 2700, loss[loss=0.2385, ctc_loss=0.1063, cr_loss=0.3446, attn_decoder_loss=0.2455, over 29551.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.113, cr_loss=0.3532, attn_decoder_loss=0.2399, over 5797843.67 frames. ], batch size: 87, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 18:02:18,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=716700.0, ans=0.025 2024-09-19 18:02:22,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=716700.0, ans=0.1 2024-09-19 18:02:25,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=716700.0, ans=0.125 2024-09-19 18:02:32,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=716740.0, ans=0.2 2024-09-19 18:02:42,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=716780.0, ans=0.2 2024-09-19 18:02:56,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=716780.0, ans=0.125 2024-09-19 18:03:17,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=716860.0, ans=0.125 2024-09-19 18:03:29,470 INFO [train.py:1198] (1/2) Epoch 40, batch 2750, loss[loss=0.2332, ctc_loss=0.1208, cr_loss=0.3818, attn_decoder_loss=0.2372, over 29527.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1121, cr_loss=0.3516, attn_decoder_loss=0.2386, over 5795522.96 frames. 
], batch size: 75, lr: 2.75e-03, grad_scale: 4.0 2024-09-19 18:03:35,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=716900.0, ans=0.025 2024-09-19 18:03:40,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=716900.0, ans=0.125 2024-09-19 18:03:53,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=716940.0, ans=0.125 2024-09-19 18:03:55,067 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.43 vs. limit=15.0 2024-09-19 18:04:04,573 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 8.418e+01 8.972e+01 9.467e+01 1.420e+02, threshold=1.794e+02, percent-clipped=0.0 2024-09-19 18:04:07,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=716980.0, ans=0.0 2024-09-19 18:04:09,472 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:04:18,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=717020.0, ans=0.125 2024-09-19 18:04:20,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=717020.0, ans=0.125 2024-09-19 18:04:30,789 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:04:46,936 INFO [train.py:1198] (1/2) Epoch 40, batch 2800, loss[loss=0.256, ctc_loss=0.1522, cr_loss=0.4175, attn_decoder_loss=0.2582, over 20082.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1124, cr_loss=0.3522, attn_decoder_loss=0.2387, over 5776185.13 frames. ], batch size: 210, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 18:04:52,001 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=15.0 2024-09-19 18:04:53,953 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.66 vs. limit=15.0 2024-09-19 18:04:58,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=717100.0, ans=15.0 2024-09-19 18:05:02,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=717140.0, ans=0.125 2024-09-19 18:05:04,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.15 vs. limit=15.0 2024-09-19 18:05:12,129 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.86 vs. 
limit=15.0 2024-09-19 18:05:15,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=717180.0, ans=0.125 2024-09-19 18:05:22,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=717180.0, ans=0.0 2024-09-19 18:05:37,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=717220.0, ans=0.125 2024-09-19 18:05:37,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=717220.0, ans=0.04949747468305833 2024-09-19 18:05:58,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=717260.0, ans=0.125 2024-09-19 18:05:59,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=717260.0, ans=0.0 2024-09-19 18:06:03,969 INFO [train.py:1198] (1/2) Epoch 40, batch 2850, loss[loss=0.2254, ctc_loss=0.1042, cr_loss=0.327, attn_decoder_loss=0.2316, over 29515.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1128, cr_loss=0.3531, attn_decoder_loss=0.2392, over 5761635.72 frames. ], batch size: 77, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 18:06:33,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=717380.0, ans=0.125 2024-09-19 18:06:35,159 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.19 vs. limit=22.5 2024-09-19 18:06:37,391 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.647e+01 8.590e+01 9.012e+01 9.613e+01 1.852e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-19 18:07:15,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=717460.0, ans=0.2 2024-09-19 18:07:19,945 INFO [train.py:1198] (1/2) Epoch 40, batch 2900, loss[loss=0.2332, ctc_loss=0.1088, cr_loss=0.3517, attn_decoder_loss=0.2393, over 29402.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1133, cr_loss=0.3543, attn_decoder_loss=0.2402, over 5787226.19 frames. ], batch size: 79, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 18:07:21,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=717500.0, ans=0.09899494936611666 2024-09-19 18:08:06,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=717620.0, ans=0.125 2024-09-19 18:08:07,879 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:08:26,350 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.69 vs. limit=15.0 2024-09-19 18:08:37,798 INFO [train.py:1198] (1/2) Epoch 40, batch 2950, loss[loss=0.2207, ctc_loss=0.1026, cr_loss=0.3242, attn_decoder_loss=0.2266, over 29530.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1124, cr_loss=0.3523, attn_decoder_loss=0.239, over 5781977.83 frames. 
], batch size: 75, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 18:08:48,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=717700.0, ans=0.125 2024-09-19 18:09:01,757 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.55 vs. limit=12.0 2024-09-19 18:09:11,434 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.443e+01 9.079e+01 9.666e+01 1.457e+02, threshold=1.816e+02, percent-clipped=0.0 2024-09-19 18:09:27,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=717820.0, ans=0.125 2024-09-19 18:09:29,393 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.90 vs. limit=22.5 2024-09-19 18:09:36,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=717820.0, ans=0.025 2024-09-19 18:09:46,106 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.67 vs. limit=10.0 2024-09-19 18:09:53,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=717860.0, ans=0.125 2024-09-19 18:09:56,126 INFO [train.py:1198] (1/2) Epoch 40, batch 3000, loss[loss=0.2363, ctc_loss=0.1112, cr_loss=0.3505, attn_decoder_loss=0.2424, over 29772.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1123, cr_loss=0.3519, attn_decoder_loss=0.2388, over 5782654.10 frames. ], batch size: 81, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 18:09:56,126 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 18:10:14,431 INFO [train.py:1230] (1/2) Epoch 40, validation: loss=0.2122, ctc_loss=0.03685, cr_loss=5.615e-15, attn_decoder_loss=0.2317, over 944034.00 frames. 2024-09-19 18:10:14,431 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-19 18:10:24,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=717900.0, ans=0.0 2024-09-19 18:11:00,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=718020.0, ans=0.1 2024-09-19 18:11:06,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=718020.0, ans=0.125 2024-09-19 18:11:09,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=718020.0, ans=0.125 2024-09-19 18:11:11,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.45 vs. 
limit=15.0 2024-09-19 18:11:20,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=718060.0, ans=0.015 2024-09-19 18:11:31,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=718100.0, ans=0.125 2024-09-19 18:11:31,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=718100.0, ans=0.125 2024-09-19 18:11:32,563 INFO [train.py:1198] (1/2) Epoch 40, batch 3050, loss[loss=0.23, ctc_loss=0.1121, cr_loss=0.3614, attn_decoder_loss=0.2351, over 29539.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1132, cr_loss=0.3546, attn_decoder_loss=0.2398, over 5776346.13 frames. ], batch size: 76, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 18:11:45,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=718100.0, ans=0.1 2024-09-19 18:11:51,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=718140.0, ans=0.1 2024-09-19 18:11:54,565 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.06 vs. limit=15.0 2024-09-19 18:12:04,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=718180.0, ans=0.05 2024-09-19 18:12:05,626 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.688e+01 8.520e+01 9.084e+01 9.934e+01 1.461e+02, threshold=1.817e+02, percent-clipped=0.0 2024-09-19 18:12:05,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=718180.0, ans=0.5 2024-09-19 18:12:23,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=718220.0, ans=0.125 2024-09-19 18:12:25,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=718220.0, ans=0.1 2024-09-19 18:12:47,574 INFO [train.py:1198] (1/2) Epoch 40, batch 3100, loss[loss=0.2512, ctc_loss=0.1265, cr_loss=0.3975, attn_decoder_loss=0.2562, over 29245.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1131, cr_loss=0.354, attn_decoder_loss=0.2395, over 5776580.67 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 18:12:59,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=718300.0, ans=0.125 2024-09-19 18:13:09,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=718340.0, ans=0.125 2024-09-19 18:13:12,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=718340.0, ans=0.125 2024-09-19 18:13:14,222 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:13:26,863 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.48 vs. limit=15.0 2024-09-19 18:13:30,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.61 vs. 
limit=15.0 2024-09-19 18:13:42,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=718420.0, ans=0.1 2024-09-19 18:13:46,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=718420.0, ans=0.0 2024-09-19 18:13:53,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=718460.0, ans=0.125 2024-09-19 18:13:58,700 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:13:59,281 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.30 vs. limit=15.0 2024-09-19 18:14:02,240 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.24 vs. limit=15.0 2024-09-19 18:14:03,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=718460.0, ans=0.05 2024-09-19 18:14:04,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=718500.0, ans=0.125 2024-09-19 18:14:06,030 INFO [train.py:1198] (1/2) Epoch 40, batch 3150, loss[loss=0.2492, ctc_loss=0.1126, cr_loss=0.3395, attn_decoder_loss=0.2568, over 28836.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1128, cr_loss=0.3532, attn_decoder_loss=0.2395, over 5782140.03 frames. ], batch size: 104, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 18:14:13,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=718500.0, ans=0.0 2024-09-19 18:14:18,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=718500.0, ans=0.125 2024-09-19 18:14:24,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=718540.0, ans=0.1 2024-09-19 18:14:26,869 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.53 vs. limit=15.0 2024-09-19 18:14:32,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=718540.0, ans=0.125 2024-09-19 18:14:39,242 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 8.506e+01 9.197e+01 9.540e+01 2.562e+02, threshold=1.839e+02, percent-clipped=1.0 2024-09-19 18:14:45,617 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:15:23,520 INFO [train.py:1198] (1/2) Epoch 40, batch 3200, loss[loss=0.2349, ctc_loss=0.1143, cr_loss=0.354, attn_decoder_loss=0.2404, over 29413.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1126, cr_loss=0.353, attn_decoder_loss=0.2388, over 5792054.57 frames. 
], batch size: 79, lr: 2.75e-03, grad_scale: 16.0 2024-09-19 18:15:25,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=718700.0, ans=0.125 2024-09-19 18:15:28,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=718700.0, ans=0.125 2024-09-19 18:15:51,819 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.99 vs. limit=12.0 2024-09-19 18:15:52,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=718780.0, ans=0.125 2024-09-19 18:15:52,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=718780.0, ans=0.2 2024-09-19 18:15:58,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=718780.0, ans=0.125 2024-09-19 18:16:06,515 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.50 vs. limit=15.0 2024-09-19 18:16:06,632 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.17 vs. limit=22.5 2024-09-19 18:16:38,753 INFO [train.py:1198] (1/2) Epoch 40, batch 3250, loss[loss=0.2425, ctc_loss=0.1104, cr_loss=0.3476, attn_decoder_loss=0.2494, over 29718.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1126, cr_loss=0.3534, attn_decoder_loss=0.2393, over 5799163.80 frames. ], batch size: 84, lr: 2.75e-03, grad_scale: 16.0 2024-09-19 18:17:05,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=718940.0, ans=0.125 2024-09-19 18:17:09,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=718980.0, ans=0.025 2024-09-19 18:17:13,652 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.518e+01 9.005e+01 9.479e+01 1.398e+02, threshold=1.801e+02, percent-clipped=0.0 2024-09-19 18:17:35,454 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2024-09-19 18:17:45,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=719060.0, ans=0.125 2024-09-19 18:17:55,808 INFO [train.py:1198] (1/2) Epoch 40, batch 3300, loss[loss=0.248, ctc_loss=0.113, cr_loss=0.3535, attn_decoder_loss=0.2551, over 28295.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1119, cr_loss=0.3518, attn_decoder_loss=0.2384, over 5795426.09 frames. ], batch size: 111, lr: 2.75e-03, grad_scale: 16.0 2024-09-19 18:18:02,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=719100.0, ans=0.1 2024-09-19 18:18:02,743 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.19 vs. 
limit=10.0 2024-09-19 18:18:13,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=719140.0, ans=10.0 2024-09-19 18:18:30,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=719180.0, ans=0.125 2024-09-19 18:18:40,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=719220.0, ans=0.0 2024-09-19 18:19:05,115 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.00 vs. limit=15.0 2024-09-19 18:19:10,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=719260.0, ans=0.0 2024-09-19 18:19:13,642 INFO [train.py:1198] (1/2) Epoch 40, batch 3350, loss[loss=0.2522, ctc_loss=0.1332, cr_loss=0.4049, attn_decoder_loss=0.2565, over 28914.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.113, cr_loss=0.3536, attn_decoder_loss=0.2392, over 5771024.08 frames. ], batch size: 104, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 18:19:29,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=719340.0, ans=0.125 2024-09-19 18:19:35,906 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.91 vs. limit=15.0 2024-09-19 18:19:48,481 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.582e+01 9.036e+01 9.650e+01 6.119e+02, threshold=1.807e+02, percent-clipped=2.0 2024-09-19 18:19:57,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=719420.0, ans=0.1 2024-09-19 18:20:12,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=719460.0, ans=0.0 2024-09-19 18:20:14,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=719460.0, ans=0.1 2024-09-19 18:20:15,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=719460.0, ans=0.125 2024-09-19 18:20:22,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=719460.0, ans=0.0 2024-09-19 18:20:29,165 INFO [train.py:1198] (1/2) Epoch 40, batch 3400, loss[loss=0.2111, ctc_loss=0.09669, cr_loss=0.3133, attn_decoder_loss=0.2169, over 29360.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.113, cr_loss=0.3535, attn_decoder_loss=0.2391, over 5764362.26 frames. 
2024-09-19 18:20:29,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=719500.0, ans=0.05
2024-09-19 18:20:49,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=719540.0, ans=0.125
2024-09-19 18:21:11,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=719580.0, ans=0.07
2024-09-19 18:21:29,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=719620.0, ans=0.1
2024-09-19 18:21:30,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=719660.0, ans=0.2
2024-09-19 18:21:32,777 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.92 vs. limit=12.0
2024-09-19 18:21:46,909 INFO [train.py:1198] (1/2) Epoch 40, batch 3450, loss[loss=0.2449, ctc_loss=0.1136, cr_loss=0.3423, attn_decoder_loss=0.2519, over 28557.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.113, cr_loss=0.3538, attn_decoder_loss=0.2395, over 5773889.65 frames. ], batch size: 112, lr: 2.75e-03, grad_scale: 8.0
2024-09-19 18:21:53,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=719700.0, ans=0.0
2024-09-19 18:22:09,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=719740.0, ans=0.125
2024-09-19 18:22:21,307 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.879e+01 8.580e+01 9.014e+01 9.618e+01 1.900e+02, threshold=1.803e+02, percent-clipped=1.0
2024-09-19 18:22:23,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=719780.0, ans=0.125
2024-09-19 18:22:48,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=719860.0, ans=0.2
2024-09-19 18:22:51,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=719860.0, ans=0.2
2024-09-19 18:22:54,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=719860.0, ans=0.0
2024-09-19 18:23:04,427 INFO [train.py:1198] (1/2) Epoch 40, batch 3500, loss[loss=0.2075, ctc_loss=0.09319, cr_loss=0.3108, attn_decoder_loss=0.2133, over 29323.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1129, cr_loss=0.3533, attn_decoder_loss=0.2391, over 5776982.04 frames. ], batch size: 71, lr: 2.75e-03, grad_scale: 8.0
2024-09-19 18:23:10,809 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 18:23:16,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=719900.0, ans=0.2
2024-09-19 18:23:27,841 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.21 vs. limit=15.0
2024-09-19 18:23:39,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=719980.0, ans=0.125
2024-09-19 18:23:51,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=719980.0, ans=0.125
2024-09-19 18:23:54,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=719980.0, ans=0.125
2024-09-19 18:24:03,672 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=4.73 vs. limit=15.0
2024-09-19 18:24:19,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=720060.0, ans=0.125
2024-09-19 18:24:23,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=720060.0, ans=0.1
2024-09-19 18:24:26,506 INFO [train.py:1198] (1/2) Epoch 40, batch 3550, loss[loss=0.2393, ctc_loss=0.1141, cr_loss=0.3583, attn_decoder_loss=0.2453, over 29717.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1129, cr_loss=0.3534, attn_decoder_loss=0.2391, over 5784095.44 frames. ], batch size: 89, lr: 2.74e-03, grad_scale: 8.0
2024-09-19 18:24:29,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=720100.0, ans=0.125
2024-09-19 18:24:32,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=720100.0, ans=0.125
2024-09-19 18:24:39,122 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.92 vs. limit=15.0
2024-09-19 18:24:44,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=720140.0, ans=0.2
2024-09-19 18:24:58,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=720180.0, ans=0.0
2024-09-19 18:24:59,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=720180.0, ans=0.125
2024-09-19 18:25:00,513 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.600e+01 9.034e+01 9.634e+01 4.593e+02, threshold=1.807e+02, percent-clipped=2.0
2024-09-19 18:25:11,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=720220.0, ans=0.5
2024-09-19 18:25:17,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=720220.0, ans=0.0
2024-09-19 18:25:29,103 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.94 vs. limit=15.0
2024-09-19 18:25:31,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=720260.0, ans=0.0
2024-09-19 18:25:40,397 INFO [train.py:1198] (1/2) Epoch 40, batch 3600, loss[loss=0.2235, ctc_loss=0.1022, cr_loss=0.3207, attn_decoder_loss=0.2299, over 29493.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1131, cr_loss=0.3537, attn_decoder_loss=0.2393, over 5792581.50 frames. ], batch size: 77, lr: 2.74e-03, grad_scale: 16.0
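Note: each `scaling.py:214` ScheduledFloat record reports the current value (`ans`) of a module hyper-parameter (dropout probability, skip rate, balancer probability, ...) as a function of `batch_count`. A minimal sketch of such a schedule is below; the class is a simplified piecewise-linear stand-in for icefall's ScheduledFloat, not its actual implementation:

```python
# Simplified stand-in for a ScheduledFloat-style schedule: a value that is
# piecewise-linear in batch_count. Not the actual icefall implementation.
class PiecewiseLinearSchedule:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count
        self.points = list(points)

    def value(self, batch_count: float) -> float:
        (x0, y0) = self.points[0]
        if batch_count <= x0:
            return y0
        for (x1, y1) in self.points[1:]:
            if batch_count <= x1:  # interpolate within this segment
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
            x0, y0 = x1, y1
        return y0  # past the last breakpoint: hold the final value

# By batch_count ~7.2e5 most schedules have long since reached their final
# value, which is why the same `ans` is logged over and over. Breakpoints
# below are illustrative, not the recipe's actual schedule.
dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
assert dropout_p.value(719100.0) == 0.1  # matches the dropout_p records above
```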
2024-09-19 18:25:42,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=720300.0, ans=0.125
2024-09-19 18:25:57,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=720340.0, ans=0.125
2024-09-19 18:26:10,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=720380.0, ans=0.125
2024-09-19 18:26:21,332 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 18:26:23,483 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.78 vs. limit=22.5
2024-09-19 18:26:56,388 INFO [train.py:1198] (1/2) Epoch 40, batch 3650, loss[loss=0.2337, ctc_loss=0.1126, cr_loss=0.3413, attn_decoder_loss=0.2395, over 29524.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1126, cr_loss=0.3527, attn_decoder_loss=0.2387, over 5794065.57 frames. ], batch size: 90, lr: 2.74e-03, grad_scale: 16.0
2024-09-19 18:27:05,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=720500.0, ans=0.125
2024-09-19 18:27:12,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=720540.0, ans=0.1
2024-09-19 18:27:30,422 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.153e+01 8.608e+01 9.210e+01 9.736e+01 1.315e+02, threshold=1.842e+02, percent-clipped=0.0
2024-09-19 18:27:42,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=720620.0, ans=0.125
2024-09-19 18:27:44,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=720620.0, ans=0.09899494936611666
2024-09-19 18:27:50,728 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.19 vs. limit=15.0
2024-09-19 18:27:53,985 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.18 vs. limit=15.0
2024-09-19 18:27:59,441 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.32 vs. limit=22.5
2024-09-19 18:28:02,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=720660.0, ans=0.025
2024-09-19 18:28:10,693 INFO [train.py:1198] (1/2) Epoch 40, batch 3700, loss[loss=0.2463, ctc_loss=0.1171, cr_loss=0.3519, attn_decoder_loss=0.2529, over 29706.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1127, cr_loss=0.3531, attn_decoder_loss=0.2389, over 5803265.55 frames. ], batch size: 84, lr: 2.74e-03, grad_scale: 16.0
2024-09-19 18:28:28,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.29 vs. limit=10.0
2024-09-19 18:28:46,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=720780.0, ans=0.07
2024-09-19 18:28:49,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=720780.0, ans=0.1
2024-09-19 18:28:49,824 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 18:29:17,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=720860.0, ans=0.1
2024-09-19 18:29:19,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=720860.0, ans=0.125
2024-09-19 18:29:26,573 INFO [train.py:1198] (1/2) Epoch 40, batch 3750, loss[loss=0.2042, ctc_loss=0.0897, cr_loss=0.3018, attn_decoder_loss=0.2102, over 29359.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1126, cr_loss=0.3531, attn_decoder_loss=0.2386, over 5806888.21 frames. ], batch size: 67, lr: 2.74e-03, grad_scale: 8.0
2024-09-19 18:29:40,312 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 18:29:44,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=720940.0, ans=0.1
2024-09-19 18:29:51,289 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.77 vs. limit=10.0
2024-09-19 18:29:57,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=720980.0, ans=0.0
2024-09-19 18:30:01,966 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 8.539e+01 9.071e+01 9.494e+01 1.651e+02, threshold=1.814e+02, percent-clipped=0.0
2024-09-19 18:30:10,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.52 vs. limit=15.0
2024-09-19 18:30:15,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=721020.0, ans=0.125
2024-09-19 18:30:17,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=721020.0, ans=0.125
2024-09-19 18:30:17,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=721020.0, ans=0.95
2024-09-19 18:30:29,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=721060.0, ans=0.0
2024-09-19 18:30:40,981 INFO [train.py:1198] (1/2) Epoch 40, batch 3800, loss[loss=0.244, ctc_loss=0.119, cr_loss=0.3715, attn_decoder_loss=0.2496, over 29643.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1123, cr_loss=0.3527, attn_decoder_loss=0.2383, over 5797182.08 frames. ], batch size: 86, lr: 2.74e-03, grad_scale: 8.0
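Note: the `scaling.py:1024` Whitening records compare a covariance-based "whiteness" metric of a module's activations against a limit. A plausible form of such a metric is sketched below: for each group's feature covariance C, n * trace(C @ C) / trace(C)**2 equals 1.0 when C is proportional to the identity (fully white features) and grows as the covariance becomes ill-conditioned. This is an assumed form for illustration, not the exact scaling.py code:

```python
import torch

# Assumed whiteness metric for illustration, not the exact scaling.py code:
# 1.0 for perfectly white (isotropic) features, larger otherwise.
def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    # x: (num_frames, num_channels), num_channels divisible by num_groups
    num_frames, num_channels = x.shape
    n = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, n).transpose(0, 1)  # (groups, frames, n)
    cov = torch.matmul(x.transpose(1, 2), x) / num_frames     # (groups, n, n)
    trace_cov = cov.diagonal(dim1=1, dim2=2).sum(-1)
    trace_cov_sq = torch.matmul(cov, cov).diagonal(dim1=1, dim2=2).sum(-1)
    return (n * trace_cov_sq / trace_cov**2).mean()

# e.g. the whiten_keys records above use num_groups=4, num_channels=128
x = torch.randn(1000, 128)
print(whitening_metric(x, num_groups=4))  # close to 1.0 for white noise
```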
2024-09-19 18:31:03,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=721140.0, ans=0.2
2024-09-19 18:31:05,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=721140.0, ans=0.2
2024-09-19 18:31:09,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=721180.0, ans=0.1
2024-09-19 18:31:12,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.37 vs. limit=10.0
2024-09-19 18:31:19,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=721180.0, ans=0.125
2024-09-19 18:31:19,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=721180.0, ans=0.1
2024-09-19 18:31:27,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=721220.0, ans=0.0
2024-09-19 18:31:46,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=721260.0, ans=0.125
2024-09-19 18:31:56,319 INFO [train.py:1198] (1/2) Epoch 40, batch 3850, loss[loss=0.2443, ctc_loss=0.1152, cr_loss=0.3729, attn_decoder_loss=0.2504, over 29235.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1116, cr_loss=0.3513, attn_decoder_loss=0.2381, over 5811796.06 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 8.0
2024-09-19 18:31:59,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=721300.0, ans=0.125
2024-09-19 18:32:30,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=721380.0, ans=0.0
2024-09-19 18:32:31,695 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.269e+01 8.429e+01 8.857e+01 9.400e+01 1.753e+02, threshold=1.771e+02, percent-clipped=0.0
2024-09-19 18:32:44,252 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.21 vs. limit=22.5
2024-09-19 18:33:10,022 INFO [train.py:1198] (1/2) Epoch 40, batch 3900, loss[loss=0.241, ctc_loss=0.1203, cr_loss=0.3592, attn_decoder_loss=0.2464, over 29604.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1124, cr_loss=0.3528, attn_decoder_loss=0.2388, over 5815789.01 frames. ], batch size: 86, lr: 2.74e-03, grad_scale: 8.0
2024-09-19 18:33:24,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=721540.0, ans=0.125
2024-09-19 18:33:26,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=721540.0, ans=0.125
2024-09-19 18:33:44,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=721580.0, ans=0.0
2024-09-19 18:33:57,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=721620.0, ans=0.0
2024-09-19 18:34:25,648 INFO [train.py:1198] (1/2) Epoch 40, batch 3950, loss[loss=0.241, ctc_loss=0.1199, cr_loss=0.3582, attn_decoder_loss=0.2465, over 29514.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.112, cr_loss=0.3518, attn_decoder_loss=0.2387, over 5835121.11 frames. ], batch size: 97, lr: 2.74e-03, grad_scale: 8.0
2024-09-19 18:34:40,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=721740.0, ans=0.125
2024-09-19 18:34:48,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=721740.0, ans=0.1
2024-09-19 18:34:49,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=721740.0, ans=0.1
2024-09-19 18:35:00,908 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 8.605e+01 9.141e+01 9.620e+01 2.736e+02, threshold=1.828e+02, percent-clipped=1.0
2024-09-19 18:35:01,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=721780.0, ans=0.09899494936611666
2024-09-19 18:35:14,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=721820.0, ans=0.125
2024-09-19 18:35:24,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=721860.0, ans=0.125
2024-09-19 18:35:32,929 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.53 vs. limit=12.0
2024-09-19 18:35:39,103 INFO [train.py:1198] (1/2) Epoch 40, batch 4000, loss[loss=0.2286, ctc_loss=0.1056, cr_loss=0.334, attn_decoder_loss=0.2348, over 29492.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1121, cr_loss=0.352, attn_decoder_loss=0.2389, over 5813469.85 frames. ], batch size: 74, lr: 2.74e-03, grad_scale: 16.0
2024-09-19 18:36:01,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=721940.0, ans=0.2
2024-09-19 18:36:12,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=721980.0, ans=0.0
2024-09-19 18:36:29,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=722020.0, ans=0.1
2024-09-19 18:36:29,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=722020.0, ans=0.125
2024-09-19 18:36:54,431 INFO [train.py:1198] (1/2) Epoch 40, batch 4050, loss[loss=0.2545, ctc_loss=0.146, cr_loss=0.3909, attn_decoder_loss=0.2579, over 20153.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.112, cr_loss=0.3514, attn_decoder_loss=0.2386, over 5796860.48 frames. ], batch size: 209, lr: 2.74e-03, grad_scale: 16.0
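Note: the `grad_scale` value appended to each training record is the current AMP loss scale; it halves when the gradient scaler detects an overflow (16.0 -> 8.0 between batches 3300 and 3350 above) and grows back after a stretch of stable steps (16.0 again by batch 4000). A minimal sketch with PyTorch's GradScaler; the constructor arguments here are illustrative defaults, not necessarily those used by this recipe:

```python
import torch

# Minimal AMP sketch showing where a grad_scale like the logged one comes
# from. The GradScaler arguments are illustrative, not the recipe's exact ones.
scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def training_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model(batch))
    scaler.scale(loss).backward()
    scaler.step(optimizer)     # skipped if inf/nan gradients were found
    scaler.update()            # halves the scale on overflow, grows it later
    return scaler.get_scale()  # the value logged as grad_scale
```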
2024-09-19 18:37:07,099 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.51 vs. limit=15.0
2024-09-19 18:37:17,775 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.53 vs. limit=10.0
2024-09-19 18:37:18,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=722140.0, ans=0.1
2024-09-19 18:37:19,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=722140.0, ans=0.125
2024-09-19 18:37:29,859 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 8.666e+01 9.149e+01 9.737e+01 4.805e+02, threshold=1.830e+02, percent-clipped=1.0
2024-09-19 18:37:40,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=722220.0, ans=0.0
2024-09-19 18:37:47,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=722220.0, ans=0.125
2024-09-19 18:37:57,185 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.10 vs. limit=15.0
2024-09-19 18:38:08,046 INFO [train.py:1198] (1/2) Epoch 40, batch 4100, loss[loss=0.2592, ctc_loss=0.1388, cr_loss=0.4261, attn_decoder_loss=0.2631, over 29458.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1122, cr_loss=0.3512, attn_decoder_loss=0.2387, over 5792077.31 frames. ], batch size: 90, lr: 2.74e-03, grad_scale: 16.0
2024-09-19 18:38:18,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=722300.0, ans=0.2
2024-09-19 18:38:28,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=722340.0, ans=0.2
2024-09-19 18:38:36,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=722380.0, ans=0.2
2024-09-19 18:38:42,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=722380.0, ans=0.0
2024-09-19 18:38:50,230 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.57 vs. limit=15.0
2024-09-19 18:39:23,039 INFO [train.py:1198] (1/2) Epoch 40, batch 4150, loss[loss=0.2237, ctc_loss=0.1049, cr_loss=0.3496, attn_decoder_loss=0.2291, over 29503.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1122, cr_loss=0.3508, attn_decoder_loss=0.2386, over 5797549.09 frames. ], batch size: 77, lr: 2.74e-03, grad_scale: 16.0
2024-09-19 18:39:27,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=722500.0, ans=0.0
2024-09-19 18:39:27,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=722500.0, ans=0.0
2024-09-19 18:39:57,904 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.961e+01 8.418e+01 8.915e+01 9.615e+01 1.835e+02, threshold=1.783e+02, percent-clipped=1.0
2024-09-19 18:40:36,033 INFO [train.py:1198] (1/2) Epoch 40, batch 4200, loss[loss=0.2463, ctc_loss=0.1221, cr_loss=0.3756, attn_decoder_loss=0.2517, over 29498.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1125, cr_loss=0.3517, attn_decoder_loss=0.2392, over 5799953.52 frames. ], batch size: 90, lr: 2.74e-03, grad_scale: 8.0
2024-09-19 18:40:52,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=722740.0, ans=0.1
2024-09-19 18:40:57,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=722740.0, ans=0.125
2024-09-19 18:41:08,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=722780.0, ans=0.2
2024-09-19 18:41:18,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=722780.0, ans=0.2
2024-09-19 18:41:19,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=722820.0, ans=0.125
2024-09-19 18:41:30,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=722820.0, ans=0.0
2024-09-19 18:41:47,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=722860.0, ans=0.125
2024-09-19 18:41:50,106 INFO [train.py:1198] (1/2) Epoch 40, batch 4250, loss[loss=0.2195, ctc_loss=0.09814, cr_loss=0.324, attn_decoder_loss=0.2258, over 29509.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1123, cr_loss=0.3517, attn_decoder_loss=0.2395, over 5805969.80 frames. ], batch size: 74, lr: 2.74e-03, grad_scale: 8.0
2024-09-19 18:41:50,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=722900.0, ans=0.125
2024-09-19 18:41:56,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=722900.0, ans=0.125
2024-09-19 18:42:12,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=722940.0, ans=0.0
2024-09-19 18:42:27,446 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.534e+01 9.089e+01 9.722e+01 3.339e+02, threshold=1.818e+02, percent-clipped=2.0
2024-09-19 18:42:39,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=723020.0, ans=0.95
2024-09-19 18:42:42,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=723020.0, ans=0.07
2024-09-19 18:42:51,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=723060.0, ans=0.125
2024-09-19 18:43:04,245 INFO [train.py:1198] (1/2) Epoch 40, batch 4300, loss[loss=0.2439, ctc_loss=0.1116, cr_loss=0.347, attn_decoder_loss=0.2509, over 29516.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1123, cr_loss=0.3518, attn_decoder_loss=0.2397, over 5794582.28 frames. ], batch size: 87, lr: 2.74e-03, grad_scale: 8.0
2024-09-19 18:43:12,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=723100.0, ans=0.0
2024-09-19 18:43:26,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=723140.0, ans=0.0
2024-09-19 18:43:28,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=723140.0, ans=0.125
2024-09-19 18:44:06,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=723260.0, ans=0.2
2024-09-19 18:44:15,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=723260.0, ans=0.0
2024-09-19 18:44:19,239 INFO [train.py:1198] (1/2) Epoch 40, batch 4350, loss[loss=0.2454, ctc_loss=0.1229, cr_loss=0.3649, attn_decoder_loss=0.2509, over 29501.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1151, cr_loss=0.3575, attn_decoder_loss=0.2427, over 5797601.55 frames. ], batch size: 97, lr: 2.74e-03, grad_scale: 8.0
2024-09-19 18:44:25,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=723300.0, ans=0.125
2024-09-19 18:44:55,940 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.669e+01 8.976e+01 9.434e+01 1.012e+02 1.882e+02, threshold=1.887e+02, percent-clipped=1.0
2024-09-19 18:44:56,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=723380.0, ans=0.95
2024-09-19 18:45:23,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.95 vs. limit=15.0
2024-09-19 18:45:30,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=723460.0, ans=0.125
2024-09-19 18:45:32,930 INFO [train.py:1198] (1/2) Epoch 40, batch 4400, loss[loss=0.2415, ctc_loss=0.1179, cr_loss=0.3737, attn_decoder_loss=0.2469, over 27379.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1163, cr_loss=0.3599, attn_decoder_loss=0.2447, over 5767373.78 frames. ], batch size: 124, lr: 2.74e-03, grad_scale: 16.0
2024-09-19 18:45:42,381 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.22 vs. limit=12.0
2024-09-19 18:45:44,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=723500.0, ans=0.125
2024-09-19 18:45:55,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=723540.0, ans=0.0
2024-09-19 18:46:31,712 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 18:46:33,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=723660.0, ans=0.125
2024-09-19 18:46:38,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=723660.0, ans=0.025
2024-09-19 18:46:38,551 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.60 vs. limit=15.0
2024-09-19 18:46:46,890 INFO [train.py:1198] (1/2) Epoch 40, batch 4450, loss[loss=0.2532, ctc_loss=0.1465, cr_loss=0.3731, attn_decoder_loss=0.2568, over 19816.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1199, cr_loss=0.3654, attn_decoder_loss=0.2467, over 5570431.54 frames. ], batch size: 209, lr: 2.74e-03, grad_scale: 8.0
2024-09-19 18:46:53,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=723700.0, ans=0.125
2024-09-19 18:46:59,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=723700.0, ans=0.1
2024-09-19 18:47:14,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=723740.0, ans=0.05
2024-09-19 18:47:16,620 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.52 vs. limit=15.0
2024-09-19 18:47:26,344 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.232e+01 9.186e+01 1.020e+02 1.192e+02 3.727e+02, threshold=2.040e+02, percent-clipped=2.0
2024-09-19 18:48:02,473 INFO [train.py:1198] (1/2) Epoch 40, batch 4500, loss[loss=0.2497, ctc_loss=0.1397, cr_loss=0.3564, attn_decoder_loss=0.254, over 20483.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1229, cr_loss=0.3682, attn_decoder_loss=0.2484, over 5233227.71 frames. ], batch size: 209, lr: 2.74e-03, grad_scale: 8.0
2024-09-19 18:48:03,653 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.04 vs. limit=10.0
2024-09-19 18:48:07,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=723900.0, ans=0.125
2024-09-19 18:48:17,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=723940.0, ans=0.125
2024-09-19 18:48:21,116 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.77 vs. limit=22.5
2024-09-19 18:48:22,709 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.20 vs. limit=22.5
2024-09-19 18:48:33,009 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.35 vs. limit=15.0
2024-09-19 18:49:10,542 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.58 vs. limit=22.5
2024-09-19 18:49:17,629 INFO [train.py:1198] (1/2) Epoch 41, batch 0, loss[loss=0.2099, ctc_loss=0.09146, cr_loss=0.2958, attn_decoder_loss=0.2164, over 29607.00 frames. ], tot_loss[loss=0.2099, ctc_loss=0.09146, cr_loss=0.2958, attn_decoder_loss=0.2164, over 29607.00 frames. ], batch size: 73, lr: 2.70e-03, grad_scale: 16.0
2024-09-19 18:49:17,629 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-19 18:49:35,496 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.7393, 4.2303, 4.5439, 4.6690], device='cuda:1')
2024-09-19 18:49:36,951 INFO [train.py:1230] (1/2) Epoch 41, validation: loss=0.2123, ctc_loss=0.03622, cr_loss=6.741e-15, attn_decoder_loss=0.2319, over 944034.00 frames.
2024-09-19 18:49:36,952 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB
2024-09-19 18:49:59,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=724040.0, ans=0.125
2024-09-19 18:50:01,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=724040.0, ans=0.125
2024-09-19 18:50:15,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=724080.0, ans=0.2
2024-09-19 18:50:16,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=724080.0, ans=0.1
2024-09-19 18:50:32,294 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.04 vs. limit=15.0
2024-09-19 18:50:52,546 INFO [train.py:1198] (1/2) Epoch 41, batch 50, loss[loss=0.2078, ctc_loss=0.0907, cr_loss=0.3102, attn_decoder_loss=0.2139, over 29483.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1146, cr_loss=0.3579, attn_decoder_loss=0.2412, over 1266993.74 frames. ], batch size: 70, lr: 2.70e-03, grad_scale: 16.0
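Note: at the first batch of epoch 41 the trainer pauses to compute validation loss (the train.py:1221/1230 records above); losses are accumulated over the whole validation set and normalized by the total frame count, which is why the record reads "over 944034.00 frames". The essentially zero validation cr_loss (6.741e-15) is consistent with the consistency-regularization term being inactive when augmentation is disabled in eval mode. A sketch of such a frame-weighted validation pass; the function and argument names are placeholders, not the actual train.py API:

```python
import torch

# Sketch of a frame-weighted validation pass matching the "validation: ...
# over 944034.00 frames" record; compute_loss and its return values are
# placeholders, not the actual train.py API.
@torch.no_grad()
def validate(model, valid_loader, compute_loss):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    for batch in valid_loader:
        loss, num_frames = compute_loss(model, batch)  # summed loss, frame count
        tot_loss += loss.item()
        tot_frames += num_frames
    model.train()
    return tot_loss / tot_frames  # reported as the per-frame validation loss
```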
2024-09-19 18:50:54,028 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.446e+01 9.153e+01 1.062e+02 1.232e+02 3.092e+02, threshold=2.125e+02, percent-clipped=2.0
2024-09-19 18:51:00,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=724200.0, ans=0.0
2024-09-19 18:51:04,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=724200.0, ans=0.2
2024-09-19 18:51:07,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=724240.0, ans=0.0
2024-09-19 18:51:21,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=724280.0, ans=0.0
2024-09-19 18:51:32,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=724280.0, ans=0.0
2024-09-19 18:51:33,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=724280.0, ans=0.1
2024-09-19 18:51:38,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=724320.0, ans=0.125
2024-09-19 18:51:41,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.56 vs. limit=12.0
2024-09-19 18:51:43,220 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.51 vs. limit=22.5
2024-09-19 18:51:48,942 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.15 vs. limit=15.0
2024-09-19 18:52:07,810 INFO [train.py:1198] (1/2) Epoch 41, batch 100, loss[loss=0.2149, ctc_loss=0.0992, cr_loss=0.322, attn_decoder_loss=0.2206, over 29513.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1146, cr_loss=0.3576, attn_decoder_loss=0.2421, over 2251192.56 frames. ], batch size: 76, lr: 2.70e-03, grad_scale: 16.0
2024-09-19 18:52:08,868 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=12.0
2024-09-19 18:52:32,540 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.28 vs. limit=15.0
2024-09-19 18:52:37,225 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.17 vs. limit=15.0
2024-09-19 18:53:15,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=724560.0, ans=0.1
2024-09-19 18:53:27,441 INFO [train.py:1198] (1/2) Epoch 41, batch 150, loss[loss=0.217, ctc_loss=0.09723, cr_loss=0.3282, attn_decoder_loss=0.223, over 29409.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1128, cr_loss=0.3537, attn_decoder_loss=0.2401, over 3046234.21 frames. ], batch size: 70, lr: 2.70e-03, grad_scale: 8.0
2024-09-19 18:53:30,375 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.711e+01 8.681e+01 9.088e+01 9.657e+01 1.697e+02, threshold=1.818e+02, percent-clipped=0.0
2024-09-19 18:53:34,563 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.72 vs. limit=15.0
2024-09-19 18:53:39,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=724600.0, ans=0.025
2024-09-19 18:53:49,307 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.61 vs. limit=15.0
2024-09-19 18:53:58,564 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.48 vs. limit=22.5
2024-09-19 18:54:32,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=724760.0, ans=0.0
2024-09-19 18:54:42,651 INFO [train.py:1198] (1/2) Epoch 41, batch 200, loss[loss=0.2473, ctc_loss=0.1203, cr_loss=0.3591, attn_decoder_loss=0.2534, over 27503.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1124, cr_loss=0.3533, attn_decoder_loss=0.2392, over 3658267.46 frames. ], batch size: 125, lr: 2.70e-03, grad_scale: 8.0
2024-09-19 18:55:06,197 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.93 vs. limit=15.0
2024-09-19 18:55:13,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=724880.0, ans=0.2
2024-09-19 18:55:22,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=724880.0, ans=0.0
2024-09-19 18:55:40,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=724920.0, ans=0.0
2024-09-19 18:55:45,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=724960.0, ans=0.025
2024-09-19 18:55:49,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=724960.0, ans=0.0
2024-09-19 18:55:57,748 INFO [train.py:1198] (1/2) Epoch 41, batch 250, loss[loss=0.2409, ctc_loss=0.1145, cr_loss=0.3621, attn_decoder_loss=0.2469, over 29226.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1121, cr_loss=0.3529, attn_decoder_loss=0.2388, over 4140999.70 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 8.0
2024-09-19 18:56:00,844 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.437e+01 8.416e+01 8.964e+01 9.351e+01 1.561e+02, threshold=1.793e+02, percent-clipped=0.0
2024-09-19 18:56:02,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=725000.0, ans=0.0
2024-09-19 18:56:04,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=725000.0, ans=0.125
2024-09-19 18:56:04,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=725000.0, ans=0.125
2024-09-19 18:56:23,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=725040.0, ans=0.125
2024-09-19 18:56:26,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=725080.0, ans=0.125
2024-09-19 18:56:46,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=725120.0, ans=0.0
2024-09-19 18:56:59,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=725160.0, ans=0.09899494936611666
2024-09-19 18:57:09,256 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0
2024-09-19 18:57:09,418 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.92 vs. limit=15.0
2024-09-19 18:57:17,569 INFO [train.py:1198] (1/2) Epoch 41, batch 300, loss[loss=0.2442, ctc_loss=0.1247, cr_loss=0.3918, attn_decoder_loss=0.2488, over 29553.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1112, cr_loss=0.3503, attn_decoder_loss=0.238, over 4510201.53 frames. ], batch size: 92, lr: 2.70e-03, grad_scale: 8.0
2024-09-19 18:58:33,155 INFO [train.py:1198] (1/2) Epoch 41, batch 350, loss[loss=0.2074, ctc_loss=0.09603, cr_loss=0.3139, attn_decoder_loss=0.2127, over 29762.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.112, cr_loss=0.3513, attn_decoder_loss=0.2387, over 4795313.58 frames. ], batch size: 72, lr: 2.70e-03, grad_scale: 8.0
2024-09-19 18:58:36,042 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.381e+01 8.397e+01 8.852e+01 9.608e+01 1.644e+02, threshold=1.770e+02, percent-clipped=0.0
2024-09-19 18:58:36,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=725400.0, ans=0.1
2024-09-19 18:58:42,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=725400.0, ans=0.125
2024-09-19 18:58:56,320 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.48 vs. limit=10.0
2024-09-19 18:59:22,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=725520.0, ans=0.2
2024-09-19 18:59:48,501 INFO [train.py:1198] (1/2) Epoch 41, batch 400, loss[loss=0.2399, ctc_loss=0.1139, cr_loss=0.3595, attn_decoder_loss=0.2459, over 29735.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1121, cr_loss=0.3521, attn_decoder_loss=0.2385, over 5024608.62 frames. ], batch size: 82, lr: 2.70e-03, grad_scale: 16.0
2024-09-19 19:00:05,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=725640.0, ans=0.125
2024-09-19 19:00:14,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=725640.0, ans=0.0
2024-09-19 19:00:16,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=725640.0, ans=0.0
2024-09-19 19:00:19,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=725680.0, ans=0.0
2024-09-19 19:01:08,850 INFO [train.py:1198] (1/2) Epoch 41, batch 450, loss[loss=0.2468, ctc_loss=0.1215, cr_loss=0.379, attn_decoder_loss=0.2522, over 29694.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1124, cr_loss=0.3525, attn_decoder_loss=0.2387, over 5188538.54 frames. ], batch size: 83, lr: 2.70e-03, grad_scale: 16.0
2024-09-19 19:01:11,785 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.467e+01 8.907e+01 9.504e+01 2.028e+02, threshold=1.781e+02, percent-clipped=1.0
2024-09-19 19:01:39,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=725880.0, ans=0.0
2024-09-19 19:02:15,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=725960.0, ans=0.2
2024-09-19 19:02:20,798 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.00 vs. limit=15.0
2024-09-19 19:02:24,456 INFO [train.py:1198] (1/2) Epoch 41, batch 500, loss[loss=0.2538, ctc_loss=0.1269, cr_loss=0.4031, attn_decoder_loss=0.259, over 29450.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1121, cr_loss=0.3521, attn_decoder_loss=0.2383, over 5331113.44 frames. ], batch size: 94, lr: 2.70e-03, grad_scale: 16.0
2024-09-19 19:02:32,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=726000.0, ans=0.95
2024-09-19 19:02:43,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=726040.0, ans=0.2
2024-09-19 19:03:06,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=726080.0, ans=0.0
2024-09-19 19:03:39,813 INFO [train.py:1198] (1/2) Epoch 41, batch 550, loss[loss=0.2391, ctc_loss=0.11, cr_loss=0.349, attn_decoder_loss=0.2456, over 28945.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1122, cr_loss=0.3524, attn_decoder_loss=0.2383, over 5423341.09 frames. ], batch size: 104, lr: 2.70e-03, grad_scale: 16.0
2024-09-19 19:03:42,904 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.521e+01 8.739e+01 9.193e+01 9.957e+01 2.783e+02, threshold=1.839e+02, percent-clipped=3.0
2024-09-19 19:04:02,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=726240.0, ans=0.125
2024-09-19 19:04:14,295 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0
2024-09-19 19:04:21,497 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.80 vs. limit=22.5
2024-09-19 19:04:23,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=726320.0, ans=0.035
2024-09-19 19:04:31,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=726320.0, ans=0.025
2024-09-19 19:04:40,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=726360.0, ans=0.125
2024-09-19 19:04:42,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=726360.0, ans=0.1
2024-09-19 19:04:58,276 INFO [train.py:1198] (1/2) Epoch 41, batch 600, loss[loss=0.2563, ctc_loss=0.13, cr_loss=0.3758, attn_decoder_loss=0.2619, over 29259.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1124, cr_loss=0.3527, attn_decoder_loss=0.2386, over 5509592.11 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 8.0
2024-09-19 19:05:20,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=726440.0, ans=0.125
2024-09-19 19:05:31,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=726480.0, ans=0.2
2024-09-19 19:05:43,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=726520.0, ans=0.0
2024-09-19 19:05:47,430 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.39 vs. limit=10.0
2024-09-19 19:05:48,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=726520.0, ans=0.0
2024-09-19 19:05:49,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=726520.0, ans=0.0
2024-09-19 19:05:51,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=726520.0, ans=0.125
2024-09-19 19:06:15,345 INFO [train.py:1198] (1/2) Epoch 41, batch 650, loss[loss=0.247, ctc_loss=0.1241, cr_loss=0.3882, attn_decoder_loss=0.2521, over 29743.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1113, cr_loss=0.3508, attn_decoder_loss=0.238, over 5586708.84 frames. ], batch size: 81, lr: 2.70e-03, grad_scale: 8.0
2024-09-19 19:06:17,605 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.22 vs. limit=22.5
2024-09-19 19:06:19,865 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.302e+01 8.350e+01 8.880e+01 9.262e+01 1.448e+02, threshold=1.776e+02, percent-clipped=0.0
2024-09-19 19:06:27,916 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 19:06:39,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=726640.0, ans=0.0
2024-09-19 19:07:18,054 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.92 vs. limit=15.0
2024-09-19 19:07:23,944 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0
2024-09-19 19:07:25,791 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.15 vs. limit=15.0
2024-09-19 19:07:30,713 INFO [train.py:1198] (1/2) Epoch 41, batch 700, loss[loss=0.2274, ctc_loss=0.1092, cr_loss=0.3535, attn_decoder_loss=0.2327, over 29539.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1116, cr_loss=0.3517, attn_decoder_loss=0.2386, over 5637676.82 frames. ], batch size: 76, lr: 2.70e-03, grad_scale: 8.0
2024-09-19 19:07:32,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=726800.0, ans=0.0
2024-09-19 19:07:51,280 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.90 vs. limit=15.0
2024-09-19 19:07:51,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=726840.0, ans=0.0
2024-09-19 19:07:58,650 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.15 vs. limit=15.0
2024-09-19 19:08:08,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=726880.0, ans=0.0
2024-09-19 19:08:23,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=726920.0, ans=0.1
2024-09-19 19:08:30,337 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.55 vs. limit=12.0
2024-09-19 19:08:31,881 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.32 vs. limit=12.0
2024-09-19 19:08:46,103 INFO [train.py:1198] (1/2) Epoch 41, batch 750, loss[loss=0.235, ctc_loss=0.1174, cr_loss=0.3498, attn_decoder_loss=0.2402, over 29703.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1119, cr_loss=0.3518, attn_decoder_loss=0.2385, over 5678033.83 frames. ], batch size: 82, lr: 2.70e-03, grad_scale: 8.0
2024-09-19 19:08:52,745 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 8.416e+01 8.976e+01 9.718e+01 1.767e+02, threshold=1.795e+02, percent-clipped=0.0
2024-09-19 19:08:56,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=727000.0, ans=0.1
2024-09-19 19:09:02,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=727040.0, ans=0.0
2024-09-19 19:09:43,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=727120.0, ans=0.0
2024-09-19 19:09:44,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=727120.0, ans=0.125
2024-09-19 19:10:06,044 INFO [train.py:1198] (1/2) Epoch 41, batch 800, loss[loss=0.2161, ctc_loss=0.09687, cr_loss=0.3178, attn_decoder_loss=0.2223, over 29619.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1118, cr_loss=0.3519, attn_decoder_loss=0.2382, over 5708201.15 frames. ], batch size: 73, lr: 2.70e-03, grad_scale: 16.0
2024-09-19 19:10:06,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=727200.0, ans=0.125
2024-09-19 19:10:22,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=727240.0, ans=0.125
2024-09-19 19:10:28,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=727240.0, ans=0.125
2024-09-19 19:10:35,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=727280.0, ans=0.125
2024-09-19 19:11:05,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=727360.0, ans=0.125
2024-09-19 19:11:21,343 INFO [train.py:1198] (1/2) Epoch 41, batch 850, loss[loss=0.2361, ctc_loss=0.1075, cr_loss=0.3417, attn_decoder_loss=0.2428, over 29684.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1113, cr_loss=0.3505, attn_decoder_loss=0.2379, over 5738962.43 frames. ], batch size: 89, lr: 2.70e-03, grad_scale: 16.0
2024-09-19 19:11:25,688 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.681e+01 8.437e+01 9.040e+01 9.490e+01 1.672e+02, threshold=1.808e+02, percent-clipped=0.0
2024-09-19 19:11:54,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=727480.0, ans=0.025
2024-09-19 19:11:56,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=727480.0, ans=0.125
2024-09-19 19:12:05,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=727520.0, ans=0.015
2024-09-19 19:12:06,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=727520.0, ans=0.125
2024-09-19 19:12:14,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=727520.0, ans=0.2
2024-09-19 19:12:16,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=727520.0, ans=0.125
2024-09-19 19:12:19,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=727520.0, ans=0.125
2024-09-19 19:12:21,572 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.71 vs. limit=10.0
2024-09-19 19:12:37,294 INFO [train.py:1198] (1/2) Epoch 41, batch 900, loss[loss=0.2147, ctc_loss=0.09822, cr_loss=0.3168, attn_decoder_loss=0.2206, over 29611.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1115, cr_loss=0.3509, attn_decoder_loss=0.2382, over 5742353.10 frames. ], batch size: 73, lr: 2.70e-03, grad_scale: 16.0
2024-09-19 19:12:43,475 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.08 vs. limit=22.5
2024-09-19 19:12:46,286 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.23 vs. limit=12.0
2024-09-19 19:12:47,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=727600.0, ans=0.125
2024-09-19 19:12:53,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=727640.0, ans=0.0
2024-09-19 19:12:57,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=727640.0, ans=0.2
2024-09-19 19:13:28,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.61 vs. limit=15.0
2024-09-19 19:13:49,325 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 19:13:53,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=727760.0, ans=0.125
2024-09-19 19:13:56,362 INFO [train.py:1198] (1/2) Epoch 41, batch 950, loss[loss=0.2126, ctc_loss=0.09233, cr_loss=0.2993, attn_decoder_loss=0.2193, over 29524.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1115, cr_loss=0.3506, attn_decoder_loss=0.2385, over 5744438.22 frames. ], batch size: 74, lr: 2.70e-03, grad_scale: 16.0
2024-09-19 19:14:00,866 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.928e+01 8.606e+01 9.118e+01 9.826e+01 2.095e+02, threshold=1.824e+02, percent-clipped=1.0
2024-09-19 19:14:14,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=727840.0, ans=0.125
2024-09-19 19:14:54,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=727920.0, ans=0.2
2024-09-19 19:15:04,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=727960.0, ans=0.0
2024-09-19 19:15:12,361 INFO [train.py:1198] (1/2) Epoch 41, batch 1000, loss[loss=0.226, ctc_loss=0.111, cr_loss=0.3544, attn_decoder_loss=0.2309, over 29517.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1121, cr_loss=0.3514, attn_decoder_loss=0.2388, over 5737680.94 frames. ], batch size: 77, lr: 2.70e-03, grad_scale: 16.0
2024-09-19 19:15:33,374 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=15.0
2024-09-19 19:15:35,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=728040.0, ans=0.2
2024-09-19 19:15:45,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=728080.0, ans=0.015
2024-09-19 19:15:46,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=728080.0, ans=0.125
2024-09-19 19:15:49,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=728080.0, ans=0.125
2024-09-19 19:15:52,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=728080.0, ans=0.125
2024-09-19 19:16:01,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=728120.0, ans=0.125
2024-09-19 19:16:11,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=728160.0, ans=0.07
2024-09-19 19:16:21,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=728160.0, ans=0.125
2024-09-19 19:16:28,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=728200.0, ans=0.125
2024-09-19 19:16:29,754 INFO [train.py:1198] (1/2) Epoch 41, batch 1050, loss[loss=0.2392, ctc_loss=0.1152, cr_loss=0.3617, attn_decoder_loss=0.2449, over 29658.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.112, cr_loss=0.351, attn_decoder_loss=0.2382, over 5744057.03 frames. ], batch size: 85, lr: 2.70e-03, grad_scale: 8.0
2024-09-19 19:16:30,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=728200.0, ans=0.0
2024-09-19 19:16:35,724 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.355e+01 8.570e+01 9.055e+01 9.661e+01 1.822e+02, threshold=1.811e+02, percent-clipped=0.0
2024-09-19 19:17:08,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=728280.0, ans=0.0
2024-09-19 19:17:12,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=728280.0, ans=0.07
2024-09-19 19:17:13,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.37 vs. limit=15.0
2024-09-19 19:17:19,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=728320.0, ans=0.0
2024-09-19 19:17:47,180 INFO [train.py:1198] (1/2) Epoch 41, batch 1100, loss[loss=0.2214, ctc_loss=0.1069, cr_loss=0.3361, attn_decoder_loss=0.2267, over 29444.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1116, cr_loss=0.3504, attn_decoder_loss=0.2378, over 5756208.17 frames.
], batch size: 78, lr: 2.70e-03, grad_scale: 8.0 2024-09-19 19:17:48,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=728400.0, ans=0.0 2024-09-19 19:17:56,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=728400.0, ans=0.125 2024-09-19 19:17:59,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=728400.0, ans=0.0 2024-09-19 19:18:11,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=728440.0, ans=0.2 2024-09-19 19:18:21,105 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.00 vs. limit=22.5 2024-09-19 19:18:22,684 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.19 vs. limit=15.0 2024-09-19 19:18:42,151 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0 2024-09-19 19:18:55,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=728560.0, ans=0.04949747468305833 2024-09-19 19:18:59,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=728560.0, ans=0.0 2024-09-19 19:19:02,682 INFO [train.py:1198] (1/2) Epoch 41, batch 1150, loss[loss=0.222, ctc_loss=0.102, cr_loss=0.3314, attn_decoder_loss=0.2279, over 29470.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1117, cr_loss=0.3509, attn_decoder_loss=0.2381, over 5753096.54 frames. ], batch size: 78, lr: 2.69e-03, grad_scale: 8.0 2024-09-19 19:19:08,827 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 8.493e+01 8.986e+01 9.432e+01 3.581e+02, threshold=1.797e+02, percent-clipped=4.0 2024-09-19 19:19:20,324 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2024-09-19 19:19:22,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=728640.0, ans=0.1 2024-09-19 19:19:28,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=728640.0, ans=0.125 2024-09-19 19:20:17,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=728760.0, ans=0.0 2024-09-19 19:20:19,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=728800.0, ans=0.125 2024-09-19 19:20:20,764 INFO [train.py:1198] (1/2) Epoch 41, batch 1200, loss[loss=0.234, ctc_loss=0.1068, cr_loss=0.3485, attn_decoder_loss=0.2404, over 29688.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1121, cr_loss=0.3518, attn_decoder_loss=0.2388, over 5745567.77 frames. 
], batch size: 85, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:20:39,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=728840.0, ans=0.0 2024-09-19 19:20:39,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=728840.0, ans=0.125 2024-09-19 19:20:46,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=728840.0, ans=0.125 2024-09-19 19:20:47,521 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.69 vs. limit=15.0 2024-09-19 19:20:55,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=728880.0, ans=0.025 2024-09-19 19:21:10,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=728920.0, ans=0.1 2024-09-19 19:21:13,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=728920.0, ans=0.1 2024-09-19 19:21:28,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=728960.0, ans=0.125 2024-09-19 19:21:30,456 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.97 vs. limit=15.0 2024-09-19 19:21:37,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=729000.0, ans=0.125 2024-09-19 19:21:37,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=729000.0, ans=0.1 2024-09-19 19:21:38,651 INFO [train.py:1198] (1/2) Epoch 41, batch 1250, loss[loss=0.2461, ctc_loss=0.1255, cr_loss=0.3873, attn_decoder_loss=0.2509, over 29527.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1127, cr_loss=0.3531, attn_decoder_loss=0.2394, over 5774007.06 frames. ], batch size: 92, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:21:44,546 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.563e+01 8.620e+01 9.115e+01 9.641e+01 1.627e+02, threshold=1.823e+02, percent-clipped=0.0 2024-09-19 19:21:48,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=729000.0, ans=0.0 2024-09-19 19:21:58,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=729040.0, ans=0.1 2024-09-19 19:22:00,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=729040.0, ans=0.125 2024-09-19 19:22:06,660 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=4.79 vs. limit=15.0 2024-09-19 19:22:32,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=729120.0, ans=0.0 2024-09-19 19:22:43,418 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.26 vs. 
limit=22.5 2024-09-19 19:22:47,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=729160.0, ans=0.125 2024-09-19 19:22:54,564 INFO [train.py:1198] (1/2) Epoch 41, batch 1300, loss[loss=0.2452, ctc_loss=0.1188, cr_loss=0.3636, attn_decoder_loss=0.2512, over 28498.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1125, cr_loss=0.3526, attn_decoder_loss=0.2389, over 5778338.39 frames. ], batch size: 112, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:22:57,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=729200.0, ans=0.125 2024-09-19 19:22:59,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=729200.0, ans=0.025 2024-09-19 19:23:17,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=729240.0, ans=0.0 2024-09-19 19:23:40,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=729320.0, ans=0.0 2024-09-19 19:23:52,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=729320.0, ans=0.05 2024-09-19 19:23:52,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=729320.0, ans=0.2 2024-09-19 19:23:57,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=729360.0, ans=0.1 2024-09-19 19:24:10,601 INFO [train.py:1198] (1/2) Epoch 41, batch 1350, loss[loss=0.2372, ctc_loss=0.114, cr_loss=0.3628, attn_decoder_loss=0.2428, over 29750.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1121, cr_loss=0.3521, attn_decoder_loss=0.2387, over 5794636.50 frames. ], batch size: 81, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:24:14,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=729400.0, ans=0.1 2024-09-19 19:24:18,634 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.744e+01 8.406e+01 8.862e+01 9.438e+01 1.295e+02, threshold=1.772e+02, percent-clipped=0.0 2024-09-19 19:24:18,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=729400.0, ans=0.125 2024-09-19 19:24:51,039 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=15.0 2024-09-19 19:25:02,380 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.41 vs. limit=22.5 2024-09-19 19:25:04,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=729520.0, ans=0.025 2024-09-19 19:25:10,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=729520.0, ans=0.125 2024-09-19 19:25:22,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=729560.0, ans=0.0 2024-09-19 19:25:30,638 INFO [train.py:1198] (1/2) Epoch 41, batch 1400, loss[loss=0.2052, ctc_loss=0.09652, cr_loss=0.32, attn_decoder_loss=0.2102, over 29575.00 frames. 
], tot_loss[loss=0.2328, ctc_loss=0.1118, cr_loss=0.3512, attn_decoder_loss=0.2385, over 5805620.04 frames. ], batch size: 69, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:25:37,738 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.48 vs. limit=22.5 2024-09-19 19:25:51,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=729640.0, ans=0.125 2024-09-19 19:25:56,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=729640.0, ans=0.1 2024-09-19 19:26:19,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=729720.0, ans=0.1 2024-09-19 19:26:20,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=729720.0, ans=0.125 2024-09-19 19:26:29,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=729760.0, ans=0.125 2024-09-19 19:26:32,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=729760.0, ans=0.125 2024-09-19 19:26:32,988 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.82 vs. limit=15.0 2024-09-19 19:26:41,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=729760.0, ans=0.0 2024-09-19 19:26:45,627 INFO [train.py:1198] (1/2) Epoch 41, batch 1450, loss[loss=0.2557, ctc_loss=0.1274, cr_loss=0.382, attn_decoder_loss=0.2615, over 29469.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1122, cr_loss=0.3522, attn_decoder_loss=0.239, over 5802858.45 frames. ], batch size: 94, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:26:48,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=729800.0, ans=0.125 2024-09-19 19:26:51,369 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.433e+01 8.557e+01 9.068e+01 9.745e+01 1.592e+02, threshold=1.814e+02, percent-clipped=0.0 2024-09-19 19:27:14,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=729880.0, ans=0.0 2024-09-19 19:27:18,103 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.61 vs. limit=12.0 2024-09-19 19:27:26,915 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.52 vs. limit=22.5 2024-09-19 19:27:34,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=729920.0, ans=0.5 2024-09-19 19:27:44,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=729960.0, ans=0.2 2024-09-19 19:28:03,359 INFO [train.py:1198] (1/2) Epoch 41, batch 1500, loss[loss=0.2449, ctc_loss=0.1189, cr_loss=0.372, attn_decoder_loss=0.2507, over 29634.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1123, cr_loss=0.3521, attn_decoder_loss=0.2393, over 5802648.03 frames. 
], batch size: 86, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:28:05,818 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=4.93 vs. limit=12.0 2024-09-19 19:28:49,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=730120.0, ans=0.0 2024-09-19 19:28:54,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=730120.0, ans=0.5 2024-09-19 19:28:54,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=730120.0, ans=0.025 2024-09-19 19:29:12,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=730160.0, ans=0.05 2024-09-19 19:29:21,288 INFO [train.py:1198] (1/2) Epoch 41, batch 1550, loss[loss=0.2426, ctc_loss=0.1227, cr_loss=0.3705, attn_decoder_loss=0.2476, over 29487.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1129, cr_loss=0.353, attn_decoder_loss=0.2393, over 5779508.52 frames. ], batch size: 90, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:29:27,255 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.571e+01 8.596e+01 9.016e+01 9.921e+01 2.014e+02, threshold=1.803e+02, percent-clipped=1.0 2024-09-19 19:29:31,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=730200.0, ans=0.125 2024-09-19 19:30:36,413 INFO [train.py:1198] (1/2) Epoch 41, batch 1600, loss[loss=0.2416, ctc_loss=0.1135, cr_loss=0.3533, attn_decoder_loss=0.2479, over 29665.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1129, cr_loss=0.3524, attn_decoder_loss=0.239, over 5762133.71 frames. ], batch size: 85, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:30:38,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=730400.0, ans=0.1 2024-09-19 19:30:38,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=730400.0, ans=0.125 2024-09-19 19:30:48,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=730400.0, ans=0.0 2024-09-19 19:30:51,782 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 19:30:54,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=730440.0, ans=0.0 2024-09-19 19:31:04,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=730440.0, ans=0.125 2024-09-19 19:31:05,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=730480.0, ans=0.125 2024-09-19 19:31:08,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=730480.0, ans=0.05 2024-09-19 19:31:54,180 INFO [train.py:1198] (1/2) Epoch 41, batch 1650, loss[loss=0.2489, ctc_loss=0.1233, cr_loss=0.3689, attn_decoder_loss=0.2546, over 29712.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1129, cr_loss=0.3527, attn_decoder_loss=0.2391, over 5758078.43 frames. 
], batch size: 89, lr: 2.69e-03, grad_scale: 8.0 2024-09-19 19:31:59,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=730600.0, ans=0.0 2024-09-19 19:32:03,241 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.694e+01 8.587e+01 9.228e+01 9.861e+01 2.680e+02, threshold=1.846e+02, percent-clipped=1.0 2024-09-19 19:32:12,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=730640.0, ans=0.2 2024-09-19 19:33:11,311 INFO [train.py:1198] (1/2) Epoch 41, batch 1700, loss[loss=0.2113, ctc_loss=0.1042, cr_loss=0.3295, attn_decoder_loss=0.2159, over 29569.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1127, cr_loss=0.3529, attn_decoder_loss=0.2389, over 5779421.60 frames. ], batch size: 69, lr: 2.69e-03, grad_scale: 8.0 2024-09-19 19:33:27,379 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.55 vs. limit=12.0 2024-09-19 19:33:31,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=730840.0, ans=0.1 2024-09-19 19:33:46,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=730880.0, ans=0.04949747468305833 2024-09-19 19:33:52,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=730880.0, ans=0.125 2024-09-19 19:33:52,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=730880.0, ans=0.125 2024-09-19 19:34:10,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=730960.0, ans=0.125 2024-09-19 19:34:13,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=730960.0, ans=0.125 2024-09-19 19:34:26,881 INFO [train.py:1198] (1/2) Epoch 41, batch 1750, loss[loss=0.2069, ctc_loss=0.091, cr_loss=0.306, attn_decoder_loss=0.213, over 29330.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1122, cr_loss=0.3515, attn_decoder_loss=0.2382, over 5787963.25 frames. ], batch size: 67, lr: 2.69e-03, grad_scale: 8.0 2024-09-19 19:34:35,972 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.612e+01 9.117e+01 9.709e+01 1.098e+02, threshold=1.823e+02, percent-clipped=0.0 2024-09-19 19:34:37,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=731000.0, ans=0.125 2024-09-19 19:34:46,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=731040.0, ans=0.0 2024-09-19 19:34:57,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=731080.0, ans=0.0 2024-09-19 19:35:08,305 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.91 vs. 
limit=22.5 2024-09-19 19:35:15,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=731120.0, ans=0.2 2024-09-19 19:35:15,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=731120.0, ans=0.0 2024-09-19 19:35:27,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=731160.0, ans=0.1 2024-09-19 19:35:44,367 INFO [train.py:1198] (1/2) Epoch 41, batch 1800, loss[loss=0.248, ctc_loss=0.1231, cr_loss=0.3725, attn_decoder_loss=0.2536, over 29667.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1122, cr_loss=0.3519, attn_decoder_loss=0.2385, over 5790409.66 frames. ], batch size: 83, lr: 2.69e-03, grad_scale: 8.0 2024-09-19 19:35:46,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=731200.0, ans=0.0 2024-09-19 19:35:48,306 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.92 vs. limit=6.0 2024-09-19 19:36:04,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=731240.0, ans=0.125 2024-09-19 19:36:14,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=731280.0, ans=0.125 2024-09-19 19:36:19,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=731280.0, ans=0.5 2024-09-19 19:36:19,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=731280.0, ans=0.125 2024-09-19 19:36:22,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=731280.0, ans=0.125 2024-09-19 19:36:25,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=731280.0, ans=0.1 2024-09-19 19:36:58,229 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.83 vs. limit=12.0 2024-09-19 19:37:02,117 INFO [train.py:1198] (1/2) Epoch 41, batch 1850, loss[loss=0.2484, ctc_loss=0.1183, cr_loss=0.3735, attn_decoder_loss=0.2546, over 29616.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.112, cr_loss=0.3519, attn_decoder_loss=0.2385, over 5797439.27 frames. ], batch size: 86, lr: 2.69e-03, grad_scale: 8.0 2024-09-19 19:37:03,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=731400.0, ans=0.125 2024-09-19 19:37:10,994 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 8.675e+01 9.084e+01 9.615e+01 1.395e+02, threshold=1.817e+02, percent-clipped=0.0 2024-09-19 19:37:20,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=731440.0, ans=0.125 2024-09-19 19:37:27,146 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.55 vs. 
limit=15.0 2024-09-19 19:37:30,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=731480.0, ans=0.125 2024-09-19 19:37:36,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=731480.0, ans=0.0 2024-09-19 19:37:39,968 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 19:37:44,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=731480.0, ans=0.125 2024-09-19 19:37:57,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=731520.0, ans=0.0 2024-09-19 19:37:59,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=731520.0, ans=0.0 2024-09-19 19:38:17,056 INFO [train.py:1198] (1/2) Epoch 41, batch 1900, loss[loss=0.2438, ctc_loss=0.1224, cr_loss=0.3733, attn_decoder_loss=0.249, over 29691.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1124, cr_loss=0.353, attn_decoder_loss=0.2393, over 5805481.97 frames. ], batch size: 89, lr: 2.69e-03, grad_scale: 8.0 2024-09-19 19:38:19,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=731600.0, ans=0.025 2024-09-19 19:38:22,115 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 19:38:23,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=731600.0, ans=0.125 2024-09-19 19:38:26,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=731600.0, ans=0.0 2024-09-19 19:38:28,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=731600.0, ans=0.125 2024-09-19 19:39:02,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=731720.0, ans=0.0 2024-09-19 19:39:12,426 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.07 vs. limit=22.5 2024-09-19 19:39:34,504 INFO [train.py:1198] (1/2) Epoch 41, batch 1950, loss[loss=0.2316, ctc_loss=0.1106, cr_loss=0.3685, attn_decoder_loss=0.2369, over 29428.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1131, cr_loss=0.3544, attn_decoder_loss=0.2404, over 5820028.97 frames. 
], batch size: 78, lr: 2.69e-03, grad_scale: 8.0 2024-09-19 19:39:43,507 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.306e+01 8.775e+01 9.303e+01 9.846e+01 2.591e+02, threshold=1.861e+02, percent-clipped=0.0 2024-09-19 19:39:57,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=731840.0, ans=0.0 2024-09-19 19:40:06,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=731880.0, ans=0.125 2024-09-19 19:40:23,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=731920.0, ans=0.125 2024-09-19 19:40:48,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=731960.0, ans=0.2 2024-09-19 19:40:51,612 INFO [train.py:1198] (1/2) Epoch 41, batch 2000, loss[loss=0.2089, ctc_loss=0.1011, cr_loss=0.3179, attn_decoder_loss=0.2138, over 29323.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1132, cr_loss=0.3545, attn_decoder_loss=0.2405, over 5798003.48 frames. ], batch size: 67, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:41:05,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=732040.0, ans=0.125 2024-09-19 19:41:26,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=732080.0, ans=0.0 2024-09-19 19:41:37,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=732120.0, ans=0.125 2024-09-19 19:41:44,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=732120.0, ans=0.1 2024-09-19 19:42:05,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.89 vs. limit=15.0 2024-09-19 19:42:07,261 INFO [train.py:1198] (1/2) Epoch 41, batch 2050, loss[loss=0.1969, ctc_loss=0.07727, cr_loss=0.265, attn_decoder_loss=0.2043, over 29431.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1127, cr_loss=0.353, attn_decoder_loss=0.2395, over 5790586.03 frames. ], batch size: 70, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:42:07,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=732200.0, ans=0.025 2024-09-19 19:42:16,362 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.729e+01 8.645e+01 9.096e+01 9.473e+01 4.528e+02, threshold=1.819e+02, percent-clipped=2.0 2024-09-19 19:42:19,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=732200.0, ans=0.2 2024-09-19 19:42:24,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=732240.0, ans=0.025 2024-09-19 19:42:25,141 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.54 vs. 
limit=22.5 2024-09-19 19:42:30,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=732240.0, ans=0.0 2024-09-19 19:43:08,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=732360.0, ans=0.0 2024-09-19 19:43:19,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=732360.0, ans=0.2 2024-09-19 19:43:25,408 INFO [train.py:1198] (1/2) Epoch 41, batch 2100, loss[loss=0.2218, ctc_loss=0.09582, cr_loss=0.3194, attn_decoder_loss=0.2287, over 29755.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1117, cr_loss=0.3511, attn_decoder_loss=0.2387, over 5801225.59 frames. ], batch size: 81, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:43:36,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=732400.0, ans=0.125 2024-09-19 19:43:51,560 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.76 vs. limit=22.5 2024-09-19 19:43:57,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=732480.0, ans=0.1 2024-09-19 19:44:04,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=732480.0, ans=0.125 2024-09-19 19:44:25,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=732560.0, ans=0.0 2024-09-19 19:44:42,399 INFO [train.py:1198] (1/2) Epoch 41, batch 2150, loss[loss=0.2299, ctc_loss=0.1117, cr_loss=0.3513, attn_decoder_loss=0.2353, over 29435.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1109, cr_loss=0.3496, attn_decoder_loss=0.2379, over 5815522.88 frames. ], batch size: 78, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:44:43,218 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. 
limit=6.0 2024-09-19 19:44:51,575 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.415e+01 8.227e+01 8.830e+01 9.472e+01 1.149e+02, threshold=1.766e+02, percent-clipped=0.0 2024-09-19 19:44:54,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=732600.0, ans=0.125 2024-09-19 19:45:22,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=732680.0, ans=0.125 2024-09-19 19:45:24,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=732680.0, ans=0.125 2024-09-19 19:45:29,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=732720.0, ans=0.2 2024-09-19 19:45:40,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=732720.0, ans=0.125 2024-09-19 19:45:43,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=732760.0, ans=0.0 2024-09-19 19:45:46,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=732760.0, ans=0.2 2024-09-19 19:45:49,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=732760.0, ans=0.025 2024-09-19 19:45:58,479 INFO [train.py:1198] (1/2) Epoch 41, batch 2200, loss[loss=0.2451, ctc_loss=0.1125, cr_loss=0.3539, attn_decoder_loss=0.252, over 29611.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1115, cr_loss=0.3502, attn_decoder_loss=0.2382, over 5812825.60 frames. ], batch size: 86, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:46:03,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=732800.0, ans=0.1 2024-09-19 19:46:42,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=732920.0, ans=0.125 2024-09-19 19:46:51,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=732920.0, ans=0.05 2024-09-19 19:47:07,548 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.97 vs. limit=22.5 2024-09-19 19:47:16,341 INFO [train.py:1198] (1/2) Epoch 41, batch 2250, loss[loss=0.2384, ctc_loss=0.113, cr_loss=0.3553, attn_decoder_loss=0.2445, over 29720.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1116, cr_loss=0.3506, attn_decoder_loss=0.2386, over 5813475.24 frames. 
], batch size: 82, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:47:16,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=733000.0, ans=0.0 2024-09-19 19:47:25,241 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.035e+01 8.546e+01 9.093e+01 9.694e+01 2.560e+02, threshold=1.819e+02, percent-clipped=3.0 2024-09-19 19:47:27,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=733000.0, ans=0.0 2024-09-19 19:47:57,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=733080.0, ans=0.125 2024-09-19 19:48:01,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=733120.0, ans=0.125 2024-09-19 19:48:06,639 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.51 vs. limit=15.0 2024-09-19 19:48:33,631 INFO [train.py:1198] (1/2) Epoch 41, batch 2300, loss[loss=0.2063, ctc_loss=0.09303, cr_loss=0.3145, attn_decoder_loss=0.2119, over 29311.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1107, cr_loss=0.3493, attn_decoder_loss=0.2375, over 5798699.20 frames. ], batch size: 71, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:49:06,263 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.99 vs. limit=15.0 2024-09-19 19:49:45,428 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=15.0 2024-09-19 19:49:49,350 INFO [train.py:1198] (1/2) Epoch 41, batch 2350, loss[loss=0.2395, ctc_loss=0.1151, cr_loss=0.3605, attn_decoder_loss=0.2453, over 29692.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.111, cr_loss=0.3495, attn_decoder_loss=0.2376, over 5804848.96 frames. ], batch size: 83, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:49:54,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=733400.0, ans=15.0 2024-09-19 19:49:54,525 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.12 vs. 
limit=15.0 2024-09-19 19:49:58,160 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 8.660e+01 9.088e+01 9.774e+01 1.601e+02, threshold=1.818e+02, percent-clipped=0.0 2024-09-19 19:50:00,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=733400.0, ans=0.0 2024-09-19 19:50:06,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=733440.0, ans=0.1 2024-09-19 19:50:06,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=733440.0, ans=0.2 2024-09-19 19:50:09,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=733440.0, ans=0.125 2024-09-19 19:50:12,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=733440.0, ans=0.2 2024-09-19 19:50:19,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=733480.0, ans=0.1 2024-09-19 19:50:37,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=733520.0, ans=0.125 2024-09-19 19:50:52,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=733560.0, ans=0.0 2024-09-19 19:50:57,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=733560.0, ans=0.1 2024-09-19 19:51:06,699 INFO [train.py:1198] (1/2) Epoch 41, batch 2400, loss[loss=0.232, ctc_loss=0.1158, cr_loss=0.3754, attn_decoder_loss=0.2366, over 29536.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1115, cr_loss=0.3509, attn_decoder_loss=0.2381, over 5808791.13 frames. ], batch size: 76, lr: 2.69e-03, grad_scale: 32.0 2024-09-19 19:51:08,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=733600.0, ans=0.2 2024-09-19 19:51:50,040 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.82 vs. limit=10.0 2024-09-19 19:52:06,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=733760.0, ans=0.125 2024-09-19 19:52:09,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=733760.0, ans=0.0 2024-09-19 19:52:10,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=733760.0, ans=0.125 2024-09-19 19:52:24,341 INFO [train.py:1198] (1/2) Epoch 41, batch 2450, loss[loss=0.2374, ctc_loss=0.1157, cr_loss=0.3555, attn_decoder_loss=0.243, over 29705.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.112, cr_loss=0.3516, attn_decoder_loss=0.2389, over 5784739.42 frames. 
], batch size: 82, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:52:24,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=733800.0, ans=0.0 2024-09-19 19:52:34,692 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.580e+01 8.655e+01 9.209e+01 9.754e+01 2.010e+02, threshold=1.842e+02, percent-clipped=1.0 2024-09-19 19:52:54,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=733880.0, ans=0.125 2024-09-19 19:52:54,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=733880.0, ans=0.0 2024-09-19 19:53:03,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=733880.0, ans=0.125 2024-09-19 19:53:38,228 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 19:53:39,343 INFO [train.py:1198] (1/2) Epoch 41, batch 2500, loss[loss=0.242, ctc_loss=0.1128, cr_loss=0.354, attn_decoder_loss=0.2485, over 29628.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1122, cr_loss=0.3524, attn_decoder_loss=0.239, over 5794705.60 frames. ], batch size: 86, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 19:54:02,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=734040.0, ans=0.125 2024-09-19 19:54:14,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=734080.0, ans=0.2 2024-09-19 19:54:40,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=734160.0, ans=0.05 2024-09-19 19:54:43,907 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.89 vs. limit=6.0 2024-09-19 19:54:48,347 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.78 vs. limit=15.0 2024-09-19 19:54:57,249 INFO [train.py:1198] (1/2) Epoch 41, batch 2550, loss[loss=0.2091, ctc_loss=0.09644, cr_loss=0.3157, attn_decoder_loss=0.2146, over 29347.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1118, cr_loss=0.3521, attn_decoder_loss=0.2388, over 5797079.44 frames. ], batch size: 67, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 19:54:59,613 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.37 vs. limit=22.5 2024-09-19 19:55:06,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=734200.0, ans=0.125 2024-09-19 19:55:07,642 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.302e+01 8.421e+01 8.984e+01 9.489e+01 4.917e+02, threshold=1.797e+02, percent-clipped=1.0 2024-09-19 19:55:15,799 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.71 vs. 
limit=15.0 2024-09-19 19:55:16,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=734240.0, ans=0.0 2024-09-19 19:55:24,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=734240.0, ans=0.0 2024-09-19 19:55:35,566 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.94 vs. limit=10.0 2024-09-19 19:55:38,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=734280.0, ans=0.0 2024-09-19 19:55:38,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2024-09-19 19:55:47,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=734320.0, ans=0.125 2024-09-19 19:55:47,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=734320.0, ans=0.125 2024-09-19 19:55:48,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=734320.0, ans=0.125 2024-09-19 19:56:13,325 INFO [train.py:1198] (1/2) Epoch 41, batch 2600, loss[loss=0.229, ctc_loss=0.1161, cr_loss=0.3746, attn_decoder_loss=0.2332, over 29467.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.112, cr_loss=0.3528, attn_decoder_loss=0.2393, over 5794328.16 frames. ], batch size: 78, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 19:56:18,476 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.95 vs. limit=15.0 2024-09-19 19:56:31,115 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. 
limit=6.0 2024-09-19 19:56:33,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=734440.0, ans=0.2 2024-09-19 19:56:35,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=734440.0, ans=0.0 2024-09-19 19:56:35,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=734440.0, ans=0.1 2024-09-19 19:56:51,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=734480.0, ans=0.125 2024-09-19 19:56:54,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=734480.0, ans=0.1 2024-09-19 19:56:59,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=734520.0, ans=15.0 2024-09-19 19:57:02,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=734520.0, ans=0.2 2024-09-19 19:57:11,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=734520.0, ans=0.1 2024-09-19 19:57:17,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=734560.0, ans=0.2 2024-09-19 19:57:24,119 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.19 vs. limit=22.5 2024-09-19 19:57:27,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=734560.0, ans=0.1 2024-09-19 19:57:30,517 INFO [train.py:1198] (1/2) Epoch 41, batch 2650, loss[loss=0.246, ctc_loss=0.1165, cr_loss=0.3481, attn_decoder_loss=0.2527, over 29239.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1122, cr_loss=0.3529, attn_decoder_loss=0.2395, over 5800871.87 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 19:57:33,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=734600.0, ans=0.0 2024-09-19 19:57:41,308 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.192e+01 8.633e+01 9.136e+01 9.710e+01 1.315e+02, threshold=1.827e+02, percent-clipped=0.0 2024-09-19 19:57:44,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=734640.0, ans=0.09899494936611666 2024-09-19 19:58:05,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=734680.0, ans=0.125 2024-09-19 19:58:22,915 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.32 vs. limit=15.0 2024-09-19 19:58:45,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=734760.0, ans=0.0 2024-09-19 19:58:48,349 INFO [train.py:1198] (1/2) Epoch 41, batch 2700, loss[loss=0.2463, ctc_loss=0.1202, cr_loss=0.3591, attn_decoder_loss=0.2524, over 29511.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1126, cr_loss=0.354, attn_decoder_loss=0.2399, over 5796188.69 frames. 
], batch size: 87, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 19:59:06,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=734840.0, ans=0.125 2024-09-19 19:59:24,610 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 19:59:24,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=734880.0, ans=0.0 2024-09-19 19:59:26,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=734880.0, ans=0.0 2024-09-19 19:59:29,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=734880.0, ans=0.125 2024-09-19 19:59:38,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=734920.0, ans=0.0 2024-09-19 20:00:02,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=735000.0, ans=0.125 2024-09-19 20:00:03,699 INFO [train.py:1198] (1/2) Epoch 41, batch 2750, loss[loss=0.2242, ctc_loss=0.1053, cr_loss=0.3463, attn_decoder_loss=0.2297, over 29528.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.112, cr_loss=0.3528, attn_decoder_loss=0.2388, over 5796029.37 frames. ], batch size: 75, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:00:14,116 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.495e+01 8.920e+01 9.727e+01 1.790e+02, threshold=1.784e+02, percent-clipped=0.0 2024-09-19 20:00:41,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=735080.0, ans=0.125 2024-09-19 20:00:48,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=735080.0, ans=0.0 2024-09-19 20:00:48,319 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 20:01:06,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=735160.0, ans=0.125 2024-09-19 20:01:10,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=735160.0, ans=0.125 2024-09-19 20:01:21,634 INFO [train.py:1198] (1/2) Epoch 41, batch 2800, loss[loss=0.2512, ctc_loss=0.1346, cr_loss=0.3698, attn_decoder_loss=0.2559, over 20050.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1119, cr_loss=0.3525, attn_decoder_loss=0.2388, over 5776678.42 frames. 
], batch size: 209, lr: 2.68e-03, grad_scale: 32.0 2024-09-19 20:01:26,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=735200.0, ans=0.125 2024-09-19 20:01:29,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=735200.0, ans=0.2 2024-09-19 20:01:42,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=735240.0, ans=0.2 2024-09-19 20:01:42,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=735240.0, ans=0.0 2024-09-19 20:01:48,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=735240.0, ans=0.125 2024-09-19 20:01:55,410 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.31 vs. limit=10.0 2024-09-19 20:01:56,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=735280.0, ans=0.125 2024-09-19 20:01:58,526 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.70 vs. limit=22.5 2024-09-19 20:02:33,496 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.44 vs. limit=15.0 2024-09-19 20:02:38,651 INFO [train.py:1198] (1/2) Epoch 41, batch 2850, loss[loss=0.2257, ctc_loss=0.1097, cr_loss=0.3503, attn_decoder_loss=0.2308, over 29513.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1122, cr_loss=0.3527, attn_decoder_loss=0.2393, over 5762730.57 frames. ], batch size: 77, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:02:50,645 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.779e+01 8.742e+01 9.309e+01 1.007e+02 1.847e+02, threshold=1.862e+02, percent-clipped=1.0 2024-09-19 20:02:51,348 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.51 vs. limit=15.0 2024-09-19 20:02:52,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=735440.0, ans=0.125 2024-09-19 20:03:27,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=735520.0, ans=0.07 2024-09-19 20:03:42,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=735560.0, ans=0.2 2024-09-19 20:03:49,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=735560.0, ans=0.125 2024-09-19 20:03:54,113 INFO [train.py:1198] (1/2) Epoch 41, batch 2900, loss[loss=0.2313, ctc_loss=0.1121, cr_loss=0.3627, attn_decoder_loss=0.2365, over 29418.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.113, cr_loss=0.3549, attn_decoder_loss=0.2405, over 5788047.65 frames. ], batch size: 79, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:04:04,012 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.22 vs. 
limit=15.0 2024-09-19 20:04:26,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=735680.0, ans=0.0 2024-09-19 20:04:39,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=735680.0, ans=0.125 2024-09-19 20:04:46,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=735720.0, ans=0.125 2024-09-19 20:04:49,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=735720.0, ans=0.125 2024-09-19 20:05:12,045 INFO [train.py:1198] (1/2) Epoch 41, batch 2950, loss[loss=0.2269, ctc_loss=0.1073, cr_loss=0.3505, attn_decoder_loss=0.2325, over 29505.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1118, cr_loss=0.352, attn_decoder_loss=0.2391, over 5781544.82 frames. ], batch size: 75, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:05:21,630 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 20:05:24,173 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.387e+01 8.869e+01 9.638e+01 2.369e+02, threshold=1.774e+02, percent-clipped=2.0 2024-09-19 20:05:37,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=735840.0, ans=0.125 2024-09-19 20:05:51,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=735880.0, ans=0.125 2024-09-19 20:06:00,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=735920.0, ans=0.1 2024-09-19 20:06:37,437 INFO [train.py:1198] (1/2) Epoch 41, batch 3000, loss[loss=0.238, ctc_loss=0.1172, cr_loss=0.3568, attn_decoder_loss=0.2435, over 29763.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.112, cr_loss=0.3523, attn_decoder_loss=0.239, over 5783179.33 frames. ], batch size: 81, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:06:37,438 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 20:06:55,723 INFO [train.py:1230] (1/2) Epoch 41, validation: loss=0.2123, ctc_loss=0.03697, cr_loss=6.466e-15, attn_decoder_loss=0.2318, over 944034.00 frames. 2024-09-19 20:06:55,723 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-19 20:07:09,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=736040.0, ans=0.125 2024-09-19 20:07:13,430 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. 
limit=15.0 2024-09-19 20:07:14,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=736040.0, ans=0.07 2024-09-19 20:07:24,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=736080.0, ans=0.2 2024-09-19 20:07:43,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=736120.0, ans=0.0 2024-09-19 20:07:43,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=736120.0, ans=0.125 2024-09-19 20:08:13,489 INFO [train.py:1198] (1/2) Epoch 41, batch 3050, loss[loss=0.2273, ctc_loss=0.1122, cr_loss=0.3343, attn_decoder_loss=0.2326, over 29522.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1128, cr_loss=0.3537, attn_decoder_loss=0.2396, over 5776720.75 frames. ], batch size: 76, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:08:25,662 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 8.668e+01 9.193e+01 9.788e+01 2.004e+02, threshold=1.839e+02, percent-clipped=1.0 2024-09-19 20:08:33,531 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 20:08:53,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=736280.0, ans=0.0 2024-09-19 20:09:28,914 INFO [train.py:1198] (1/2) Epoch 41, batch 3100, loss[loss=0.2392, ctc_loss=0.1062, cr_loss=0.341, attn_decoder_loss=0.2465, over 29191.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1125, cr_loss=0.3527, attn_decoder_loss=0.239, over 5776383.87 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:09:33,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=736400.0, ans=0.125 2024-09-19 20:09:41,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=736400.0, ans=0.125 2024-09-19 20:09:57,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=736480.0, ans=0.125 2024-09-19 20:09:59,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.min_positive, batch_count=736480.0, ans=0.05 2024-09-19 20:10:03,308 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.11 vs. limit=15.0 2024-09-19 20:10:10,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=736480.0, ans=0.025 2024-09-19 20:10:13,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=736480.0, ans=0.0 2024-09-19 20:10:20,232 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.96 vs. limit=15.0 2024-09-19 20:10:20,244 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.10 vs. 
limit=15.0 2024-09-19 20:10:21,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2024-09-19 20:10:46,491 INFO [train.py:1198] (1/2) Epoch 41, batch 3150, loss[loss=0.232, ctc_loss=0.1046, cr_loss=0.3236, attn_decoder_loss=0.239, over 28904.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1123, cr_loss=0.3519, attn_decoder_loss=0.2389, over 5782329.63 frames. ], batch size: 104, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:10:46,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=736600.0, ans=0.125 2024-09-19 20:10:48,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=736600.0, ans=0.2 2024-09-19 20:10:54,355 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 20:10:58,490 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.441e+01 8.553e+01 9.133e+01 9.719e+01 1.833e+02, threshold=1.827e+02, percent-clipped=0.0 2024-09-19 20:11:13,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=736640.0, ans=0.0 2024-09-19 20:11:51,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=736760.0, ans=0.2 2024-09-19 20:12:04,119 INFO [train.py:1198] (1/2) Epoch 41, batch 3200, loss[loss=0.2245, ctc_loss=0.1025, cr_loss=0.3304, attn_decoder_loss=0.2307, over 29404.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1117, cr_loss=0.3509, attn_decoder_loss=0.2384, over 5793201.48 frames. ], batch size: 79, lr: 2.68e-03, grad_scale: 32.0 2024-09-19 20:12:11,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=736800.0, ans=0.125 2024-09-19 20:13:11,781 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.30 vs. limit=6.0 2024-09-19 20:13:19,473 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.64 vs. limit=10.0 2024-09-19 20:13:20,190 INFO [train.py:1198] (1/2) Epoch 41, batch 3250, loss[loss=0.2411, ctc_loss=0.1162, cr_loss=0.3628, attn_decoder_loss=0.2469, over 29711.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1117, cr_loss=0.3512, attn_decoder_loss=0.2387, over 5800306.17 frames. ], batch size: 84, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:13:23,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=737000.0, ans=0.025 2024-09-19 20:13:33,816 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.531e+01 9.147e+01 9.717e+01 1.259e+02, threshold=1.829e+02, percent-clipped=0.0 2024-09-19 20:13:40,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=737040.0, ans=0.07 2024-09-19 20:13:46,423 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.90 vs. 
limit=12.0 2024-09-19 20:13:51,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=737080.0, ans=0.125 2024-09-19 20:14:10,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=737120.0, ans=0.0 2024-09-19 20:14:13,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=737120.0, ans=0.125 2024-09-19 20:14:34,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=737160.0, ans=0.0 2024-09-19 20:14:37,460 INFO [train.py:1198] (1/2) Epoch 41, batch 3300, loss[loss=0.2361, ctc_loss=0.1063, cr_loss=0.327, attn_decoder_loss=0.2433, over 28744.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1106, cr_loss=0.3488, attn_decoder_loss=0.2373, over 5798729.28 frames. ], batch size: 112, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:14:52,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=737240.0, ans=0.07 2024-09-19 20:14:54,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=737240.0, ans=0.05 2024-09-19 20:15:25,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=737320.0, ans=10.0 2024-09-19 20:15:30,245 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 20:15:34,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=737320.0, ans=0.0 2024-09-19 20:15:40,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=737360.0, ans=0.0 2024-09-19 20:15:54,701 INFO [train.py:1198] (1/2) Epoch 41, batch 3350, loss[loss=0.2508, ctc_loss=0.1228, cr_loss=0.3723, attn_decoder_loss=0.2567, over 28777.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1114, cr_loss=0.3504, attn_decoder_loss=0.2382, over 5775164.33 frames. ], batch size: 104, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:16:08,343 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.563e+01 8.656e+01 9.093e+01 9.789e+01 1.911e+02, threshold=1.819e+02, percent-clipped=2.0 2024-09-19 20:16:11,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=737440.0, ans=0.125 2024-09-19 20:16:17,152 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.33 vs. limit=15.0 2024-09-19 20:16:22,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=737440.0, ans=0.1 2024-09-19 20:16:23,141 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.48 vs. 
limit=12.0 2024-09-19 20:17:02,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=737560.0, ans=0.0 2024-09-19 20:17:04,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=737560.0, ans=0.0 2024-09-19 20:17:10,491 INFO [train.py:1198] (1/2) Epoch 41, batch 3400, loss[loss=0.2078, ctc_loss=0.09444, cr_loss=0.3091, attn_decoder_loss=0.2135, over 29327.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1119, cr_loss=0.3511, attn_decoder_loss=0.2384, over 5767799.50 frames. ], batch size: 67, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:17:59,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=737720.0, ans=0.2 2024-09-19 20:17:59,980 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 20:18:04,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=737720.0, ans=0.1 2024-09-19 20:18:10,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=737720.0, ans=0.0 2024-09-19 20:18:28,095 INFO [train.py:1198] (1/2) Epoch 41, batch 3450, loss[loss=0.2475, ctc_loss=0.1145, cr_loss=0.3657, attn_decoder_loss=0.2541, over 28277.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1122, cr_loss=0.3522, attn_decoder_loss=0.2388, over 5776023.30 frames. ], batch size: 111, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:18:33,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=737800.0, ans=0.125 2024-09-19 20:18:41,844 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.606e+01 8.497e+01 9.130e+01 9.574e+01 2.613e+02, threshold=1.826e+02, percent-clipped=1.0 2024-09-19 20:19:03,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=737880.0, ans=0.0 2024-09-19 20:19:07,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=737880.0, ans=0.5 2024-09-19 20:19:31,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=737960.0, ans=0.2 2024-09-19 20:19:42,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=738000.0, ans=0.0 2024-09-19 20:19:43,462 INFO [train.py:1198] (1/2) Epoch 41, batch 3500, loss[loss=0.2119, ctc_loss=0.09186, cr_loss=0.2994, attn_decoder_loss=0.2186, over 29320.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1117, cr_loss=0.3511, attn_decoder_loss=0.2383, over 5776607.81 frames. ], batch size: 71, lr: 2.68e-03, grad_scale: 8.0 2024-09-19 20:20:03,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=738040.0, ans=0.1 2024-09-19 20:20:09,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=738040.0, ans=0.0 2024-09-19 20:20:17,622 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.19 vs. 
limit=15.0 2024-09-19 20:20:59,910 INFO [train.py:1198] (1/2) Epoch 41, batch 3550, loss[loss=0.2392, ctc_loss=0.1088, cr_loss=0.348, attn_decoder_loss=0.2459, over 29712.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1117, cr_loss=0.3514, attn_decoder_loss=0.2383, over 5783319.43 frames. ], batch size: 89, lr: 2.68e-03, grad_scale: 8.0 2024-09-19 20:21:13,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=738240.0, ans=0.05 2024-09-19 20:21:14,689 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.447e+01 8.523e+01 8.996e+01 9.507e+01 2.339e+02, threshold=1.799e+02, percent-clipped=2.0 2024-09-19 20:21:24,105 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=15.0 2024-09-19 20:21:32,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=738280.0, ans=0.0 2024-09-19 20:21:48,126 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.28 vs. limit=15.0 2024-09-19 20:22:14,209 INFO [train.py:1198] (1/2) Epoch 41, batch 3600, loss[loss=0.2142, ctc_loss=0.09365, cr_loss=0.3166, attn_decoder_loss=0.2206, over 29491.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1116, cr_loss=0.3512, attn_decoder_loss=0.2385, over 5791759.11 frames. ], batch size: 77, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:22:17,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=738400.0, ans=0.0 2024-09-19 20:22:34,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=738440.0, ans=0.1 2024-09-19 20:22:38,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=738440.0, ans=0.0 2024-09-19 20:22:39,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=738440.0, ans=0.0 2024-09-19 20:22:41,227 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 20:22:52,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=738480.0, ans=15.0 2024-09-19 20:23:02,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=738520.0, ans=0.1 2024-09-19 20:23:30,388 INFO [train.py:1198] (1/2) Epoch 41, batch 3650, loss[loss=0.2478, ctc_loss=0.1279, cr_loss=0.3886, attn_decoder_loss=0.2525, over 29487.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1112, cr_loss=0.3499, attn_decoder_loss=0.238, over 5794556.33 frames. 
], batch size: 90, lr: 2.68e-03, grad_scale: 8.0 2024-09-19 20:23:38,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=738600.0, ans=0.0 2024-09-19 20:23:46,677 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.200e+01 8.451e+01 9.065e+01 9.454e+01 1.125e+02, threshold=1.813e+02, percent-clipped=0.0 2024-09-19 20:23:47,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=738640.0, ans=15.0 2024-09-19 20:23:54,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=738640.0, ans=0.0 2024-09-19 20:24:02,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=738680.0, ans=0.125 2024-09-19 20:24:09,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=738680.0, ans=0.125 2024-09-19 20:24:16,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=738720.0, ans=0.025 2024-09-19 20:24:28,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=738760.0, ans=0.1 2024-09-19 20:24:36,502 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.46 vs. limit=12.0 2024-09-19 20:24:44,921 INFO [train.py:1198] (1/2) Epoch 41, batch 3700, loss[loss=0.2412, ctc_loss=0.1205, cr_loss=0.38, attn_decoder_loss=0.2461, over 29715.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1115, cr_loss=0.3508, attn_decoder_loss=0.2383, over 5804600.84 frames. ], batch size: 84, lr: 2.68e-03, grad_scale: 8.0 2024-09-19 20:25:05,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=738840.0, ans=0.125 2024-09-19 20:25:41,696 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=12.0 2024-09-19 20:25:47,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=738960.0, ans=0.2 2024-09-19 20:25:54,885 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 20:25:58,931 INFO [train.py:1198] (1/2) Epoch 41, batch 3750, loss[loss=0.2155, ctc_loss=0.1015, cr_loss=0.3315, attn_decoder_loss=0.2208, over 29321.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1117, cr_loss=0.3516, attn_decoder_loss=0.2384, over 5808426.46 frames. 
], batch size: 67, lr: 2.68e-03, grad_scale: 8.0 2024-09-19 20:26:17,098 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.549e+01 9.026e+01 9.637e+01 1.696e+02, threshold=1.805e+02, percent-clipped=0.0 2024-09-19 20:26:21,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=739040.0, ans=0.125 2024-09-19 20:26:24,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=739040.0, ans=0.125 2024-09-19 20:26:31,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=739080.0, ans=0.125 2024-09-19 20:26:37,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=739080.0, ans=0.125 2024-09-19 20:27:02,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=739160.0, ans=0.0 2024-09-19 20:27:14,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.99 vs. limit=10.0 2024-09-19 20:27:15,513 INFO [train.py:1198] (1/2) Epoch 41, batch 3800, loss[loss=0.2445, ctc_loss=0.1096, cr_loss=0.3461, attn_decoder_loss=0.2518, over 29641.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1112, cr_loss=0.3498, attn_decoder_loss=0.2379, over 5799075.17 frames. ], batch size: 86, lr: 2.68e-03, grad_scale: 8.0 2024-09-19 20:27:15,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=739200.0, ans=0.125 2024-09-19 20:27:17,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=739200.0, ans=0.025 2024-09-19 20:27:57,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=739280.0, ans=0.125 2024-09-19 20:28:14,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=739360.0, ans=0.0 2024-09-19 20:28:23,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=739360.0, ans=0.125 2024-09-19 20:28:30,234 INFO [train.py:1198] (1/2) Epoch 41, batch 3850, loss[loss=0.2388, ctc_loss=0.1086, cr_loss=0.3471, attn_decoder_loss=0.2456, over 29244.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1113, cr_loss=0.3504, attn_decoder_loss=0.2379, over 5812858.98 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 8.0 2024-09-19 20:28:33,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=739400.0, ans=0.125 2024-09-19 20:28:47,071 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.67 vs. 
limit=15.0 2024-09-19 20:28:47,850 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 8.446e+01 9.109e+01 9.536e+01 1.999e+02, threshold=1.822e+02, percent-clipped=1.0 2024-09-19 20:28:54,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=739440.0, ans=0.125 2024-09-19 20:29:09,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=739480.0, ans=0.125 2024-09-19 20:29:17,285 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.92 vs. limit=15.0 2024-09-19 20:29:27,756 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.35 vs. limit=15.0 2024-09-19 20:29:37,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=739560.0, ans=0.1 2024-09-19 20:29:40,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=739560.0, ans=0.125 2024-09-19 20:29:46,230 INFO [train.py:1198] (1/2) Epoch 41, batch 3900, loss[loss=0.2326, ctc_loss=0.1078, cr_loss=0.3465, attn_decoder_loss=0.2388, over 29635.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1118, cr_loss=0.3516, attn_decoder_loss=0.2385, over 5817224.06 frames. ], batch size: 86, lr: 2.67e-03, grad_scale: 8.0 2024-09-19 20:29:55,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=739600.0, ans=0.125 2024-09-19 20:30:17,790 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=4.88 vs. limit=12.0 2024-09-19 20:30:43,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=739760.0, ans=0.2 2024-09-19 20:30:59,903 INFO [train.py:1198] (1/2) Epoch 41, batch 3950, loss[loss=0.2468, ctc_loss=0.1278, cr_loss=0.3811, attn_decoder_loss=0.2515, over 29500.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1116, cr_loss=0.3513, attn_decoder_loss=0.2384, over 5836341.28 frames. ], batch size: 97, lr: 2.67e-03, grad_scale: 8.0 2024-09-19 20:31:02,472 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.28 vs. 
limit=15.0 2024-09-19 20:31:09,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=739800.0, ans=0.1 2024-09-19 20:31:16,082 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.077e+01 8.615e+01 9.061e+01 9.543e+01 2.103e+02, threshold=1.812e+02, percent-clipped=1.0 2024-09-19 20:31:42,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=739880.0, ans=0.125 2024-09-19 20:31:44,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=739920.0, ans=0.2 2024-09-19 20:31:54,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=739920.0, ans=0.2 2024-09-19 20:32:07,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=739960.0, ans=0.1 2024-09-19 20:32:08,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=739960.0, ans=0.125 2024-09-19 20:32:15,352 INFO [train.py:1198] (1/2) Epoch 41, batch 4000, loss[loss=0.2079, ctc_loss=0.08718, cr_loss=0.299, attn_decoder_loss=0.2146, over 29514.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1114, cr_loss=0.351, attn_decoder_loss=0.2383, over 5813751.78 frames. ], batch size: 74, lr: 2.67e-03, grad_scale: 16.0 2024-09-19 20:32:17,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=740000.0, ans=0.0 2024-09-19 20:32:24,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=740000.0, ans=0.125 2024-09-19 20:32:36,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=740040.0, ans=0.0 2024-09-19 20:32:40,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=740040.0, ans=0.1 2024-09-19 20:32:40,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=740040.0, ans=0.2 2024-09-19 20:32:48,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=740080.0, ans=0.05 2024-09-19 20:33:04,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=740120.0, ans=0.0 2024-09-19 20:33:07,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=740120.0, ans=0.125 2024-09-19 20:33:09,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=740120.0, ans=0.0 2024-09-19 20:33:30,565 INFO [train.py:1198] (1/2) Epoch 41, batch 4050, loss[loss=0.243, ctc_loss=0.1266, cr_loss=0.3548, attn_decoder_loss=0.2481, over 20199.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1112, cr_loss=0.3505, attn_decoder_loss=0.238, over 5797561.11 frames. 
], batch size: 209, lr: 2.67e-03, grad_scale: 16.0 2024-09-19 20:33:30,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=740200.0, ans=0.05 2024-09-19 20:33:34,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.33 vs. limit=6.0 2024-09-19 20:33:39,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=740200.0, ans=0.125 2024-09-19 20:33:48,069 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.566e+01 9.117e+01 9.789e+01 2.862e+02, threshold=1.823e+02, percent-clipped=4.0 2024-09-19 20:33:51,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=740240.0, ans=0.1 2024-09-19 20:33:54,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=740240.0, ans=0.2 2024-09-19 20:33:54,474 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.12 vs. limit=22.5 2024-09-19 20:33:57,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=740240.0, ans=0.125 2024-09-19 20:34:05,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=740280.0, ans=0.125 2024-09-19 20:34:15,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=740320.0, ans=0.2 2024-09-19 20:34:17,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=740320.0, ans=0.125 2024-09-19 20:34:32,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=740360.0, ans=0.125 2024-09-19 20:34:43,681 INFO [train.py:1198] (1/2) Epoch 41, batch 4100, loss[loss=0.247, ctc_loss=0.1218, cr_loss=0.3772, attn_decoder_loss=0.2525, over 29484.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1116, cr_loss=0.3513, attn_decoder_loss=0.2382, over 5793147.05 frames. 
], batch size: 90, lr: 2.67e-03, grad_scale: 8.0 2024-09-19 20:35:03,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=740440.0, ans=0.2 2024-09-19 20:35:08,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=740440.0, ans=0.1 2024-09-19 20:35:10,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=740440.0, ans=0.125 2024-09-19 20:35:13,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=740480.0, ans=0.0 2024-09-19 20:35:20,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=740480.0, ans=0.125 2024-09-19 20:35:32,669 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 20:35:37,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=740520.0, ans=0.025 2024-09-19 20:35:41,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=740560.0, ans=0.05 2024-09-19 20:35:48,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=740560.0, ans=0.1 2024-09-19 20:35:57,570 INFO [train.py:1198] (1/2) Epoch 41, batch 4150, loss[loss=0.231, ctc_loss=0.1144, cr_loss=0.369, attn_decoder_loss=0.2357, over 29519.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1115, cr_loss=0.3512, attn_decoder_loss=0.2381, over 5798935.43 frames. ], batch size: 77, lr: 2.67e-03, grad_scale: 8.0 2024-09-19 20:36:02,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=740600.0, ans=0.04949747468305833 2024-09-19 20:36:07,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=740600.0, ans=0.0 2024-09-19 20:36:16,236 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.325e+01 8.604e+01 9.031e+01 9.625e+01 1.845e+02, threshold=1.806e+02, percent-clipped=1.0 2024-09-19 20:36:16,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=740640.0, ans=0.1 2024-09-19 20:36:17,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=740640.0, ans=0.2 2024-09-19 20:36:37,519 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.59 vs. limit=12.0 2024-09-19 20:36:50,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=740720.0, ans=0.125 2024-09-19 20:37:12,305 INFO [train.py:1198] (1/2) Epoch 41, batch 4200, loss[loss=0.2481, ctc_loss=0.1284, cr_loss=0.387, attn_decoder_loss=0.2528, over 29512.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1116, cr_loss=0.3513, attn_decoder_loss=0.2383, over 5800904.55 frames. 
], batch size: 90, lr: 2.67e-03, grad_scale: 8.0 2024-09-19 20:37:12,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=740800.0, ans=0.125 2024-09-19 20:37:15,977 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.65 vs. limit=10.0 2024-09-19 20:37:18,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=740800.0, ans=0.125 2024-09-19 20:37:23,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=740800.0, ans=0.0 2024-09-19 20:37:46,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=740880.0, ans=0.07 2024-09-19 20:37:54,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=740880.0, ans=0.1 2024-09-19 20:38:00,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=740920.0, ans=0.1 2024-09-19 20:38:22,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=740960.0, ans=0.1 2024-09-19 20:38:26,687 INFO [train.py:1198] (1/2) Epoch 41, batch 4250, loss[loss=0.2195, ctc_loss=0.09964, cr_loss=0.3345, attn_decoder_loss=0.2254, over 29521.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1114, cr_loss=0.3511, attn_decoder_loss=0.2387, over 5806773.52 frames. ], batch size: 74, lr: 2.67e-03, grad_scale: 8.0 2024-09-19 20:38:36,341 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.67 vs. limit=15.0 2024-09-19 20:38:42,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=741040.0, ans=0.2 2024-09-19 20:38:44,064 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 8.665e+01 9.196e+01 9.683e+01 5.015e+02, threshold=1.839e+02, percent-clipped=1.0 2024-09-19 20:38:46,475 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.57 vs. limit=15.0 2024-09-19 20:38:59,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=741080.0, ans=0.125 2024-09-19 20:39:40,136 INFO [train.py:1198] (1/2) Epoch 41, batch 4300, loss[loss=0.2391, ctc_loss=0.113, cr_loss=0.3482, attn_decoder_loss=0.2453, over 29560.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1114, cr_loss=0.3509, attn_decoder_loss=0.2389, over 5796639.75 frames. 
], batch size: 87, lr: 2.67e-03, grad_scale: 8.0 2024-09-19 20:40:00,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=741240.0, ans=0.2 2024-09-19 20:40:00,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=741240.0, ans=0.0 2024-09-19 20:40:06,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=741240.0, ans=0.025 2024-09-19 20:40:08,984 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.95 vs. limit=15.0 2024-09-19 20:40:40,299 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 20:40:51,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=741360.0, ans=0.125 2024-09-19 20:40:55,593 INFO [train.py:1198] (1/2) Epoch 41, batch 4350, loss[loss=0.238, ctc_loss=0.1196, cr_loss=0.3773, attn_decoder_loss=0.2428, over 29478.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.114, cr_loss=0.3558, attn_decoder_loss=0.2419, over 5799069.64 frames. ], batch size: 97, lr: 2.67e-03, grad_scale: 8.0 2024-09-19 20:40:56,183 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.11 vs. limit=22.5 2024-09-19 20:41:13,113 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.774e+01 8.893e+01 9.255e+01 9.747e+01 1.701e+02, threshold=1.851e+02, percent-clipped=0.0 2024-09-19 20:41:24,431 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.06 vs. limit=8.0 2024-09-19 20:41:37,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=741520.0, ans=0.025 2024-09-19 20:41:37,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=741520.0, ans=0.0 2024-09-19 20:41:38,315 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.31 vs. limit=10.0 2024-09-19 20:42:01,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=741560.0, ans=0.125 2024-09-19 20:42:01,647 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2024-09-19 20:42:02,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=741560.0, ans=0.125 2024-09-19 20:42:03,250 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.38 vs. limit=15.0 2024-09-19 20:42:08,700 INFO [train.py:1198] (1/2) Epoch 41, batch 4400, loss[loss=0.2409, ctc_loss=0.1157, cr_loss=0.3485, attn_decoder_loss=0.247, over 27114.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.115, cr_loss=0.3581, attn_decoder_loss=0.2438, over 5767679.77 frames. 
], batch size: 124, lr: 2.67e-03, grad_scale: 16.0 2024-09-19 20:42:13,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=741600.0, ans=0.125 2024-09-19 20:42:17,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=741600.0, ans=0.1 2024-09-19 20:42:26,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=741640.0, ans=0.1 2024-09-19 20:42:35,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=741640.0, ans=0.09899494936611666 2024-09-19 20:42:40,773 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.90 vs. limit=10.0 2024-09-19 20:42:56,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=741720.0, ans=0.125 2024-09-19 20:43:01,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=741720.0, ans=0.0 2024-09-19 20:43:03,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=741720.0, ans=0.025 2024-09-19 20:43:05,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=741720.0, ans=0.0 2024-09-19 20:43:23,296 INFO [train.py:1198] (1/2) Epoch 41, batch 4450, loss[loss=0.2529, ctc_loss=0.1365, cr_loss=0.3755, attn_decoder_loss=0.2575, over 20941.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1188, cr_loss=0.3636, attn_decoder_loss=0.246, over 5577406.70 frames. ], batch size: 209, lr: 2.67e-03, grad_scale: 16.0 2024-09-19 20:43:25,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=741800.0, ans=0.125 2024-09-19 20:43:34,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=741800.0, ans=0.125 2024-09-19 20:43:41,190 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.090e+01 9.304e+01 9.971e+01 1.121e+02 2.265e+02, threshold=1.994e+02, percent-clipped=2.0 2024-09-19 20:43:53,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=741880.0, ans=0.0 2024-09-19 20:44:20,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=741920.0, ans=0.0 2024-09-19 20:44:38,332 INFO [train.py:1198] (1/2) Epoch 41, batch 4500, loss[loss=0.2469, ctc_loss=0.1274, cr_loss=0.3532, attn_decoder_loss=0.2523, over 20509.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1218, cr_loss=0.3664, attn_decoder_loss=0.2479, over 5237844.78 frames. 
], batch size: 209, lr: 2.67e-03, grad_scale: 8.0 2024-09-19 20:44:57,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=742040.0, ans=0.125 2024-09-19 20:45:02,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=742040.0, ans=0.125 2024-09-19 20:45:08,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=742080.0, ans=0.2 2024-09-19 20:46:06,176 INFO [train.py:1198] (1/2) Epoch 42, batch 0, loss[loss=0.2088, ctc_loss=0.08769, cr_loss=0.2949, attn_decoder_loss=0.2157, over 29626.00 frames. ], tot_loss[loss=0.2088, ctc_loss=0.08769, cr_loss=0.2949, attn_decoder_loss=0.2157, over 29626.00 frames. ], batch size: 73, lr: 2.64e-03, grad_scale: 16.0 2024-09-19 20:46:06,177 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 20:46:24,582 INFO [train.py:1230] (1/2) Epoch 42, validation: loss=0.2127, ctc_loss=0.03579, cr_loss=6.428e-15, attn_decoder_loss=0.2324, over 944034.00 frames. 2024-09-19 20:46:24,582 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-19 20:46:35,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=742100.0, ans=0.125 2024-09-19 20:46:36,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=742100.0, ans=0.0 2024-09-19 20:46:41,745 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.90 vs. limit=15.0 2024-09-19 20:46:44,488 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 20:46:48,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=742140.0, ans=0.2 2024-09-19 20:47:14,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=742220.0, ans=0.0 2024-09-19 20:47:21,852 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.777e+01 9.381e+01 1.084e+02 1.178e+02 1.554e+02, threshold=2.167e+02, percent-clipped=0.0 2024-09-19 20:47:42,178 INFO [train.py:1198] (1/2) Epoch 42, batch 50, loss[loss=0.2068, ctc_loss=0.09229, cr_loss=0.3075, attn_decoder_loss=0.2126, over 29437.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1129, cr_loss=0.3539, attn_decoder_loss=0.2386, over 1268336.09 frames. ], batch size: 70, lr: 2.64e-03, grad_scale: 16.0 2024-09-19 20:47:44,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=742300.0, ans=0.1 2024-09-19 20:48:26,029 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.58 vs. limit=10.0 2024-09-19 20:48:59,796 INFO [train.py:1198] (1/2) Epoch 42, batch 100, loss[loss=0.212, ctc_loss=0.09169, cr_loss=0.3002, attn_decoder_loss=0.2187, over 29517.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1149, cr_loss=0.3588, attn_decoder_loss=0.241, over 2253027.65 frames. 
], batch size: 76, lr: 2.64e-03, grad_scale: 16.0 2024-09-19 20:49:00,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=742500.0, ans=0.125 2024-09-19 20:49:05,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=742500.0, ans=0.125 2024-09-19 20:49:49,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=742620.0, ans=0.125 2024-09-19 20:49:56,419 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.813e+01 8.687e+01 8.987e+01 9.639e+01 1.254e+02, threshold=1.797e+02, percent-clipped=0.0 2024-09-19 20:50:05,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=742660.0, ans=0.0 2024-09-19 20:50:14,291 INFO [train.py:1198] (1/2) Epoch 42, batch 150, loss[loss=0.2106, ctc_loss=0.09843, cr_loss=0.3283, attn_decoder_loss=0.2158, over 29422.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1129, cr_loss=0.3546, attn_decoder_loss=0.2387, over 3047416.37 frames. ], batch size: 70, lr: 2.64e-03, grad_scale: 16.0 2024-09-19 20:50:18,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=742700.0, ans=0.025 2024-09-19 20:50:21,337 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.12 vs. limit=15.0 2024-09-19 20:50:34,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=742740.0, ans=0.1 2024-09-19 20:50:43,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=742780.0, ans=0.1 2024-09-19 20:50:55,018 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 20:51:00,315 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.35 vs. limit=22.5 2024-09-19 20:51:01,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=742820.0, ans=0.125 2024-09-19 20:51:05,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=742820.0, ans=0.125 2024-09-19 20:51:07,389 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.66 vs. limit=10.0 2024-09-19 20:51:17,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=742860.0, ans=0.0 2024-09-19 20:51:31,400 INFO [train.py:1198] (1/2) Epoch 42, batch 200, loss[loss=0.2363, ctc_loss=0.1174, cr_loss=0.3579, attn_decoder_loss=0.2415, over 27418.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1129, cr_loss=0.3549, attn_decoder_loss=0.2383, over 3659427.52 frames. ], batch size: 125, lr: 2.64e-03, grad_scale: 16.0 2024-09-19 20:51:56,130 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.37 vs. 
limit=6.0 2024-09-19 20:52:20,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=743020.0, ans=0.1 2024-09-19 20:52:27,400 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.26 vs. limit=15.0 2024-09-19 20:52:31,004 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.655e+01 8.542e+01 9.078e+01 9.443e+01 1.255e+02, threshold=1.816e+02, percent-clipped=0.0 2024-09-19 20:52:47,310 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. limit=6.0 2024-09-19 20:52:49,279 INFO [train.py:1198] (1/2) Epoch 42, batch 250, loss[loss=0.249, ctc_loss=0.1191, cr_loss=0.3578, attn_decoder_loss=0.2555, over 29340.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1122, cr_loss=0.3523, attn_decoder_loss=0.2381, over 4141075.45 frames. ], batch size: 100, lr: 2.64e-03, grad_scale: 16.0 2024-09-19 20:53:07,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=743140.0, ans=0.0 2024-09-19 20:53:07,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=743140.0, ans=10.0 2024-09-19 20:53:21,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=743180.0, ans=0.1 2024-09-19 20:53:29,869 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.41 vs. limit=6.0 2024-09-19 20:53:39,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=743220.0, ans=0.125 2024-09-19 20:53:45,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=743220.0, ans=0.125 2024-09-19 20:54:02,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=743260.0, ans=0.1 2024-09-19 20:54:02,741 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=4.94 vs. limit=12.0 2024-09-19 20:54:04,815 INFO [train.py:1198] (1/2) Epoch 42, batch 300, loss[loss=0.2539, ctc_loss=0.1263, cr_loss=0.391, attn_decoder_loss=0.2594, over 29514.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1117, cr_loss=0.3512, attn_decoder_loss=0.238, over 4510559.09 frames. ], batch size: 92, lr: 2.64e-03, grad_scale: 16.0 2024-09-19 20:54:05,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=743300.0, ans=0.0 2024-09-19 20:54:13,474 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.27 vs. limit=6.0 2024-09-19 20:54:21,355 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.84 vs. limit=10.0 2024-09-19 20:54:25,429 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=5.31 vs. 
2024-09-19 20:54:26,519 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 20:54:35,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=743380.0, ans=0.125
2024-09-19 20:54:36,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=743380.0, ans=0.0
2024-09-19 20:54:39,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=743380.0, ans=0.125
2024-09-19 20:54:49,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=743420.0, ans=15.0
2024-09-19 20:54:58,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=743420.0, ans=0.95
2024-09-19 20:55:03,821 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.584e+01 8.625e+01 9.047e+01 9.646e+01 1.583e+02, threshold=1.809e+02, percent-clipped=0.0
2024-09-19 20:55:08,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=743460.0, ans=0.2
2024-09-19 20:55:10,670 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.52 vs. limit=15.0
2024-09-19 20:55:22,744 INFO [train.py:1198] (1/2) Epoch 42, batch 350, loss[loss=0.2177, ctc_loss=0.102, cr_loss=0.3325, attn_decoder_loss=0.2232, over 29319.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1121, cr_loss=0.3525, attn_decoder_loss=0.2386, over 4795341.91 frames. ], batch size: 71, lr: 2.64e-03, grad_scale: 8.0
2024-09-19 20:55:45,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=743540.0, ans=0.125
2024-09-19 20:56:20,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=743620.0, ans=0.2
2024-09-19 20:56:24,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=743660.0, ans=0.125
2024-09-19 20:56:25,571 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 20:56:40,161 INFO [train.py:1198] (1/2) Epoch 42, batch 400, loss[loss=0.2368, ctc_loss=0.1161, cr_loss=0.3633, attn_decoder_loss=0.2422, over 29722.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1117, cr_loss=0.3523, attn_decoder_loss=0.2384, over 5025086.51 frames. ], batch size: 82, lr: 2.63e-03, grad_scale: 16.0
2024-09-19 20:56:43,956 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.23 vs. limit=15.0
2024-09-19 20:57:03,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=743740.0, ans=0.5
2024-09-19 20:57:03,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=743740.0, ans=0.125
2024-09-19 20:57:29,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=743820.0, ans=0.1
2024-09-19 20:57:39,491 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.837e+01 8.484e+01 8.956e+01 9.498e+01 1.659e+02, threshold=1.791e+02, percent-clipped=0.0
2024-09-19 20:57:56,181 INFO [train.py:1198] (1/2) Epoch 42, batch 450, loss[loss=0.244, ctc_loss=0.1131, cr_loss=0.3522, attn_decoder_loss=0.2507, over 29691.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1118, cr_loss=0.3522, attn_decoder_loss=0.2386, over 5189019.73 frames. ], batch size: 83, lr: 2.63e-03, grad_scale: 16.0
2024-09-19 20:57:57,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=743900.0, ans=0.1
2024-09-19 20:58:11,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=743940.0, ans=0.2
2024-09-19 20:58:20,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=743940.0, ans=0.05
2024-09-19 20:58:39,532 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.45 vs. limit=15.0
2024-09-19 20:58:46,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=744020.0, ans=0.125
2024-09-19 20:59:10,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=744100.0, ans=0.125
2024-09-19 20:59:12,127 INFO [train.py:1198] (1/2) Epoch 42, batch 500, loss[loss=0.254, ctc_loss=0.1316, cr_loss=0.3961, attn_decoder_loss=0.2588, over 29432.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1112, cr_loss=0.3514, attn_decoder_loss=0.2379, over 5332192.35 frames. ], batch size: 94, lr: 2.63e-03, grad_scale: 16.0
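
The optim.py:487 warnings report the quartiles of recent gradient norms, the clipping threshold in force, and the share of batches clipped. A rough sketch of quantile-based clipping along those lines follows; the threshold rule (a multiple of the median of a recent-norm history) and the class itself are assumptions for illustration, not the optimizer's actual code:

    import torch
    from collections import deque

    class QuantileClipper:
        """Clip gradients against a threshold derived from recent grad norms."""

        def __init__(self, clipping_scale: float = 2.0, history: int = 1000):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=history)   # recent total-gradient norms
            self.clipped = 0
            self.steps = 0

        def __call__(self, parameters) -> None:
            params = [p for p in parameters if p.grad is not None]
            norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
            self.norms.append(norm)
            hist = torch.tensor(list(self.norms))
            quartiles = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * quartiles[2].item()  # assumed: 2x median
            self.steps += 1
            if norm > threshold:
                self.clipped += 1
                for p in params:
                    p.grad.mul_(threshold / norm)
            # A log line would then show the five quartiles, the threshold,
            # and 100.0 * self.clipped / self.steps as percent-clipped.
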
2024-09-19 20:59:38,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=744140.0, ans=0.1
2024-09-19 20:59:43,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=744180.0, ans=0.125
2024-09-19 20:59:51,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=744180.0, ans=0.125
2024-09-19 21:00:10,291 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 21:00:13,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=744220.0, ans=22.5
2024-09-19 21:00:15,763 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 8.359e+01 8.854e+01 9.452e+01 4.385e+02, threshold=1.771e+02, percent-clipped=2.0
2024-09-19 21:00:32,313 INFO [train.py:1198] (1/2) Epoch 42, batch 550, loss[loss=0.2484, ctc_loss=0.1155, cr_loss=0.3618, attn_decoder_loss=0.2551, over 28901.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1112, cr_loss=0.3514, attn_decoder_loss=0.2382, over 5425083.12 frames. ], batch size: 104, lr: 2.63e-03, grad_scale: 16.0
2024-09-19 21:00:32,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=744300.0, ans=0.0
2024-09-19 21:00:46,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=744340.0, ans=0.0
2024-09-19 21:01:08,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=744380.0, ans=0.1
2024-09-19 21:01:13,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=744380.0, ans=0.2
2024-09-19 21:01:13,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=744380.0, ans=0.2
2024-09-19 21:01:25,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=744420.0, ans=0.125
2024-09-19 21:01:36,039 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-19 21:01:47,812 INFO [train.py:1198] (1/2) Epoch 42, batch 600, loss[loss=0.2497, ctc_loss=0.1261, cr_loss=0.378, attn_decoder_loss=0.255, over 29278.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1113, cr_loss=0.3513, attn_decoder_loss=0.2383, over 5509644.60 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 16.0
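
The scaling.py:214 records each give the current value (ans=...) of a named ScheduledFloat hyperparameter at the current batch_count: balancer probabilities sit at 0.125, dropout ps at 0.1, skip rates at 0.0, and so on, since by batch ~744k every schedule has long since reached its final value. A minimal sketch of a piecewise-linear schedule keyed on batch count; this is an illustrative reimplementation, not the scaling.py source:

    class PiecewiseLinearFloat:
        """Float hyperparameter that ramps linearly between (batch_count, value) breakpoints."""

        def __init__(self, *points):
            # e.g. PiecewiseLinearFloat((0.0, 0.3), (20000.0, 0.125))
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    frac = (batch_count - x0) / (x1 - x0)
                    return y0 + frac * (y1 - y0)

    # With the example breakpoints above, the value at batch_count=744000.0
    # is 0.125, matching the 'ans=0.125' reported for the balancer probs here.
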
2024-09-19 21:01:49,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=744500.0, ans=0.125
2024-09-19 21:02:03,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=744540.0, ans=0.125
2024-09-19 21:02:03,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=744540.0, ans=0.125
2024-09-19 21:02:18,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=744580.0, ans=0.2
2024-09-19 21:02:19,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=744580.0, ans=0.0
2024-09-19 21:02:22,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=744580.0, ans=0.125
2024-09-19 21:02:47,680 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 8.400e+01 8.795e+01 9.486e+01 1.602e+02, threshold=1.759e+02, percent-clipped=0.0
2024-09-19 21:02:59,094 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.14 vs. limit=10.0
2024-09-19 21:03:01,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=744700.0, ans=0.0
2024-09-19 21:03:02,680 INFO [train.py:1198] (1/2) Epoch 42, batch 650, loss[loss=0.2397, ctc_loss=0.1069, cr_loss=0.3559, attn_decoder_loss=0.2466, over 29771.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1104, cr_loss=0.3493, attn_decoder_loss=0.2376, over 5586716.15 frames. ], batch size: 81, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:03:24,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=744740.0, ans=0.125
2024-09-19 21:03:27,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=744740.0, ans=0.0
2024-09-19 21:03:30,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=744740.0, ans=0.125
2024-09-19 21:03:40,700 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.64 vs. limit=15.0
2024-09-19 21:03:48,238 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.90 vs. limit=15.0
2024-09-19 21:03:49,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=744820.0, ans=0.125
2024-09-19 21:03:53,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=744820.0, ans=0.2
2024-09-19 21:03:55,988 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.78 vs. limit=15.0
2024-09-19 21:03:58,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=744820.0, ans=0.2
2024-09-19 21:04:03,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=744820.0, ans=0.2
2024-09-19 21:04:06,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=744860.0, ans=0.2
2024-09-19 21:04:20,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=744860.0, ans=0.0
2024-09-19 21:04:21,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=744900.0, ans=0.0
2024-09-19 21:04:22,725 INFO [train.py:1198] (1/2) Epoch 42, batch 700, loss[loss=0.222, ctc_loss=0.1015, cr_loss=0.3249, attn_decoder_loss=0.2282, over 29534.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1109, cr_loss=0.3501, attn_decoder_loss=0.2382, over 5637869.57 frames. ], batch size: 76, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:04:33,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=744900.0, ans=0.125
2024-09-19 21:04:37,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=744940.0, ans=0.1
2024-09-19 21:04:39,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=744940.0, ans=0.125
2024-09-19 21:04:53,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=744980.0, ans=0.125
2024-09-19 21:05:15,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=745020.0, ans=0.125
2024-09-19 21:05:17,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.59 vs. limit=22.5
2024-09-19 21:05:23,212 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.314e+01 8.486e+01 9.011e+01 9.700e+01 3.654e+02, threshold=1.802e+02, percent-clipped=4.0
2024-09-19 21:05:37,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=745100.0, ans=0.0
2024-09-19 21:05:38,321 INFO [train.py:1198] (1/2) Epoch 42, batch 750, loss[loss=0.2355, ctc_loss=0.1143, cr_loss=0.3571, attn_decoder_loss=0.241, over 29703.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1111, cr_loss=0.3504, attn_decoder_loss=0.2381, over 5675068.91 frames. ], batch size: 82, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:06:49,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=745260.0, ans=10.0
2024-09-19 21:06:53,479 INFO [train.py:1198] (1/2) Epoch 42, batch 800, loss[loss=0.2178, ctc_loss=0.1048, cr_loss=0.3255, attn_decoder_loss=0.2231, over 29614.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1108, cr_loss=0.3497, attn_decoder_loss=0.2379, over 5707748.31 frames. ], batch size: 73, lr: 2.63e-03, grad_scale: 8.0
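
The scaling.py:1024 Whitening records compare a per-module "whiteness" metric against a (possibly scheduled) limit; a penalty would only engage once the metric exceeds the limit, which is why most records simply report "metric=... vs. limit=..." One plausible metric, used here purely as an assumption rather than as scaling.py's definition, is the spread of the feature-covariance eigenvalues, which equals 1.0 for perfectly white features:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        """Eigenvalue-spread measure of the feature covariance: 1.0 when the
        covariance is a multiple of the identity, larger otherwise.
        This definition is an illustrative assumption, not scaling.py's code.
        x: (num_frames, num_channels)"""
        n, c = x.shape
        xg = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        cov = xg.transpose(1, 2) @ xg / n      # per-group covariance, (g, c/g, c/g)
        eigs = torch.linalg.eigvalsh(cov)      # (g, c/g)
        metric = (eigs ** 2).mean(dim=1) / eigs.mean(dim=1).clamp(min=1e-20) ** 2
        return metric.mean()

    # A module would then add a whitening penalty only while
    # whitening_metric(x) exceeds the logged limit (6.0, 15.0, 22.5, ...).
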
2024-09-19 21:06:55,430 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 21:07:12,732 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.07 vs. limit=22.5
2024-09-19 21:07:50,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=745420.0, ans=0.2
2024-09-19 21:07:50,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=745420.0, ans=0.2
2024-09-19 21:07:59,494 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.594e+01 9.081e+01 9.628e+01 1.457e+02, threshold=1.816e+02, percent-clipped=0.0
2024-09-19 21:07:59,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=745460.0, ans=0.0
2024-09-19 21:08:12,816 INFO [train.py:1198] (1/2) Epoch 42, batch 850, loss[loss=0.2429, ctc_loss=0.1188, cr_loss=0.3602, attn_decoder_loss=0.2487, over 29696.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1108, cr_loss=0.3496, attn_decoder_loss=0.2377, over 5736496.83 frames. ], batch size: 89, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:08:36,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=745540.0, ans=0.0
2024-09-19 21:08:37,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=745540.0, ans=0.125
2024-09-19 21:08:38,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=745540.0, ans=0.1
2024-09-19 21:08:43,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=745580.0, ans=0.125
2024-09-19 21:08:46,830 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.29 vs. limit=15.0
2024-09-19 21:08:54,035 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 21:08:57,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=745620.0, ans=0.125
2024-09-19 21:09:20,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=745660.0, ans=0.0
2024-09-19 21:09:20,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=745660.0, ans=0.125
2024-09-19 21:09:27,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=745700.0, ans=0.025
2024-09-19 21:09:28,709 INFO [train.py:1198] (1/2) Epoch 42, batch 900, loss[loss=0.2148, ctc_loss=0.1008, cr_loss=0.3316, attn_decoder_loss=0.22, over 29641.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1111, cr_loss=0.3501, attn_decoder_loss=0.238, over 5739805.95 frames. ], batch size: 73, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:09:31,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=745700.0, ans=0.125
2024-09-19 21:09:43,354 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.88 vs. limit=15.0
2024-09-19 21:09:47,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=745740.0, ans=0.0
2024-09-19 21:09:47,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=745740.0, ans=0.0
2024-09-19 21:09:59,364 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.77 vs. limit=12.0
2024-09-19 21:10:03,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=745780.0, ans=0.07
2024-09-19 21:10:08,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=745780.0, ans=0.0
2024-09-19 21:10:09,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=745780.0, ans=0.0
2024-09-19 21:10:20,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=745820.0, ans=0.0
2024-09-19 21:10:23,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=745820.0, ans=0.125
2024-09-19 21:10:30,382 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.128e+01 8.573e+01 9.060e+01 9.874e+01 1.680e+02, threshold=1.812e+02, percent-clipped=0.0
2024-09-19 21:10:30,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=745860.0, ans=0.0
2024-09-19 21:10:30,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=745860.0, ans=0.025
2024-09-19 21:10:43,718 INFO [train.py:1198] (1/2) Epoch 42, batch 950, loss[loss=0.2201, ctc_loss=0.09788, cr_loss=0.3177, attn_decoder_loss=0.2267, over 29499.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1109, cr_loss=0.3501, attn_decoder_loss=0.238, over 5741299.25 frames. ], batch size: 74, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:12:03,076 INFO [train.py:1198] (1/2) Epoch 42, batch 1000, loss[loss=0.2321, ctc_loss=0.1045, cr_loss=0.3407, attn_decoder_loss=0.2387, over 29487.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1118, cr_loss=0.3514, attn_decoder_loss=0.2388, over 5734697.71 frames. ], batch size: 77, lr: 2.63e-03, grad_scale: 8.0
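
grad_scale in the train.py records is the dynamic loss scale used for fp16 mixed-precision training; it drops (e.g. 16.0 to 8.0) after a step with non-finite gradients and grows again after a run of clean steps, which is why it wanders between 8.0, 16.0, and 32.0 across this stretch of the log. A generic sketch of the standard torch.cuda.amp pattern; the model and shapes below are stand-ins, not the training script:

    import torch

    model = torch.nn.Linear(80, 500).cuda()             # stand-in model
    optimizer = torch.optim.Adam(model.parameters())
    scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

    def training_step(features: torch.Tensor, targets: torch.Tensor):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = torch.nn.functional.cross_entropy(model(features), targets)
        scaler.scale(loss).backward()   # scale up so fp16 grads stay representable
        scaler.step(optimizer)          # unscales; skips the step on inf/nan grads
        scaler.update()                 # halves the scale on overflow, grows it otherwise
        return loss.detach(), scaler.get_scale()  # get_scale() is the logged grad_scale
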
2024-09-19 21:12:09,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=746100.0, ans=0.0
2024-09-19 21:12:20,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=746140.0, ans=0.2
2024-09-19 21:12:59,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=746220.0, ans=0.125
2024-09-19 21:13:05,355 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.617e+01 8.540e+01 9.060e+01 9.719e+01 2.106e+02, threshold=1.812e+02, percent-clipped=1.0
2024-09-19 21:13:05,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=746260.0, ans=0.1
2024-09-19 21:13:05,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=746260.0, ans=0.0
2024-09-19 21:13:17,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=746300.0, ans=0.125
2024-09-19 21:13:18,996 INFO [train.py:1198] (1/2) Epoch 42, batch 1050, loss[loss=0.24, ctc_loss=0.1157, cr_loss=0.3664, attn_decoder_loss=0.2457, over 29673.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1118, cr_loss=0.3516, attn_decoder_loss=0.2384, over 5742876.69 frames. ], batch size: 85, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:13:25,867 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.76 vs. limit=6.0
2024-09-19 21:13:26,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=746300.0, ans=0.0
2024-09-19 21:13:26,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=746300.0, ans=0.0
2024-09-19 21:13:32,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=746340.0, ans=0.125
2024-09-19 21:13:34,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=746340.0, ans=0.0
2024-09-19 21:14:04,599 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.01 vs. limit=15.0
2024-09-19 21:14:10,524 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=15.0
2024-09-19 21:14:33,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=746500.0, ans=0.1
2024-09-19 21:14:35,084 INFO [train.py:1198] (1/2) Epoch 42, batch 1100, loss[loss=0.2317, ctc_loss=0.1107, cr_loss=0.3457, attn_decoder_loss=0.2374, over 29442.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1116, cr_loss=0.3511, attn_decoder_loss=0.2382, over 5755304.48 frames. ], batch size: 78, lr: 2.63e-03, grad_scale: 8.0
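
tot_loss is reported "over N frames" where N climbs through the epoch and then levels off near 5.8M, consistent with a frame-weighted running total whose past is decayed each batch. A small sketch of that bookkeeping; the decay constant is an assumption chosen only to reproduce the leveling-off behaviour:

    class DecayedLossTracker:
        """Exponentially decayed, frame-weighted loss average."""

        def __init__(self, decay: float = 0.995):   # assumed decay constant
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> None:
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames

        @property
        def value(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)

    # With roughly 29k frames per batch, the frame count saturates around
    # 29k / (1 - decay), i.e. a few million frames, matching the plateau
    # in the tot_loss records above.
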
2024-09-19 21:14:39,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=746500.0, ans=0.125
2024-09-19 21:15:00,083 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 21:15:07,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=746580.0, ans=0.125
2024-09-19 21:15:22,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=746620.0, ans=0.0
2024-09-19 21:15:31,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=746620.0, ans=0.0
2024-09-19 21:15:39,239 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.407e+01 8.586e+01 9.042e+01 9.812e+01 2.400e+02, threshold=1.808e+02, percent-clipped=1.0
2024-09-19 21:15:41,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=746660.0, ans=0.0
2024-09-19 21:15:46,324 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.22 vs. limit=15.0
2024-09-19 21:15:50,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=746660.0, ans=0.1
2024-09-19 21:15:55,115 INFO [train.py:1198] (1/2) Epoch 42, batch 1150, loss[loss=0.2382, ctc_loss=0.114, cr_loss=0.35, attn_decoder_loss=0.2442, over 29434.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1118, cr_loss=0.3519, attn_decoder_loss=0.2383, over 5753998.46 frames. ], batch size: 78, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:15:55,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=746700.0, ans=0.1
2024-09-19 21:16:10,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=746740.0, ans=0.2
2024-09-19 21:16:17,250 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.89 vs. limit=22.5
2024-09-19 21:16:21,965 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.92 vs. limit=22.5
2024-09-19 21:16:30,335 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 21:16:31,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=746780.0, ans=0.0
2024-09-19 21:16:38,799 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.41 vs. limit=15.0
2024-09-19 21:16:51,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=746820.0, ans=0.0
2024-09-19 21:17:03,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=746860.0, ans=0.125
2024-09-19 21:17:06,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=746860.0, ans=0.0
2024-09-19 21:17:10,793 INFO [train.py:1198] (1/2) Epoch 42, batch 1200, loss[loss=0.2401, ctc_loss=0.1182, cr_loss=0.3722, attn_decoder_loss=0.2454, over 29673.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1122, cr_loss=0.3527, attn_decoder_loss=0.2388, over 5746445.41 frames. ], batch size: 85, lr: 2.63e-03, grad_scale: 16.0
2024-09-19 21:17:11,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=746900.0, ans=0.125
2024-09-19 21:17:23,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=746900.0, ans=0.95
2024-09-19 21:17:24,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=746940.0, ans=0.125
2024-09-19 21:17:33,966 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 21:17:52,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=746980.0, ans=0.07
2024-09-19 21:17:56,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=747020.0, ans=0.125
2024-09-19 21:17:58,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=747020.0, ans=0.0
2024-09-19 21:17:59,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=747020.0, ans=0.0
2024-09-19 21:18:11,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=747060.0, ans=0.95
2024-09-19 21:18:13,018 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 8.675e+01 9.072e+01 9.806e+01 1.661e+02, threshold=1.814e+02, percent-clipped=1.0
2024-09-19 21:18:26,676 INFO [train.py:1198] (1/2) Epoch 42, batch 1250, loss[loss=0.2518, ctc_loss=0.1338, cr_loss=0.4065, attn_decoder_loss=0.2559, over 29503.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1125, cr_loss=0.3539, attn_decoder_loss=0.2394, over 5774220.85 frames. ], batch size: 92, lr: 2.63e-03, grad_scale: 16.0
2024-09-19 21:18:47,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=747140.0, ans=0.0
2024-09-19 21:18:47,772 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.75 vs. limit=15.0
2024-09-19 21:18:53,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=747140.0, ans=0.125
2024-09-19 21:19:04,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.73 vs. limit=15.0
2024-09-19 21:19:25,030 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.09 vs. limit=22.5
2024-09-19 21:19:35,889 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.94 vs. limit=15.0
2024-09-19 21:19:47,433 INFO [train.py:1198] (1/2) Epoch 42, batch 1300, loss[loss=0.2402, ctc_loss=0.1166, cr_loss=0.355, attn_decoder_loss=0.246, over 28282.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1114, cr_loss=0.3519, attn_decoder_loss=0.2383, over 5779229.61 frames. ], batch size: 111, lr: 2.63e-03, grad_scale: 16.0
2024-09-19 21:20:10,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=747340.0, ans=0.125
2024-09-19 21:20:22,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=747380.0, ans=0.1
2024-09-19 21:20:41,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=747420.0, ans=0.125
2024-09-19 21:20:50,663 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.269e+01 8.538e+01 9.081e+01 9.476e+01 1.507e+02, threshold=1.816e+02, percent-clipped=0.0
2024-09-19 21:21:02,971 INFO [train.py:1198] (1/2) Epoch 42, batch 1350, loss[loss=0.2321, ctc_loss=0.1117, cr_loss=0.3413, attn_decoder_loss=0.2379, over 29762.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1111, cr_loss=0.3515, attn_decoder_loss=0.2381, over 5796262.74 frames. ], batch size: 81, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:21:09,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747500.0, ans=0.1
2024-09-19 21:21:12,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=747500.0, ans=0.1
2024-09-19 21:21:29,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=747540.0, ans=0.125
2024-09-19 21:21:47,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=747620.0, ans=0.125
2024-09-19 21:21:52,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747620.0, ans=0.1
2024-09-19 21:21:52,920 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.95 vs. limit=6.0
2024-09-19 21:22:02,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=747660.0, ans=0.125
2024-09-19 21:22:13,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=747660.0, ans=0.125
2024-09-19 21:22:15,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=747660.0, ans=0.0
2024-09-19 21:22:17,774 INFO [train.py:1198] (1/2) Epoch 42, batch 1400, loss[loss=0.2057, ctc_loss=0.09281, cr_loss=0.3121, attn_decoder_loss=0.2113, over 29598.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.111, cr_loss=0.3517, attn_decoder_loss=0.2381, over 5807748.74 frames. ], batch size: 69, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:22:31,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=747740.0, ans=0.125
2024-09-19 21:22:48,936 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 21:23:02,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=747780.0, ans=0.125
2024-09-19 21:23:10,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747820.0, ans=0.1
2024-09-19 21:23:23,217 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.442e+01 9.058e+01 9.585e+01 2.575e+02, threshold=1.812e+02, percent-clipped=1.0
2024-09-19 21:23:30,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=747860.0, ans=0.5
2024-09-19 21:23:35,309 INFO [train.py:1198] (1/2) Epoch 42, batch 1450, loss[loss=0.2507, ctc_loss=0.1239, cr_loss=0.3734, attn_decoder_loss=0.2565, over 29438.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.111, cr_loss=0.3512, attn_decoder_loss=0.2384, over 5805270.47 frames. ], batch size: 94, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:23:40,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=747900.0, ans=0.125
2024-09-19 21:23:41,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=747900.0, ans=0.1
2024-09-19 21:23:48,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=747900.0, ans=0.125
2024-09-19 21:24:01,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=747940.0, ans=0.0
2024-09-19 21:24:30,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=748020.0, ans=0.125
2024-09-19 21:24:32,992 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.93 vs. limit=12.0
2024-09-19 21:24:39,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=748060.0, ans=0.1
2024-09-19 21:24:53,278 INFO [train.py:1198] (1/2) Epoch 42, batch 1500, loss[loss=0.2346, ctc_loss=0.1046, cr_loss=0.3213, attn_decoder_loss=0.2419, over 29606.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1109, cr_loss=0.3512, attn_decoder_loss=0.2388, over 5806535.61 frames. ], batch size: 86, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:24:56,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=748100.0, ans=0.1
2024-09-19 21:25:33,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=748180.0, ans=0.125
2024-09-19 21:25:51,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=748220.0, ans=0.0
2024-09-19 21:25:53,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=748260.0, ans=0.025
2024-09-19 21:25:56,568 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 21:25:57,687 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.682e+01 8.590e+01 8.992e+01 9.499e+01 3.130e+02, threshold=1.798e+02, percent-clipped=2.0
2024-09-19 21:26:05,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=748260.0, ans=0.125
2024-09-19 21:26:07,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=748260.0, ans=0.1
2024-09-19 21:26:09,799 INFO [train.py:1198] (1/2) Epoch 42, batch 1550, loss[loss=0.2546, ctc_loss=0.1265, cr_loss=0.4009, attn_decoder_loss=0.2599, over 29488.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1115, cr_loss=0.3521, attn_decoder_loss=0.2387, over 5781523.24 frames. ], batch size: 90, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:26:16,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=748300.0, ans=0.125
2024-09-19 21:26:27,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=748340.0, ans=0.1
2024-09-19 21:26:27,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=748340.0, ans=0.07
2024-09-19 21:26:36,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=748340.0, ans=0.2
2024-09-19 21:26:49,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=748380.0, ans=0.125
2024-09-19 21:27:01,645 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 21:27:04,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=748420.0, ans=0.0
2024-09-19 21:27:09,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=748420.0, ans=0.2
2024-09-19 21:27:26,805 INFO [train.py:1198] (1/2) Epoch 42, batch 1600, loss[loss=0.2485, ctc_loss=0.1169, cr_loss=0.3599, attn_decoder_loss=0.2551, over 29682.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1117, cr_loss=0.3525, attn_decoder_loss=0.2387, over 5763158.73 frames. ], batch size: 85, lr: 2.63e-03, grad_scale: 16.0
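
The scaling.py:1120 WithLoss records name a self_attn_weights tensor and an accumulated loss-sum, which is 0.000e+00 throughout this stretch, i.e. the attached auxiliary penalty is contributing nothing at this point in training. The semantics are inferred from the log alone; a rough bookkeeping sketch under that assumption:

    import torch

    class AttentionPenaltyReporter(torch.nn.Module):
        """Pass-through wrapper that accumulates an auxiliary penalty on the
        attention weights and reports its running sum. Both the wrapper and
        its penalty rule are invented for illustration, not taken from scaling.py."""

        def __init__(self, name: str):
            super().__init__()
            self.name = name
            self.loss_sum = 0.0

        def forward(self, attn_weights: torch.Tensor) -> torch.Tensor:
            penalty = attn_weights.clamp(max=0.0).abs().mean()  # illustrative rule
            self.loss_sum += float(penalty.detach())
            return attn_weights

        def report(self) -> str:
            return f"WithLoss: name={self.name}, loss-sum={self.loss_sum:.3e}"
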
2024-09-19 21:27:30,109 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 21:27:31,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=748500.0, ans=0.0
2024-09-19 21:28:11,732 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.69 vs. limit=15.0
2024-09-19 21:28:14,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=748620.0, ans=0.0
2024-09-19 21:28:20,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=748620.0, ans=0.0
2024-09-19 21:28:32,057 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.535e+01 9.042e+01 9.603e+01 1.807e+02, threshold=1.808e+02, percent-clipped=1.0
2024-09-19 21:28:43,813 INFO [train.py:1198] (1/2) Epoch 42, batch 1650, loss[loss=0.2472, ctc_loss=0.119, cr_loss=0.3537, attn_decoder_loss=0.2536, over 29703.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1116, cr_loss=0.3518, attn_decoder_loss=0.2385, over 5758948.08 frames. ], batch size: 89, lr: 2.63e-03, grad_scale: 16.0
2024-09-19 21:28:45,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=748700.0, ans=0.2
2024-09-19 21:28:59,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=748740.0, ans=0.0
2024-09-19 21:29:03,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=748740.0, ans=0.125
2024-09-19 21:29:09,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=748740.0, ans=0.125
2024-09-19 21:29:17,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=748780.0, ans=0.0
2024-09-19 21:29:26,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=748780.0, ans=0.125
2024-09-19 21:29:48,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=748860.0, ans=0.125
2024-09-19 21:29:56,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=748860.0, ans=0.2
2024-09-19 21:29:59,138 INFO [train.py:1198] (1/2) Epoch 42, batch 1700, loss[loss=0.213, ctc_loss=0.09967, cr_loss=0.316, attn_decoder_loss=0.2186, over 29570.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1112, cr_loss=0.3508, attn_decoder_loss=0.2383, over 5781474.60 frames. ], batch size: 69, lr: 2.63e-03, grad_scale: 16.0
2024-09-19 21:30:34,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=748980.0, ans=0.1
2024-09-19 21:30:49,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=749020.0, ans=0.125
2024-09-19 21:31:02,255 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.72 vs. limit=15.0
2024-09-19 21:31:04,407 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 8.510e+01 9.136e+01 9.466e+01 1.659e+02, threshold=1.827e+02, percent-clipped=0.0
2024-09-19 21:31:15,935 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.74 vs. limit=22.5
2024-09-19 21:31:16,643 INFO [train.py:1198] (1/2) Epoch 42, batch 1750, loss[loss=0.2027, ctc_loss=0.08569, cr_loss=0.2903, attn_decoder_loss=0.2092, over 29351.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1106, cr_loss=0.3498, attn_decoder_loss=0.2378, over 5790177.28 frames. ], batch size: 67, lr: 2.63e-03, grad_scale: 16.0
2024-09-19 21:31:22,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=749100.0, ans=0.09899494936611666
2024-09-19 21:31:35,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=749140.0, ans=0.125
2024-09-19 21:31:50,019 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.91 vs. limit=22.5
2024-09-19 21:31:57,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=749180.0, ans=0.125
2024-09-19 21:32:07,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=749220.0, ans=0.1
2024-09-19 21:32:18,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=749260.0, ans=0.0
2024-09-19 21:32:34,205 INFO [train.py:1198] (1/2) Epoch 42, batch 1800, loss[loss=0.2285, ctc_loss=0.1017, cr_loss=0.3379, attn_decoder_loss=0.2351, over 29673.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.111, cr_loss=0.3504, attn_decoder_loss=0.2381, over 5791961.95 frames. ], batch size: 83, lr: 2.62e-03, grad_scale: 8.0
2024-09-19 21:32:43,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=749300.0, ans=0.0
2024-09-19 21:32:52,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=749340.0, ans=0.125
2024-09-19 21:32:54,849 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.98 vs. limit=15.0
2024-09-19 21:33:07,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=749380.0, ans=0.1
2024-09-19 21:33:13,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=749380.0, ans=0.125
2024-09-19 21:33:15,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=749380.0, ans=0.2
2024-09-19 21:33:15,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=749380.0, ans=0.1
2024-09-19 21:33:39,226 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 8.440e+01 8.862e+01 9.428e+01 1.419e+02, threshold=1.772e+02, percent-clipped=0.0
2024-09-19 21:33:49,904 INFO [train.py:1198] (1/2) Epoch 42, batch 1850, loss[loss=0.2352, ctc_loss=0.108, cr_loss=0.3374, attn_decoder_loss=0.2419, over 29617.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1109, cr_loss=0.35, attn_decoder_loss=0.2378, over 5797242.13 frames. ], batch size: 86, lr: 2.62e-03, grad_scale: 8.0
2024-09-19 21:33:50,671 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.33 vs. limit=22.5
2024-09-19 21:34:00,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=749500.0, ans=0.1
2024-09-19 21:34:08,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=749540.0, ans=0.07
2024-09-19 21:34:10,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=749540.0, ans=0.0
2024-09-19 21:34:14,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=749540.0, ans=0.2
2024-09-19 21:34:24,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=749580.0, ans=22.5
2024-09-19 21:34:25,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=5.15 vs. limit=15.0
2024-09-19 21:34:33,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=749580.0, ans=0.09899494936611666
2024-09-19 21:34:45,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=749620.0, ans=0.1
2024-09-19 21:34:48,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=749620.0, ans=0.125
2024-09-19 21:34:55,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=749660.0, ans=0.025
2024-09-19 21:35:06,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=749700.0, ans=0.125
2024-09-19 21:35:07,214 INFO [train.py:1198] (1/2) Epoch 42, batch 1900, loss[loss=0.2467, ctc_loss=0.1223, cr_loss=0.3814, attn_decoder_loss=0.2521, over 29724.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1113, cr_loss=0.3512, attn_decoder_loss=0.2386, over 5805229.18 frames. ], batch size: 89, lr: 2.62e-03, grad_scale: 8.0
2024-09-19 21:35:30,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=749740.0, ans=0.0
2024-09-19 21:35:43,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=749780.0, ans=0.1
2024-09-19 21:36:01,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=749820.0, ans=0.125
2024-09-19 21:36:07,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=749820.0, ans=0.0
2024-09-19 21:36:07,797 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0
2024-09-19 21:36:13,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=749860.0, ans=0.1
2024-09-19 21:36:14,505 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.827e+01 8.670e+01 9.049e+01 9.659e+01 1.303e+02, threshold=1.810e+02, percent-clipped=0.0
2024-09-19 21:36:20,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=749860.0, ans=0.125
2024-09-19 21:36:20,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=749860.0, ans=0.125
2024-09-19 21:36:23,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=749900.0, ans=0.125
2024-09-19 21:36:24,979 INFO [train.py:1198] (1/2) Epoch 42, batch 1950, loss[loss=0.2307, ctc_loss=0.1152, cr_loss=0.3642, attn_decoder_loss=0.2355, over 29435.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1119, cr_loss=0.3529, attn_decoder_loss=0.2396, over 5818932.10 frames. ], batch size: 78, lr: 2.62e-03, grad_scale: 8.0
2024-09-19 21:36:37,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=749900.0, ans=0.125
2024-09-19 21:36:50,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=749940.0, ans=0.025
2024-09-19 21:37:13,756 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.06 vs. limit=15.0
2024-09-19 21:37:40,247 INFO [train.py:1198] (1/2) Epoch 42, batch 2000, loss[loss=0.2012, ctc_loss=0.09356, cr_loss=0.3016, attn_decoder_loss=0.2064, over 29365.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1121, cr_loss=0.3524, attn_decoder_loss=0.2397, over 5796963.05 frames. ], batch size: 67, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 21:37:40,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=750100.0, ans=0.1
2024-09-19 21:37:50,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=750100.0, ans=0.2
2024-09-19 21:37:59,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=750140.0, ans=0.125
2024-09-19 21:38:17,200 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.12 vs. limit=22.5
2024-09-19 21:38:29,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=750220.0, ans=0.1
2024-09-19 21:38:35,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=750220.0, ans=0.025
2024-09-19 21:38:47,685 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.819e+01 8.670e+01 9.136e+01 9.850e+01 1.573e+02, threshold=1.827e+02, percent-clipped=0.0
2024-09-19 21:38:58,256 INFO [train.py:1198] (1/2) Epoch 42, batch 2050, loss[loss=0.2036, ctc_loss=0.09471, cr_loss=0.314, attn_decoder_loss=0.2087, over 29419.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1117, cr_loss=0.3516, attn_decoder_loss=0.2387, over 5788210.95 frames. ], batch size: 70, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 21:38:59,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=750300.0, ans=0.125
2024-09-19 21:39:13,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=750340.0, ans=0.0
2024-09-19 21:39:38,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=750380.0, ans=0.125
2024-09-19 21:39:38,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=750380.0, ans=0.125
2024-09-19 21:39:49,758 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.85 vs. limit=10.0
2024-09-19 21:40:03,020 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 21:40:08,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=750460.0, ans=0.0
2024-09-19 21:40:16,254 INFO [train.py:1198] (1/2) Epoch 42, batch 2100, loss[loss=0.2283, ctc_loss=0.109, cr_loss=0.3387, attn_decoder_loss=0.2341, over 29746.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1113, cr_loss=0.3506, attn_decoder_loss=0.2383, over 5800069.51 frames. ], batch size: 81, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 21:40:18,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=750500.0, ans=0.1
2024-09-19 21:40:43,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=750540.0, ans=0.125
2024-09-19 21:40:48,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=750580.0, ans=0.2
2024-09-19 21:40:59,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=750620.0, ans=0.125
2024-09-19 21:41:01,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=750620.0, ans=0.1
2024-09-19 21:41:11,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=750620.0, ans=0.025
2024-09-19 21:41:20,503 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.318e+01 8.604e+01 9.019e+01 9.390e+01 1.185e+02, threshold=1.804e+02, percent-clipped=0.0
2024-09-19 21:41:31,110 INFO [train.py:1198] (1/2) Epoch 42, batch 2150, loss[loss=0.222, ctc_loss=0.1038, cr_loss=0.3346, attn_decoder_loss=0.2277, over 29420.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1108, cr_loss=0.3501, attn_decoder_loss=0.2378, over 5815176.47 frames. ], batch size: 78, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 21:41:38,244 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=4.98 vs. limit=15.0
2024-09-19 21:41:50,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=750740.0, ans=0.125
2024-09-19 21:42:23,117 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 21:42:24,500 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 21:42:35,767 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.78 vs. limit=12.0
2024-09-19 21:42:38,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=750860.0, ans=0.035
2024-09-19 21:42:48,186 INFO [train.py:1198] (1/2) Epoch 42, batch 2200, loss[loss=0.2338, ctc_loss=0.1057, cr_loss=0.3374, attn_decoder_loss=0.2405, over 29635.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1112, cr_loss=0.3511, attn_decoder_loss=0.2381, over 5812632.41 frames. ], batch size: 86, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 21:42:56,592 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=6.99 vs. limit=15.0
limit=15.0 2024-09-19 21:43:40,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=751020.0, ans=0.0 2024-09-19 21:43:55,413 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.242e+01 8.649e+01 8.991e+01 9.667e+01 4.201e+02, threshold=1.798e+02, percent-clipped=2.0 2024-09-19 21:44:06,041 INFO [train.py:1198] (1/2) Epoch 42, batch 2250, loss[loss=0.2364, ctc_loss=0.1073, cr_loss=0.3497, attn_decoder_loss=0.2429, over 29687.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1109, cr_loss=0.3504, attn_decoder_loss=0.2381, over 5811982.84 frames. ], batch size: 82, lr: 2.62e-03, grad_scale: 16.0 2024-09-19 21:44:18,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=751100.0, ans=0.1 2024-09-19 21:44:22,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=751140.0, ans=0.125 2024-09-19 21:44:51,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=751220.0, ans=0.0 2024-09-19 21:45:00,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=751220.0, ans=0.1 2024-09-19 21:45:21,538 INFO [train.py:1198] (1/2) Epoch 42, batch 2300, loss[loss=0.21, ctc_loss=0.09376, cr_loss=0.3026, attn_decoder_loss=0.2162, over 29318.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1104, cr_loss=0.3489, attn_decoder_loss=0.2372, over 5800269.49 frames. ], batch size: 71, lr: 2.62e-03, grad_scale: 16.0 2024-09-19 21:45:37,074 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.31 vs. limit=15.0 2024-09-19 21:45:43,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=751340.0, ans=0.0 2024-09-19 21:45:52,718 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.56 vs. limit=10.0 2024-09-19 21:45:53,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=751380.0, ans=0.125 2024-09-19 21:46:04,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=751380.0, ans=0.025 2024-09-19 21:46:23,225 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.77 vs. limit=15.0 2024-09-19 21:46:28,381 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.932e+01 8.459e+01 9.034e+01 9.702e+01 2.715e+02, threshold=1.807e+02, percent-clipped=2.0 2024-09-19 21:46:39,162 INFO [train.py:1198] (1/2) Epoch 42, batch 2350, loss[loss=0.2401, ctc_loss=0.1132, cr_loss=0.3404, attn_decoder_loss=0.2467, over 29706.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1107, cr_loss=0.3492, attn_decoder_loss=0.2373, over 5804097.39 frames. 
], batch size: 83, lr: 2.62e-03, grad_scale: 16.0 2024-09-19 21:46:46,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=751500.0, ans=0.1 2024-09-19 21:46:51,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=751500.0, ans=0.125 2024-09-19 21:47:18,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=751580.0, ans=0.125 2024-09-19 21:47:40,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=751660.0, ans=0.2 2024-09-19 21:47:56,840 INFO [train.py:1198] (1/2) Epoch 42, batch 2400, loss[loss=0.2148, ctc_loss=0.09678, cr_loss=0.3127, attn_decoder_loss=0.221, over 29527.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1112, cr_loss=0.3505, attn_decoder_loss=0.2379, over 5808217.01 frames. ], batch size: 76, lr: 2.62e-03, grad_scale: 32.0 2024-09-19 21:47:58,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=751700.0, ans=0.05 2024-09-19 21:48:08,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=751700.0, ans=0.05 2024-09-19 21:48:11,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=751740.0, ans=0.2 2024-09-19 21:48:16,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=751740.0, ans=0.2 2024-09-19 21:48:29,290 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.53 vs. limit=22.5 2024-09-19 21:49:03,289 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.572e+01 8.663e+01 9.186e+01 9.777e+01 4.524e+02, threshold=1.837e+02, percent-clipped=1.0 2024-09-19 21:49:12,370 INFO [train.py:1198] (1/2) Epoch 42, batch 2450, loss[loss=0.2418, ctc_loss=0.1241, cr_loss=0.3811, attn_decoder_loss=0.2464, over 29699.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1117, cr_loss=0.3516, attn_decoder_loss=0.2388, over 5783888.25 frames. ], batch size: 82, lr: 2.62e-03, grad_scale: 16.0 2024-09-19 21:49:25,720 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.82 vs. limit=15.0 2024-09-19 21:49:28,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=751940.0, ans=0.025 2024-09-19 21:49:38,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=751940.0, ans=0.0 2024-09-19 21:49:41,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=751940.0, ans=0.1 2024-09-19 21:49:49,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=751980.0, ans=0.1 2024-09-19 21:50:00,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.48 vs. 
limit=22.5 2024-09-19 21:50:01,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=751980.0, ans=0.2 2024-09-19 21:50:08,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=752020.0, ans=0.2 2024-09-19 21:50:15,342 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.51 vs. limit=15.0 2024-09-19 21:50:22,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=752060.0, ans=0.025 2024-09-19 21:50:31,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=752060.0, ans=0.05 2024-09-19 21:50:37,331 INFO [train.py:1198] (1/2) Epoch 42, batch 2500, loss[loss=0.2388, ctc_loss=0.1135, cr_loss=0.3476, attn_decoder_loss=0.245, over 29611.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1118, cr_loss=0.3522, attn_decoder_loss=0.239, over 5794303.28 frames. ], batch size: 86, lr: 2.62e-03, grad_scale: 16.0 2024-09-19 21:50:54,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=752140.0, ans=0.125 2024-09-19 21:51:25,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=752220.0, ans=0.1 2024-09-19 21:51:32,363 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.91 vs. limit=10.0 2024-09-19 21:51:34,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=752220.0, ans=0.125 2024-09-19 21:51:36,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=752220.0, ans=0.0 2024-09-19 21:51:46,109 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.657e+01 8.685e+01 9.215e+01 9.799e+01 2.260e+02, threshold=1.843e+02, percent-clipped=2.0 2024-09-19 21:51:54,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=752300.0, ans=15.0 2024-09-19 21:51:55,269 INFO [train.py:1198] (1/2) Epoch 42, batch 2550, loss[loss=0.2077, ctc_loss=0.09291, cr_loss=0.3152, attn_decoder_loss=0.2135, over 29356.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1121, cr_loss=0.3526, attn_decoder_loss=0.2392, over 5797332.20 frames. ], batch size: 67, lr: 2.62e-03, grad_scale: 16.0 2024-09-19 21:52:37,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=752380.0, ans=0.1 2024-09-19 21:52:41,209 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.33 vs. 
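The scaling.py:214 lines report ScheduledFloat values: hyperparameters (dropout rates, skip rates, balancer probabilities, bypass scale floors) whose current value `ans` is a function of `batch_count`. A minimal sketch of a piecewise-linear schedule of that kind follows; the class name and the breakpoint numbers are illustrative, not the actual scaling.py implementation.

```python
# Sketch of a piecewise-linear schedule like the ScheduledFloat values logged
# above; by batch_count ~750k most schedules have long since reached their
# final value (e.g. "ans=0.1" for the feed-forward dropout entries).
class PiecewiseLinear:
    def __init__(self, *points):  # points: (batch_count, value) pairs
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# Illustrative: a dropout decaying from 0.3 to a floor of 0.1 over 20k batches.
dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
assert dropout_p(750100.0) == 0.1
```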
2024-09-19 21:52:49,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=752420.0, ans=0.0
2024-09-19 21:52:50,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=752420.0, ans=0.125
2024-09-19 21:53:05,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=752460.0, ans=0.125
2024-09-19 21:53:09,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=752500.0, ans=0.125
2024-09-19 21:53:10,743 INFO [train.py:1198] (1/2) Epoch 42, batch 2600, loss[loss=0.2314, ctc_loss=0.1055, cr_loss=0.3498, attn_decoder_loss=0.2376, over 29440.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1121, cr_loss=0.3526, attn_decoder_loss=0.2393, over 5793340.08 frames. ], batch size: 78, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 21:53:18,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=752500.0, ans=0.125
2024-09-19 21:53:45,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=752580.0, ans=0.1
2024-09-19 21:53:47,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=752580.0, ans=0.125
2024-09-19 21:53:59,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=752620.0, ans=0.125
2024-09-19 21:54:02,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=752620.0, ans=0.125
2024-09-19 21:54:12,002 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 21:54:18,979 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.503e+01 8.622e+01 9.143e+01 9.724e+01 1.437e+02, threshold=1.829e+02, percent-clipped=0.0
2024-09-19 21:54:19,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=752660.0, ans=0.0
2024-09-19 21:54:27,296 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.72 vs. limit=10.0
2024-09-19 21:54:27,792 INFO [train.py:1198] (1/2) Epoch 42, batch 2650, loss[loss=0.2489, ctc_loss=0.1213, cr_loss=0.38, attn_decoder_loss=0.2546, over 29212.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1123, cr_loss=0.3534, attn_decoder_loss=0.2396, over 5798576.53 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 21:54:31,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=752700.0, ans=0.125
2024-09-19 21:54:48,278 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.64 vs. limit=10.0
2024-09-19 21:55:26,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=752820.0, ans=22.5
2024-09-19 21:55:31,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=752860.0, ans=0.125
2024-09-19 21:55:45,581 INFO [train.py:1198] (1/2) Epoch 42, batch 2700, loss[loss=0.2359, ctc_loss=0.1127, cr_loss=0.3489, attn_decoder_loss=0.2418, over 29536.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1125, cr_loss=0.354, attn_decoder_loss=0.2399, over 5795450.67 frames. ], batch size: 87, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 21:55:45,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=752900.0, ans=0.125
2024-09-19 21:55:45,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=752900.0, ans=0.1
2024-09-19 21:55:46,278 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.49 vs. limit=15.0
2024-09-19 21:56:17,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=752980.0, ans=0.1
2024-09-19 21:56:51,999 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.647e+01 8.707e+01 9.259e+01 9.781e+01 2.020e+02, threshold=1.852e+02, percent-clipped=1.0
2024-09-19 21:56:58,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=753060.0, ans=0.125
2024-09-19 21:57:00,352 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.89 vs. limit=12.0
2024-09-19 21:57:01,147 INFO [train.py:1198] (1/2) Epoch 42, batch 2750, loss[loss=0.2257, ctc_loss=0.1091, cr_loss=0.3364, attn_decoder_loss=0.2311, over 29506.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1116, cr_loss=0.3524, attn_decoder_loss=0.2386, over 5794203.58 frames. ], batch size: 75, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 21:57:03,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=753100.0, ans=0.125
2024-09-19 21:57:15,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=753100.0, ans=0.09899494936611666
2024-09-19 21:57:17,831 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.82 vs. limit=22.5
2024-09-19 21:57:24,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=753140.0, ans=0.125
2024-09-19 21:57:36,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=753180.0, ans=0.0
2024-09-19 21:57:48,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=753220.0, ans=0.125
2024-09-19 21:57:48,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=753220.0, ans=0.0
2024-09-19 21:57:53,399 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.61 vs. limit=22.5
2024-09-19 21:57:57,878 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0
2024-09-19 21:58:15,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=753260.0, ans=0.2
2024-09-19 21:58:18,585 INFO [train.py:1198] (1/2) Epoch 42, batch 2800, loss[loss=0.2599, ctc_loss=0.1541, cr_loss=0.3868, attn_decoder_loss=0.2631, over 20355.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1116, cr_loss=0.3521, attn_decoder_loss=0.2386, over 5774225.89 frames. ], batch size: 209, lr: 2.62e-03, grad_scale: 32.0
2024-09-19 21:58:21,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=753300.0, ans=0.125
2024-09-19 21:58:24,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=753300.0, ans=0.0
2024-09-19 21:58:38,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=753340.0, ans=0.125
2024-09-19 21:58:40,211 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.31 vs. limit=22.5
2024-09-19 21:58:40,227 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.59 vs. limit=15.0
2024-09-19 21:58:45,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=753340.0, ans=0.09899494936611666
2024-09-19 21:58:47,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=753380.0, ans=0.0
2024-09-19 21:58:54,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=753380.0, ans=0.125
2024-09-19 21:59:19,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.12 vs. limit=22.5
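The scaling.py:1024 lines compare a whitening metric of a module's activations against a limit (e.g. "metric=20.61 vs. limit=22.5"); a constraint only engages once the metric exceeds the limit. The sketch below shows one plausible formulation of such a metric, the eigenvalue-dispersion ratio of the per-group feature covariance, which equals 1.0 for perfectly white features; this is an assumed definition for illustration, not necessarily the exact scaling.py one.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """Assumed metric: d * trace(C @ C) / trace(C)**2 over the covariance C of
    each channel group; 1.0 when C is isotropic, larger when anisotropic."""
    (_, dim) = x.shape
    x = x.reshape(-1, num_groups, dim // num_groups).transpose(0, 1)  # (G, N, d)
    cov = torch.matmul(x.transpose(1, 2), x) / x.shape[1]             # (G, d, d)
    d = cov.shape[-1]
    num = (cov * cov).sum(dim=(1, 2)) * d                   # d * trace(C @ C)
    den = cov.diagonal(dim1=1, dim2=2).sum(dim=1) ** 2      # trace(C) ** 2
    return (num / den).mean()

white = torch.randn(1000, 192)
print(whitening_metric(white))  # close to 1.0; well under limits like 22.5
```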
2024-09-19 21:59:23,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=753460.0, ans=0.1
2024-09-19 21:59:29,300 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.509e+01 8.790e+01 9.273e+01 9.887e+01 2.081e+02, threshold=1.855e+02, percent-clipped=1.0
2024-09-19 21:59:35,330 INFO [train.py:1198] (1/2) Epoch 42, batch 2850, loss[loss=0.2278, ctc_loss=0.1093, cr_loss=0.3615, attn_decoder_loss=0.233, over 29464.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.112, cr_loss=0.353, attn_decoder_loss=0.2391, over 5761404.64 frames. ], batch size: 77, lr: 2.62e-03, grad_scale: 8.0
2024-09-19 21:59:43,722 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.64 vs. limit=15.0
2024-09-19 21:59:43,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.97 vs. limit=15.0
2024-09-19 21:59:47,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=753500.0, ans=0.07
2024-09-19 21:59:55,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=753540.0, ans=0.125
2024-09-19 22:00:08,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=753580.0, ans=0.2
2024-09-19 22:00:25,047 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.85 vs. limit=22.5
2024-09-19 22:00:30,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=753620.0, ans=0.125
2024-09-19 22:00:51,144 INFO [train.py:1198] (1/2) Epoch 42, batch 2900, loss[loss=0.2325, ctc_loss=0.1091, cr_loss=0.3516, attn_decoder_loss=0.2384, over 29431.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1126, cr_loss=0.3542, attn_decoder_loss=0.2402, over 5786903.65 frames. ], batch size: 79, lr: 2.62e-03, grad_scale: 8.0
2024-09-19 22:01:12,426 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.63 vs. limit=22.5
2024-09-19 22:01:12,575 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.60 vs. limit=15.0
2024-09-19 22:01:24,740 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.14 vs. limit=15.0
2024-09-19 22:01:40,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=753820.0, ans=0.125
2024-09-19 22:01:52,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=753860.0, ans=0.025
2024-09-19 22:02:00,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=753860.0, ans=0.2
2024-09-19 22:02:02,878 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.702e+01 8.715e+01 9.227e+01 9.833e+01 2.599e+02, threshold=1.845e+02, percent-clipped=1.0
2024-09-19 22:02:07,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=753900.0, ans=0.125
2024-09-19 22:02:08,884 INFO [train.py:1198] (1/2) Epoch 42, batch 2950, loss[loss=0.2297, ctc_loss=0.1161, cr_loss=0.3521, attn_decoder_loss=0.2345, over 29503.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1116, cr_loss=0.3517, attn_decoder_loss=0.2389, over 5781682.57 frames. ], batch size: 75, lr: 2.62e-03, grad_scale: 8.0
2024-09-19 22:02:31,014 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.32 vs. limit=12.0
2024-09-19 22:02:31,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=753940.0, ans=0.2
2024-09-19 22:02:43,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=753980.0, ans=0.0
2024-09-19 22:02:44,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=753980.0, ans=6.0
2024-09-19 22:02:57,670 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=4.83 vs. limit=15.0
2024-09-19 22:03:10,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.28 vs. limit=12.0
2024-09-19 22:03:11,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=754060.0, ans=0.0
2024-09-19 22:03:26,596 INFO [train.py:1198] (1/2) Epoch 42, batch 3000, loss[loss=0.2233, ctc_loss=0.1062, cr_loss=0.3445, attn_decoder_loss=0.2286, over 29742.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.111, cr_loss=0.3507, attn_decoder_loss=0.2384, over 5782820.15 frames. ], batch size: 81, lr: 2.62e-03, grad_scale: 8.0
2024-09-19 22:03:26,597 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-19 22:03:44,994 INFO [train.py:1230] (1/2) Epoch 42, validation: loss=0.212, ctc_loss=0.03659, cr_loss=6.044e-15, attn_decoder_loss=0.2315, over 944034.00 frames.
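The printed losses are consistent with a fixed weighted sum of the three criteria. Taking assumed weights of 0.1 (CTC), 0.02 (CR) and 0.9 (attention decoder), the batch-2050 totals above reproduce exactly, and the validation line also fits once one notes that cr_loss there is ~6e-15, i.e. effectively zero (the consistency-regularization term compares two augmented training views, so it has nothing to compare in an unaugmented validation pass). A worked check:

```python
# Weighted-sum check against the numbers printed in this log segment.
# The 0.1 / 0.02 / 0.9 weights are assumed; they reproduce the printed totals.
ctc, cr, aed = 0.1117, 0.3516, 0.2387        # batch 2050 tot_loss components
print(round(0.1 * ctc + 0.02 * cr + 0.9 * aed, 3))   # 0.233, as printed

ctc_v, cr_v, aed_v = 0.03659, 6.044e-15, 0.2315      # validation components
print(round(0.1 * ctc_v + 0.02 * cr_v + 0.9 * aed_v, 3))  # 0.212, as printed
```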
2024-09-19 22:03:44,994 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB
2024-09-19 22:04:06,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=754140.0, ans=0.125
2024-09-19 22:04:08,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=754140.0, ans=0.0
2024-09-19 22:04:32,431 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 22:04:35,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=754220.0, ans=0.1
2024-09-19 22:04:37,343 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.37 vs. limit=6.0
2024-09-19 22:04:54,807 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.10 vs. limit=22.5
2024-09-19 22:04:56,824 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.446e+01 8.635e+01 9.210e+01 9.879e+01 1.269e+02, threshold=1.842e+02, percent-clipped=0.0
2024-09-19 22:05:03,031 INFO [train.py:1198] (1/2) Epoch 42, batch 3050, loss[loss=0.2269, ctc_loss=0.1157, cr_loss=0.3756, attn_decoder_loss=0.2309, over 29523.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.112, cr_loss=0.3527, attn_decoder_loss=0.2393, over 5777157.85 frames. ], batch size: 76, lr: 2.62e-03, grad_scale: 8.0
2024-09-19 22:05:13,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=754300.0, ans=0.1
2024-09-19 22:05:28,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=754340.0, ans=0.0
2024-09-19 22:05:35,566 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.70 vs. limit=6.0
2024-09-19 22:05:36,982 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.30 vs. limit=15.0
2024-09-19 22:05:39,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=754380.0, ans=0.1
2024-09-19 22:05:41,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=754380.0, ans=0.0
2024-09-19 22:05:43,017 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 22:05:48,129 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.88 vs. limit=12.0
2024-09-19 22:06:08,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=754460.0, ans=0.0
2024-09-19 22:06:17,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=754500.0, ans=0.125
2024-09-19 22:06:18,285 INFO [train.py:1198] (1/2) Epoch 42, batch 3100, loss[loss=0.2447, ctc_loss=0.1205, cr_loss=0.3463, attn_decoder_loss=0.2507, over 29250.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1115, cr_loss=0.3511, attn_decoder_loss=0.2388, over 5776606.62 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 8.0
2024-09-19 22:06:38,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=754540.0, ans=0.125
2024-09-19 22:06:47,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=754580.0, ans=0.125
2024-09-19 22:06:56,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=754580.0, ans=0.5
2024-09-19 22:07:17,471 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.13 vs. limit=15.0
2024-09-19 22:07:25,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=754660.0, ans=0.0
2024-09-19 22:07:30,010 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.298e+01 8.575e+01 9.075e+01 9.708e+01 6.330e+02, threshold=1.815e+02, percent-clipped=2.0
2024-09-19 22:07:34,174 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=22.5
2024-09-19 22:07:36,113 INFO [train.py:1198] (1/2) Epoch 42, batch 3150, loss[loss=0.2451, ctc_loss=0.1251, cr_loss=0.3831, attn_decoder_loss=0.2499, over 28872.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1115, cr_loss=0.3515, attn_decoder_loss=0.2389, over 5782731.97 frames. ], batch size: 104, lr: 2.62e-03, grad_scale: 8.0
2024-09-19 22:07:36,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=754700.0, ans=0.1
2024-09-19 22:07:49,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=754740.0, ans=0.0
2024-09-19 22:08:01,204 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.01 vs. limit=15.0
2024-09-19 22:08:05,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=754780.0, ans=0.125
2024-09-19 22:08:12,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=754780.0, ans=0.125
2024-09-19 22:08:17,743 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.85 vs. limit=12.0
2024-09-19 22:08:53,272 INFO [train.py:1198] (1/2) Epoch 42, batch 3200, loss[loss=0.2429, ctc_loss=0.1253, cr_loss=0.3862, attn_decoder_loss=0.2473, over 29409.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1111, cr_loss=0.3504, attn_decoder_loss=0.2382, over 5792088.41 frames. ], batch size: 79, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 22:08:53,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=754900.0, ans=0.0
2024-09-19 22:09:21,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=754940.0, ans=0.125
2024-09-19 22:09:40,031 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.11 vs. limit=15.0
2024-09-19 22:09:44,874 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=15.0
2024-09-19 22:10:03,485 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.581e+01 9.115e+01 9.616e+01 1.393e+02, threshold=1.823e+02, percent-clipped=0.0
2024-09-19 22:10:09,482 INFO [train.py:1198] (1/2) Epoch 42, batch 3250, loss[loss=0.2355, ctc_loss=0.1124, cr_loss=0.3572, attn_decoder_loss=0.2412, over 29709.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1115, cr_loss=0.3514, attn_decoder_loss=0.239, over 5797780.25 frames. ], batch size: 84, lr: 2.61e-03, grad_scale: 16.0
2024-09-19 22:10:09,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=755100.0, ans=0.125
2024-09-19 22:10:34,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=755140.0, ans=0.0
2024-09-19 22:10:39,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=755180.0, ans=0.5
2024-09-19 22:10:58,610 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 22:11:01,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=755220.0, ans=0.125
2024-09-19 22:11:22,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=755260.0, ans=0.1
2024-09-19 22:11:26,689 INFO [train.py:1198] (1/2) Epoch 42, batch 3300, loss[loss=0.2385, ctc_loss=0.108, cr_loss=0.34, attn_decoder_loss=0.2455, over 28205.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1106, cr_loss=0.3493, attn_decoder_loss=0.2376, over 5796266.26 frames. ], batch size: 111, lr: 2.61e-03, grad_scale: 8.0
2024-09-19 22:11:27,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=755300.0, ans=0.125
2024-09-19 22:11:54,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=755340.0, ans=0.125
2024-09-19 22:12:15,825 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.93 vs. limit=10.0
2024-09-19 22:12:29,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=755460.0, ans=0.1
2024-09-19 22:12:36,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=755460.0, ans=0.1
2024-09-19 22:12:39,475 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.752e+01 8.624e+01 9.226e+01 9.886e+01 3.496e+02, threshold=1.845e+02, percent-clipped=4.0
2024-09-19 22:12:44,130 INFO [train.py:1198] (1/2) Epoch 42, batch 3350, loss[loss=0.243, ctc_loss=0.1149, cr_loss=0.3617, attn_decoder_loss=0.2492, over 28905.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1112, cr_loss=0.3504, attn_decoder_loss=0.2385, over 5773200.21 frames. ], batch size: 104, lr: 2.61e-03, grad_scale: 8.0
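The recurring balancer entries (balancer1.prob, balancer2.min_positive, max_abs, min_abs and so on) describe activation balancers: modules that, with probability `prob`, nudge per-channel activation statistics back inside configured bounds. The sketch below conveys the idea as an explicit penalty; it is illustrative only, since the real Balancer in scaling.py acts through a custom autograd function on the backward pass rather than an added loss term.

```python
import torch

def balancer_penalty(x: torch.Tensor, min_positive=0.05, max_positive=0.95,
                     max_abs=10.0) -> torch.Tensor:
    """Penalize channels whose statistics leave the configured bounds.
    x: (num_frames, num_channels). Bounds mirror fields seen in the log,
    e.g. min_positive=0.05, max_abs=10.0."""
    frac_pos = (x > 0).float().mean(dim=0)   # fraction of positive values per channel
    mean_abs = x.abs().mean(dim=0)           # mean |activation| per channel
    return ((min_positive - frac_pos).clamp(min=0).sum()
            + (frac_pos - max_positive).clamp(min=0).sum()
            + (mean_abs - max_abs).clamp(min=0).sum())

x = torch.randn(100, 256, requires_grad=True)
if torch.rand(()) < 0.125:        # applied stochastically, cf. "prob, ans=0.125"
    balancer_penalty(x).backward()
```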
2024-09-19 22:12:49,522 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.58 vs. limit=15.0
2024-09-19 22:13:13,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=755580.0, ans=0.0
2024-09-19 22:13:32,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=755620.0, ans=0.125
2024-09-19 22:13:44,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=755660.0, ans=0.0
2024-09-19 22:13:46,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=755660.0, ans=0.05
2024-09-19 22:13:59,650 INFO [train.py:1198] (1/2) Epoch 42, batch 3400, loss[loss=0.1948, ctc_loss=0.08564, cr_loss=0.3038, attn_decoder_loss=0.2001, over 29339.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1112, cr_loss=0.3506, attn_decoder_loss=0.2384, over 5767081.18 frames. ], batch size: 67, lr: 2.61e-03, grad_scale: 8.0
2024-09-19 22:14:18,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=755740.0, ans=0.125
2024-09-19 22:14:18,154 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 22:14:18,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=755740.0, ans=0.125
2024-09-19 22:14:23,463 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.59 vs. limit=15.0
2024-09-19 22:14:34,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=755780.0, ans=0.125
2024-09-19 22:15:12,723 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 8.618e+01 8.954e+01 9.599e+01 1.831e+02, threshold=1.791e+02, percent-clipped=0.0
2024-09-19 22:15:17,196 INFO [train.py:1198] (1/2) Epoch 42, batch 3450, loss[loss=0.252, ctc_loss=0.1273, cr_loss=0.3955, attn_decoder_loss=0.257, over 28324.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1112, cr_loss=0.3508, attn_decoder_loss=0.2385, over 5776014.67 frames. ], batch size: 111, lr: 2.61e-03, grad_scale: 8.0
2024-09-19 22:15:44,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=755940.0, ans=0.025
2024-09-19 22:15:45,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=755980.0, ans=0.2
2024-09-19 22:15:51,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=755980.0, ans=0.125
2024-09-19 22:16:00,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=755980.0, ans=0.125
2024-09-19 22:16:09,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=756020.0, ans=0.07
2024-09-19 22:16:28,310 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.29 vs. limit=10.0
2024-09-19 22:16:34,970 INFO [train.py:1198] (1/2) Epoch 42, batch 3500, loss[loss=0.2092, ctc_loss=0.09518, cr_loss=0.3179, attn_decoder_loss=0.2149, over 29339.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1109, cr_loss=0.3498, attn_decoder_loss=0.2378, over 5777161.48 frames. ], batch size: 71, lr: 2.61e-03, grad_scale: 8.0
2024-09-19 22:16:35,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=756100.0, ans=0.2
2024-09-19 22:16:36,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=756100.0, ans=0.1
2024-09-19 22:16:38,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=756100.0, ans=0.0
2024-09-19 22:16:44,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=756100.0, ans=0.125
2024-09-19 22:16:53,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=756140.0, ans=0.2
2024-09-19 22:17:21,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=756220.0, ans=0.025
2024-09-19 22:17:39,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=756260.0, ans=0.07
2024-09-19 22:17:44,786 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.395e+01 8.622e+01 9.000e+01 9.662e+01 3.411e+02, threshold=1.800e+02, percent-clipped=1.0
2024-09-19 22:17:49,259 INFO [train.py:1198] (1/2) Epoch 42, batch 3550, loss[loss=0.2407, ctc_loss=0.1126, cr_loss=0.3455, attn_decoder_loss=0.2472, over 29693.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.111, cr_loss=0.3502, attn_decoder_loss=0.2381, over 5783445.41 frames. ], batch size: 89, lr: 2.61e-03, grad_scale: 8.0
2024-09-19 22:18:29,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=756380.0, ans=0.07
2024-09-19 22:18:32,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=756420.0, ans=0.125
2024-09-19 22:18:54,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=756460.0, ans=0.1
2024-09-19 22:19:00,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=756460.0, ans=0.95
2024-09-19 22:19:01,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=756500.0, ans=0.125
2024-09-19 22:19:02,913 INFO [train.py:1198] (1/2) Epoch 42, batch 3600, loss[loss=0.2342, ctc_loss=0.1142, cr_loss=0.3729, attn_decoder_loss=0.2392, over 29506.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.111, cr_loss=0.3504, attn_decoder_loss=0.238, over 5792490.81 frames. ], batch size: 77, lr: 2.61e-03, grad_scale: 16.0
2024-09-19 22:19:10,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=756500.0, ans=0.2
2024-09-19 22:19:16,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=756540.0, ans=0.125
2024-09-19 22:19:24,909 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.99 vs. limit=10.0
2024-09-19 22:20:07,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=756660.0, ans=0.125
2024-09-19 22:20:14,584 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.291e+01 8.526e+01 8.930e+01 9.587e+01 1.613e+02, threshold=1.786e+02, percent-clipped=0.0
2024-09-19 22:20:19,034 INFO [train.py:1198] (1/2) Epoch 42, batch 3650, loss[loss=0.2557, ctc_loss=0.1338, cr_loss=0.4009, attn_decoder_loss=0.2604, over 29474.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1108, cr_loss=0.3506, attn_decoder_loss=0.2376, over 5794608.58 frames. ], batch size: 90, lr: 2.61e-03, grad_scale: 16.0
2024-09-19 22:20:49,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=756780.0, ans=0.125
2024-09-19 22:20:49,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=756780.0, ans=0.125
2024-09-19 22:20:57,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=756780.0, ans=0.1
2024-09-19 22:21:34,293 INFO [train.py:1198] (1/2) Epoch 42, batch 3700, loss[loss=0.2485, ctc_loss=0.1207, cr_loss=0.3712, attn_decoder_loss=0.2545, over 29712.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1108, cr_loss=0.3505, attn_decoder_loss=0.2379, over 5803300.20 frames. ], batch size: 84, lr: 2.61e-03, grad_scale: 16.0
2024-09-19 22:21:46,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=756900.0, ans=0.2
2024-09-19 22:22:04,265 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.30 vs. limit=15.0
2024-09-19 22:22:19,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=757020.0, ans=0.2
2024-09-19 22:22:25,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=757020.0, ans=0.2
2024-09-19 22:22:44,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=757060.0, ans=0.0
2024-09-19 22:22:45,731 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.08 vs. limit=10.0
2024-09-19 22:22:46,050 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.519e+01 8.506e+01 9.125e+01 9.829e+01 2.175e+02, threshold=1.825e+02, percent-clipped=1.0
2024-09-19 22:22:50,498 INFO [train.py:1198] (1/2) Epoch 42, batch 3750, loss[loss=0.2084, ctc_loss=0.08639, cr_loss=0.3078, attn_decoder_loss=0.2152, over 29329.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1103, cr_loss=0.3492, attn_decoder_loss=0.2375, over 5807490.48 frames. ], batch size: 67, lr: 2.61e-03, grad_scale: 16.0
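The grad_scale value printed with each tot_loss line (8.0, 16.0, 32.0 across this segment) is the dynamic loss-scaling factor used with float16 mixed precision: the scaler grows the scale after a run of overflow-free steps and halves it when non-finite gradients are detected, which is why the value drifts up and down between batches. A standard torch.cuda.amp usage sketch follows; `loader`, `model` and `optimizer` are assumed placeholders.

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0)  # init value chosen to match the log
for batch in loader:                                  # loader/model/optimizer: assumed
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                   # fp16 forward pass
        loss = model(batch)
    scaler.scale(loss).backward()                     # backward on the scaled loss
    scaler.step(optimizer)                            # skipped internally on overflow
    scaler.update()                                   # grows or halves the scale
    print(scaler.get_scale())                         # cf. "grad_scale: 16.0"
```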
2024-09-19 22:22:52,855 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=5.14 vs. limit=15.0
2024-09-19 22:22:56,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=757100.0, ans=0.125
2024-09-19 22:23:02,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=757100.0, ans=0.2
2024-09-19 22:23:15,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=757140.0, ans=0.125
2024-09-19 22:23:24,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=757180.0, ans=0.07
2024-09-19 22:23:33,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=757220.0, ans=0.0
2024-09-19 22:23:49,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=757260.0, ans=0.125
2024-09-19 22:24:04,584 INFO [train.py:1198] (1/2) Epoch 42, batch 3800, loss[loss=0.2343, ctc_loss=0.106, cr_loss=0.3517, attn_decoder_loss=0.2407, over 29635.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.11, cr_loss=0.3481, attn_decoder_loss=0.2369, over 5796864.96 frames. ], batch size: 86, lr: 2.61e-03, grad_scale: 16.0
2024-09-19 22:24:07,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=757300.0, ans=0.1
2024-09-19 22:24:37,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=757380.0, ans=0.125
2024-09-19 22:24:50,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=757420.0, ans=0.125
2024-09-19 22:24:52,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=757420.0, ans=0.125
2024-09-19 22:25:03,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=757460.0, ans=0.125
2024-09-19 22:25:12,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=757460.0, ans=0.0
2024-09-19 22:25:13,884 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.649e+01 9.029e+01 9.772e+01 5.131e+02, threshold=1.806e+02, percent-clipped=1.0
2024-09-19 22:25:18,244 INFO [train.py:1198] (1/2) Epoch 42, batch 3850, loss[loss=0.2349, ctc_loss=0.1088, cr_loss=0.3498, attn_decoder_loss=0.2412, over 29298.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.11, cr_loss=0.3481, attn_decoder_loss=0.2368, over 5811276.18 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 16.0
2024-09-19 22:25:18,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=757500.0, ans=0.125
2024-09-19 22:25:50,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.89 vs. limit=6.0
2024-09-19 22:26:04,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=757620.0, ans=0.1
2024-09-19 22:26:09,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=757620.0, ans=10.0
2024-09-19 22:26:25,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=757660.0, ans=0.2
2024-09-19 22:26:26,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=757660.0, ans=0.0
2024-09-19 22:26:30,172 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.24 vs. limit=15.0
2024-09-19 22:26:33,899 INFO [train.py:1198] (1/2) Epoch 42, batch 3900, loss[loss=0.2406, ctc_loss=0.1217, cr_loss=0.3841, attn_decoder_loss=0.2453, over 29619.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1104, cr_loss=0.3491, attn_decoder_loss=0.2374, over 5814872.43 frames. ], batch size: 86, lr: 2.61e-03, grad_scale: 16.0
2024-09-19 22:26:35,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=757700.0, ans=0.025
2024-09-19 22:26:47,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=757740.0, ans=0.125
2024-09-19 22:26:50,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=757740.0, ans=0.015
2024-09-19 22:27:09,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=757780.0, ans=0.0
2024-09-19 22:27:44,476 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.680e+01 8.643e+01 9.033e+01 9.490e+01 1.279e+02, threshold=1.807e+02, percent-clipped=0.0
2024-09-19 22:27:49,120 INFO [train.py:1198] (1/2) Epoch 42, batch 3950, loss[loss=0.2472, ctc_loss=0.1259, cr_loss=0.3939, attn_decoder_loss=0.2519, over 29516.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1101, cr_loss=0.3487, attn_decoder_loss=0.2376, over 5834470.90 frames. ], batch size: 97, lr: 2.61e-03, grad_scale: 16.0
2024-09-19 22:27:49,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=757900.0, ans=0.125
2024-09-19 22:28:22,417 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.57 vs. limit=10.0
2024-09-19 22:28:35,112 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.81 vs. limit=15.0
2024-09-19 22:28:35,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.87 vs. limit=10.0
2024-09-19 22:28:45,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=758020.0, ans=0.0
2024-09-19 22:28:55,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=758060.0, ans=0.125
2024-09-19 22:29:02,375 INFO [train.py:1198] (1/2) Epoch 42, batch 4000, loss[loss=0.2162, ctc_loss=0.09555, cr_loss=0.3211, attn_decoder_loss=0.2225, over 29484.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1103, cr_loss=0.3486, attn_decoder_loss=0.2375, over 5811291.36 frames. ], batch size: 74, lr: 2.61e-03, grad_scale: 32.0
2024-09-19 22:29:11,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=758100.0, ans=0.0
2024-09-19 22:29:11,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=758100.0, ans=0.125
2024-09-19 22:29:18,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=758140.0, ans=0.125
2024-09-19 22:29:18,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=758140.0, ans=0.0
2024-09-19 22:29:55,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=758220.0, ans=0.2
2024-09-19 22:30:03,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=758260.0, ans=0.125
2024-09-19 22:30:13,115 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.365e+01 8.535e+01 8.990e+01 9.708e+01 1.890e+02, threshold=1.798e+02, percent-clipped=2.0
2024-09-19 22:30:16,057 INFO [train.py:1198] (1/2) Epoch 42, batch 4050, loss[loss=0.2552, ctc_loss=0.1376, cr_loss=0.3752, attn_decoder_loss=0.26, over 20786.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1106, cr_loss=0.3489, attn_decoder_loss=0.2376, over 5796178.47 frames. ], batch size: 210, lr: 2.61e-03, grad_scale: 16.0
2024-09-19 22:30:35,723 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.48 vs. limit=15.0
2024-09-19 22:31:00,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=758420.0, ans=0.0
2024-09-19 22:31:12,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=758420.0, ans=0.125
2024-09-19 22:31:28,680 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.62 vs. limit=15.0
2024-09-19 22:31:31,091 INFO [train.py:1198] (1/2) Epoch 42, batch 4100, loss[loss=0.2491, ctc_loss=0.1262, cr_loss=0.4039, attn_decoder_loss=0.2538, over 29503.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1111, cr_loss=0.3502, attn_decoder_loss=0.238, over 5792124.13 frames. ], batch size: 90, lr: 2.61e-03, grad_scale: 16.0
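Entries such as conv_skip_rate, ff2_skip_rate, attention_skip_rate and encoder_embed.convnext.layerdrop_rate (mostly ans=0.0, or 0.015 for the layerdrop, by this point in training) are scheduled probabilities of stochastically bypassing a sub-module during training, annealed toward zero as training matures. A minimal sketch of the idea; the class is illustrative, not the scaling.py implementation.

```python
import torch
import torch.nn as nn

class SkippableResidual(nn.Module):
    """With probability skip_rate, drop the sub-module and keep only the
    residual path; at skip_rate=0.0 (as in the log by now) it never skips."""
    def __init__(self, module: nn.Module, skip_rate: float):
        super().__init__()
        self.module = module
        self.skip_rate = skip_rate  # in the recipe this would be a scheduled value

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training and torch.rand(()) < self.skip_rate:
            return x                # bypass: identity / residual only
        return x + self.module(x)

layer = SkippableResidual(nn.Linear(256, 256), skip_rate=0.0)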
2024-09-19 22:32:10,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=758580.0, ans=0.0
2024-09-19 22:32:30,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=758660.0, ans=0.0
2024-09-19 22:32:37,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=758660.0, ans=0.125
2024-09-19 22:32:42,742 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.397e+01 8.799e+01 9.217e+01 9.992e+01 1.793e+02, threshold=1.843e+02, percent-clipped=0.0
2024-09-19 22:32:45,670 INFO [train.py:1198] (1/2) Epoch 42, batch 4150, loss[loss=0.2265, ctc_loss=0.1123, cr_loss=0.3429, attn_decoder_loss=0.2316, over 29514.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.111, cr_loss=0.3501, attn_decoder_loss=0.2379, over 5798655.96 frames. ], batch size: 77, lr: 2.61e-03, grad_scale: 16.0
2024-09-19 22:32:57,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=758700.0, ans=0.2
2024-09-19 22:33:10,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=758740.0, ans=0.125
2024-09-19 22:33:21,051 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 22:33:22,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=758780.0, ans=0.125
2024-09-19 22:33:31,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=758820.0, ans=0.0
2024-09-19 22:33:33,556 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.83 vs. limit=6.0
2024-09-19 22:33:44,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=758860.0, ans=0.125
2024-09-19 22:33:58,999 INFO [train.py:1198] (1/2) Epoch 42, batch 4200, loss[loss=0.2561, ctc_loss=0.1317, cr_loss=0.3952, attn_decoder_loss=0.2612, over 29482.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1111, cr_loss=0.3504, attn_decoder_loss=0.2383, over 5801101.21 frames. ], batch size: 90, lr: 2.61e-03, grad_scale: 16.0
2024-09-19 22:34:03,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=758900.0, ans=0.2
2024-09-19 22:34:05,692 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.88 vs. limit=15.0
2024-09-19 22:34:41,934 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 22:35:10,285 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 8.752e+01 9.321e+01 9.976e+01 3.736e+02, threshold=1.864e+02, percent-clipped=1.0
2024-09-19 22:35:13,188 INFO [train.py:1198] (1/2) Epoch 42, batch 4250, loss[loss=0.2169, ctc_loss=0.1013, cr_loss=0.3247, attn_decoder_loss=0.2225, over 29493.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1104, cr_loss=0.349, attn_decoder_loss=0.2382, over 5806864.92 frames. ], batch size: 74, lr: 2.61e-03, grad_scale: 16.0
2024-09-19 22:35:38,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=759140.0, ans=0.2
2024-09-19 22:35:58,834 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.22 vs. limit=15.0
2024-09-19 22:36:02,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=759220.0, ans=0.125
2024-09-19 22:36:14,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=759260.0, ans=0.5
2024-09-19 22:36:16,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=759260.0, ans=0.125
2024-09-19 22:36:27,661 INFO [train.py:1198] (1/2) Epoch 42, batch 4300, loss[loss=0.2432, ctc_loss=0.1174, cr_loss=0.3589, attn_decoder_loss=0.2492, over 29533.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1107, cr_loss=0.3494, attn_decoder_loss=0.2384, over 5796201.35 frames. ], batch size: 87, lr: 2.61e-03, grad_scale: 16.0
2024-09-19 22:36:33,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=759300.0, ans=0.125
2024-09-19 22:36:33,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=759300.0, ans=0.09899494936611666
2024-09-19 22:36:54,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=759340.0, ans=0.125
2024-09-19 22:37:02,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=759380.0, ans=0.5
2024-09-19 22:37:09,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=759380.0, ans=0.125
2024-09-19 22:37:14,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=759420.0, ans=0.2
2024-09-19 22:37:24,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=759420.0, ans=0.0
2024-09-19 22:37:31,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=759460.0, ans=0.2
2024-09-19 22:37:38,936 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.572e+01 8.652e+01 9.323e+01 9.871e+01 1.907e+02, threshold=1.865e+02, percent-clipped=1.0
2024-09-19 22:37:41,932 INFO [train.py:1198] (1/2) Epoch 42, batch 4350, loss[loss=0.256, ctc_loss=0.1322, cr_loss=0.3989, attn_decoder_loss=0.2609, over 29460.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1138, cr_loss=0.3559, attn_decoder_loss=0.242, over 5797522.14 frames. ], batch size: 97, lr: 2.61e-03, grad_scale: 16.0
2024-09-19 22:38:09,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=759540.0, ans=0.125
2024-09-19 22:38:14,377 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.17 vs. limit=15.0
2024-09-19 22:38:20,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=759580.0, ans=0.0
2024-09-19 22:38:51,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.39 vs. limit=15.0
2024-09-19 22:38:52,416 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.65 vs. limit=12.0
2024-09-19 22:38:56,125 INFO [train.py:1198] (1/2) Epoch 42, batch 4400, loss[loss=0.2494, ctc_loss=0.1263, cr_loss=0.3766, attn_decoder_loss=0.2547, over 27545.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1146, cr_loss=0.3576, attn_decoder_loss=0.2437, over 5768215.03 frames. ], batch size: 124, lr: 2.61e-03, grad_scale: 32.0
2024-09-19 22:39:21,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=759740.0, ans=0.125
2024-09-19 22:39:24,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=759780.0, ans=0.1
2024-09-19 22:39:32,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=759780.0, ans=0.0
2024-09-19 22:39:42,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=759820.0, ans=0.1
2024-09-19 22:39:45,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=759820.0, ans=0.0
2024-09-19 22:39:57,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=759860.0, ans=0.0
2024-09-19 22:40:07,760 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.246e+01 9.179e+01 9.547e+01 1.014e+02 2.970e+02, threshold=1.909e+02, percent-clipped=2.0
2024-09-19 22:40:08,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=759900.0, ans=0.0
2024-09-19 22:40:09,217 INFO [train.py:1198] (1/2) Epoch 42, batch 4450, loss[loss=0.2553, ctc_loss=0.142, cr_loss=0.3625, attn_decoder_loss=0.2598, over 19998.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1181, cr_loss=0.3629, attn_decoder_loss=0.2458, over 5576898.15 frames.
], batch size: 209, lr: 2.61e-03, grad_scale: 16.0 2024-09-19 22:40:13,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=759900.0, ans=0.125 2024-09-19 22:40:23,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=759940.0, ans=0.125 2024-09-19 22:40:31,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=759940.0, ans=0.0 2024-09-19 22:40:33,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=759940.0, ans=0.2 2024-09-19 22:40:46,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=759980.0, ans=0.125 2024-09-19 22:41:03,747 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 22:41:17,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=760060.0, ans=0.125 2024-09-19 22:41:25,586 INFO [train.py:1198] (1/2) Epoch 42, batch 4500, loss[loss=0.2453, ctc_loss=0.1287, cr_loss=0.3846, attn_decoder_loss=0.2497, over 20148.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1205, cr_loss=0.3646, attn_decoder_loss=0.2473, over 5235255.73 frames. ], batch size: 211, lr: 2.61e-03, grad_scale: 8.0 2024-09-19 22:41:33,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=760100.0, ans=0.125 2024-09-19 22:42:41,118 INFO [train.py:1198] (1/2) Epoch 43, batch 0, loss[loss=0.2141, ctc_loss=0.09162, cr_loss=0.3107, attn_decoder_loss=0.2208, over 29613.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.09162, cr_loss=0.3107, attn_decoder_loss=0.2208, over 29613.00 frames. ], batch size: 73, lr: 2.58e-03, grad_scale: 16.0 2024-09-19 22:42:41,119 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 22:42:45,046 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.0199, 4.9225, 4.4091, 4.6909], device='cuda:1') 2024-09-19 22:43:00,148 INFO [train.py:1230] (1/2) Epoch 43, validation: loss=0.2125, ctc_loss=0.03634, cr_loss=6.648e-15, attn_decoder_loss=0.2321, over 944034.00 frames. 2024-09-19 22:43:00,148 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-19 22:43:07,307 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2024-09-19 22:43:22,467 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.12 vs. limit=15.0 2024-09-19 22:43:31,764 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.99 vs. 
limit=15.0 2024-09-19 22:43:35,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=760280.0, ans=0.125 2024-09-19 22:43:38,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=760280.0, ans=15.0 2024-09-19 22:43:39,929 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.895e+01 1.042e+02 1.140e+02 1.225e+02 1.755e+02, threshold=2.281e+02, percent-clipped=0.0 2024-09-19 22:44:11,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=760360.0, ans=0.025 2024-09-19 22:44:14,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=760360.0, ans=0.125 2024-09-19 22:44:17,480 INFO [train.py:1198] (1/2) Epoch 43, batch 50, loss[loss=0.2071, ctc_loss=0.08851, cr_loss=0.2892, attn_decoder_loss=0.2139, over 29407.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.113, cr_loss=0.356, attn_decoder_loss=0.2403, over 1269659.90 frames. ], batch size: 70, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 22:44:17,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=760400.0, ans=0.0 2024-09-19 22:44:32,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=760440.0, ans=0.125 2024-09-19 22:44:55,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=760480.0, ans=0.2 2024-09-19 22:45:18,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=760560.0, ans=0.025 2024-09-19 22:45:21,998 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.80 vs. limit=6.0 2024-09-19 22:45:29,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=760560.0, ans=0.5 2024-09-19 22:45:33,223 INFO [train.py:1198] (1/2) Epoch 43, batch 100, loss[loss=0.2371, ctc_loss=0.1236, cr_loss=0.3826, attn_decoder_loss=0.2412, over 29511.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1133, cr_loss=0.355, attn_decoder_loss=0.2411, over 2252681.19 frames. 
], batch size: 76, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 22:45:34,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=760600.0, ans=0.125 2024-09-19 22:46:10,562 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.246e+01 8.774e+01 9.184e+01 9.707e+01 2.214e+02, threshold=1.837e+02, percent-clipped=0.0 2024-09-19 22:46:10,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=760680.0, ans=0.0 2024-09-19 22:46:18,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=760720.0, ans=0.125 2024-09-19 22:46:19,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=760720.0, ans=0.0 2024-09-19 22:46:53,106 INFO [train.py:1198] (1/2) Epoch 43, batch 150, loss[loss=0.2178, ctc_loss=0.1051, cr_loss=0.326, attn_decoder_loss=0.223, over 29412.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1113, cr_loss=0.3509, attn_decoder_loss=0.239, over 3047069.27 frames. ], batch size: 70, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 22:47:07,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=760840.0, ans=0.2 2024-09-19 22:47:42,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=760920.0, ans=0.125 2024-09-19 22:48:07,804 INFO [train.py:1198] (1/2) Epoch 43, batch 200, loss[loss=0.2475, ctc_loss=0.1275, cr_loss=0.3736, attn_decoder_loss=0.2525, over 27439.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1109, cr_loss=0.3506, attn_decoder_loss=0.2378, over 3658866.59 frames. ], batch size: 124, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 22:48:16,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.61 vs. limit=15.0 2024-09-19 22:48:17,744 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.90 vs. limit=15.0 2024-09-19 22:48:18,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=761000.0, ans=0.2 2024-09-19 22:48:38,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=761080.0, ans=0.2 2024-09-19 22:48:45,406 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.183e+01 8.451e+01 8.919e+01 9.338e+01 1.606e+02, threshold=1.784e+02, percent-clipped=0.0 2024-09-19 22:49:23,042 INFO [train.py:1198] (1/2) Epoch 43, batch 250, loss[loss=0.2532, ctc_loss=0.1242, cr_loss=0.3698, attn_decoder_loss=0.2594, over 29228.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1106, cr_loss=0.3504, attn_decoder_loss=0.2377, over 4142916.46 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 22:49:43,806 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.89 vs. 
limit=12.0 2024-09-19 22:50:05,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=761280.0, ans=0.1 2024-09-19 22:50:06,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.59 vs. limit=22.5 2024-09-19 22:50:26,351 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.47 vs. limit=15.0 2024-09-19 22:50:40,731 INFO [train.py:1198] (1/2) Epoch 43, batch 300, loss[loss=0.2438, ctc_loss=0.1216, cr_loss=0.3813, attn_decoder_loss=0.2489, over 29491.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1102, cr_loss=0.3493, attn_decoder_loss=0.2374, over 4510841.92 frames. ], batch size: 92, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 22:50:41,078 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 22:51:20,734 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.509e+01 8.675e+01 9.148e+01 9.609e+01 2.085e+02, threshold=1.830e+02, percent-clipped=1.0 2024-09-19 22:51:28,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=761520.0, ans=0.09899494936611666 2024-09-19 22:51:33,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=761520.0, ans=0.1 2024-09-19 22:51:47,550 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.45 vs. limit=15.0 2024-09-19 22:51:51,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=761560.0, ans=0.025 2024-09-19 22:51:59,191 INFO [train.py:1198] (1/2) Epoch 43, batch 350, loss[loss=0.2061, ctc_loss=0.09101, cr_loss=0.3062, attn_decoder_loss=0.212, over 29295.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1109, cr_loss=0.3507, attn_decoder_loss=0.2382, over 4795956.86 frames. ], batch size: 71, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 22:52:03,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.63 vs. limit=15.0 2024-09-19 22:52:14,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=761640.0, ans=0.125 2024-09-19 22:52:39,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=761680.0, ans=0.0 2024-09-19 22:52:44,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=761720.0, ans=0.0 2024-09-19 22:52:48,016 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.39 vs. limit=22.5 2024-09-19 22:52:54,413 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.78 vs. 
limit=15.0 2024-09-19 22:52:58,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=761760.0, ans=0.125 2024-09-19 22:53:14,428 INFO [train.py:1198] (1/2) Epoch 43, batch 400, loss[loss=0.2385, ctc_loss=0.113, cr_loss=0.3845, attn_decoder_loss=0.2439, over 29690.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1106, cr_loss=0.3502, attn_decoder_loss=0.2377, over 5026444.94 frames. ], batch size: 82, lr: 2.57e-03, grad_scale: 32.0 2024-09-19 22:53:16,945 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.12 vs. limit=15.0 2024-09-19 22:53:19,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=761800.0, ans=0.0 2024-09-19 22:53:45,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=761880.0, ans=0.0 2024-09-19 22:53:47,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=761880.0, ans=0.07 2024-09-19 22:53:53,726 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.485e+01 8.678e+01 9.168e+01 9.670e+01 1.497e+02, threshold=1.834e+02, percent-clipped=0.0 2024-09-19 22:54:00,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=761920.0, ans=0.2 2024-09-19 22:54:04,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=761920.0, ans=0.2 2024-09-19 22:54:06,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=761920.0, ans=0.125 2024-09-19 22:54:11,213 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2024-09-19 22:54:15,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=761960.0, ans=0.1 2024-09-19 22:54:26,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=761960.0, ans=0.0 2024-09-19 22:54:32,416 INFO [train.py:1198] (1/2) Epoch 43, batch 450, loss[loss=0.2364, ctc_loss=0.1096, cr_loss=0.3396, attn_decoder_loss=0.2429, over 29703.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1108, cr_loss=0.3503, attn_decoder_loss=0.238, over 5187165.90 frames. ], batch size: 83, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 22:54:32,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=762000.0, ans=0.1 2024-09-19 22:55:04,656 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.72 vs. 
limit=5.0 2024-09-19 22:55:05,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=762080.0, ans=0.1 2024-09-19 22:55:05,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=762080.0, ans=0.125 2024-09-19 22:55:09,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=762080.0, ans=0.0 2024-09-19 22:55:12,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=762080.0, ans=0.125 2024-09-19 22:55:12,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=762080.0, ans=0.125 2024-09-19 22:55:38,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=762160.0, ans=0.0 2024-09-19 22:55:50,316 INFO [train.py:1198] (1/2) Epoch 43, batch 500, loss[loss=0.2474, ctc_loss=0.1181, cr_loss=0.3731, attn_decoder_loss=0.2535, over 29415.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1106, cr_loss=0.3503, attn_decoder_loss=0.2374, over 5328870.64 frames. ], batch size: 94, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 22:55:50,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=762200.0, ans=0.09899494936611666 2024-09-19 22:56:14,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=762240.0, ans=0.125 2024-09-19 22:56:14,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=762240.0, ans=0.0 2024-09-19 22:56:19,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=762280.0, ans=0.0 2024-09-19 22:56:29,951 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.288e+01 8.617e+01 8.998e+01 9.696e+01 3.544e+02, threshold=1.800e+02, percent-clipped=2.0 2024-09-19 22:56:36,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=762320.0, ans=10.0 2024-09-19 22:56:43,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=762320.0, ans=0.125 2024-09-19 22:57:01,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=762360.0, ans=0.0 2024-09-19 22:57:06,489 INFO [train.py:1198] (1/2) Epoch 43, batch 550, loss[loss=0.2416, ctc_loss=0.1123, cr_loss=0.3496, attn_decoder_loss=0.2482, over 28791.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1108, cr_loss=0.3506, attn_decoder_loss=0.2377, over 5421502.84 frames. ], batch size: 104, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 22:57:19,567 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.29 vs. 
limit=22.5 2024-09-19 22:57:29,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=762440.0, ans=0.0 2024-09-19 22:57:41,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=762480.0, ans=0.0 2024-09-19 22:57:54,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=762520.0, ans=0.07 2024-09-19 22:58:24,034 INFO [train.py:1198] (1/2) Epoch 43, batch 600, loss[loss=0.2469, ctc_loss=0.1232, cr_loss=0.3807, attn_decoder_loss=0.2522, over 29295.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1112, cr_loss=0.3512, attn_decoder_loss=0.2379, over 5507455.25 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 22:58:28,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=762600.0, ans=0.125 2024-09-19 22:59:05,157 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.540e+01 8.510e+01 8.971e+01 9.586e+01 1.722e+02, threshold=1.794e+02, percent-clipped=0.0 2024-09-19 22:59:24,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=762760.0, ans=0.125 2024-09-19 22:59:41,227 INFO [train.py:1198] (1/2) Epoch 43, batch 650, loss[loss=0.2341, ctc_loss=0.1087, cr_loss=0.3341, attn_decoder_loss=0.2407, over 29755.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1101, cr_loss=0.3484, attn_decoder_loss=0.2368, over 5585369.31 frames. ], batch size: 81, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 22:59:52,333 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.31 vs. limit=15.0 2024-09-19 22:59:53,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=762800.0, ans=0.025 2024-09-19 22:59:57,235 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.21 vs. limit=15.0 2024-09-19 23:00:05,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=762840.0, ans=0.125 2024-09-19 23:00:25,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=762920.0, ans=0.025 2024-09-19 23:00:38,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=762920.0, ans=0.1 2024-09-19 23:00:43,667 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.04 vs. limit=15.0 2024-09-19 23:00:44,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=762960.0, ans=0.1 2024-09-19 23:00:49,891 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.84 vs. limit=15.0 2024-09-19 23:00:56,552 INFO [train.py:1198] (1/2) Epoch 43, batch 700, loss[loss=0.2267, ctc_loss=0.1069, cr_loss=0.3411, attn_decoder_loss=0.2325, over 29535.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1109, cr_loss=0.3504, attn_decoder_loss=0.2377, over 5635666.80 frames. 
], batch size: 76, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 23:01:14,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=763040.0, ans=0.2 2024-09-19 23:01:25,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=763080.0, ans=0.0 2024-09-19 23:01:35,726 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.576e+01 9.155e+01 9.558e+01 1.416e+02, threshold=1.831e+02, percent-clipped=0.0 2024-09-19 23:01:42,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=763120.0, ans=0.0 2024-09-19 23:01:46,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=763120.0, ans=0.125 2024-09-19 23:01:54,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=763120.0, ans=0.125 2024-09-19 23:01:55,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=763160.0, ans=0.025 2024-09-19 23:02:00,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=763160.0, ans=0.125 2024-09-19 23:02:03,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=763160.0, ans=0.04949747468305833 2024-09-19 23:02:14,565 INFO [train.py:1198] (1/2) Epoch 43, batch 750, loss[loss=0.2247, ctc_loss=0.108, cr_loss=0.3331, attn_decoder_loss=0.2303, over 29705.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1104, cr_loss=0.3485, attn_decoder_loss=0.2372, over 5675253.17 frames. ], batch size: 82, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 23:02:47,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=763280.0, ans=0.125 2024-09-19 23:03:00,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=763320.0, ans=0.2 2024-09-19 23:03:03,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=763320.0, ans=0.0 2024-09-19 23:03:05,911 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-09-19 23:03:28,370 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.74 vs. limit=15.0 2024-09-19 23:03:31,977 INFO [train.py:1198] (1/2) Epoch 43, batch 800, loss[loss=0.2064, ctc_loss=0.08905, cr_loss=0.294, attn_decoder_loss=0.2129, over 29595.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1108, cr_loss=0.3499, attn_decoder_loss=0.2376, over 5705231.03 frames. ], batch size: 73, lr: 2.57e-03, grad_scale: 32.0 2024-09-19 23:03:34,561 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.00 vs. 
limit=12.0 2024-09-19 23:03:39,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=763400.0, ans=0.125 2024-09-19 23:03:47,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=763440.0, ans=0.0 2024-09-19 23:04:13,917 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.564e+01 8.973e+01 9.746e+01 2.709e+02, threshold=1.795e+02, percent-clipped=1.0 2024-09-19 23:04:14,971 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.72 vs. limit=15.0 2024-09-19 23:04:20,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=763520.0, ans=0.125 2024-09-19 23:04:29,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=763520.0, ans=0.125 2024-09-19 23:04:32,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=763560.0, ans=0.1 2024-09-19 23:04:36,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=763560.0, ans=0.0 2024-09-19 23:04:42,032 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.62 vs. limit=15.0 2024-09-19 23:04:47,009 INFO [train.py:1198] (1/2) Epoch 43, batch 850, loss[loss=0.2321, ctc_loss=0.1026, cr_loss=0.3281, attn_decoder_loss=0.2392, over 29736.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1101, cr_loss=0.3482, attn_decoder_loss=0.2369, over 5734550.08 frames. ], batch size: 89, lr: 2.57e-03, grad_scale: 8.0 2024-09-19 23:04:48,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=763600.0, ans=0.125 2024-09-19 23:05:00,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=763640.0, ans=0.025 2024-09-19 23:05:09,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=763640.0, ans=0.125 2024-09-19 23:05:33,876 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:05:44,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=763720.0, ans=0.2 2024-09-19 23:05:53,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=763760.0, ans=0.0 2024-09-19 23:06:03,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=763800.0, ans=0.0 2024-09-19 23:06:04,722 INFO [train.py:1198] (1/2) Epoch 43, batch 900, loss[loss=0.2052, ctc_loss=0.08667, cr_loss=0.291, attn_decoder_loss=0.2119, over 29621.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1105, cr_loss=0.3491, attn_decoder_loss=0.2376, over 5739380.00 frames. 
], batch size: 73, lr: 2.57e-03, grad_scale: 8.0 2024-09-19 23:06:09,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=763800.0, ans=0.125 2024-09-19 23:06:12,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=763800.0, ans=0.125 2024-09-19 23:06:42,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=763880.0, ans=0.025 2024-09-19 23:06:46,894 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.378e+01 8.546e+01 9.046e+01 9.640e+01 1.475e+02, threshold=1.809e+02, percent-clipped=0.0 2024-09-19 23:06:48,022 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=22.5 2024-09-19 23:07:04,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=763920.0, ans=0.125 2024-09-19 23:07:10,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=763960.0, ans=0.0 2024-09-19 23:07:12,094 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.00 vs. limit=12.0 2024-09-19 23:07:16,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=763960.0, ans=0.2 2024-09-19 23:07:22,194 INFO [train.py:1198] (1/2) Epoch 43, batch 950, loss[loss=0.218, ctc_loss=0.09513, cr_loss=0.315, attn_decoder_loss=0.2247, over 29496.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1105, cr_loss=0.3491, attn_decoder_loss=0.2377, over 5740038.17 frames. ], batch size: 74, lr: 2.57e-03, grad_scale: 8.0 2024-09-19 23:07:38,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=764040.0, ans=0.125 2024-09-19 23:07:53,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=764080.0, ans=0.2 2024-09-19 23:07:58,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=764080.0, ans=0.05 2024-09-19 23:08:36,655 INFO [train.py:1198] (1/2) Epoch 43, batch 1000, loss[loss=0.221, ctc_loss=0.1039, cr_loss=0.3367, attn_decoder_loss=0.2265, over 29507.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1113, cr_loss=0.3504, attn_decoder_loss=0.2385, over 5735711.37 frames. ], batch size: 77, lr: 2.57e-03, grad_scale: 8.0 2024-09-19 23:08:41,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=764200.0, ans=0.05 2024-09-19 23:09:18,833 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.366e+01 8.655e+01 9.178e+01 9.837e+01 2.417e+02, threshold=1.836e+02, percent-clipped=2.0 2024-09-19 23:09:24,232 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.61 vs. limit=22.5 2024-09-19 23:09:33,599 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.31 vs. 
limit=22.5 2024-09-19 23:09:50,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=764360.0, ans=0.0 2024-09-19 23:09:54,201 INFO [train.py:1198] (1/2) Epoch 43, batch 1050, loss[loss=0.237, ctc_loss=0.1145, cr_loss=0.358, attn_decoder_loss=0.2426, over 29666.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1114, cr_loss=0.3508, attn_decoder_loss=0.2381, over 5744350.81 frames. ], batch size: 85, lr: 2.57e-03, grad_scale: 8.0 2024-09-19 23:09:56,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=764400.0, ans=0.2 2024-09-19 23:10:14,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=764440.0, ans=0.125 2024-09-19 23:10:18,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=764440.0, ans=0.125 2024-09-19 23:10:32,697 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:10:35,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=764480.0, ans=0.2 2024-09-19 23:10:43,292 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.27 vs. limit=22.5 2024-09-19 23:10:57,756 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=22.5 2024-09-19 23:11:06,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=764560.0, ans=0.125 2024-09-19 23:11:11,227 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.82 vs. limit=15.0 2024-09-19 23:11:11,872 INFO [train.py:1198] (1/2) Epoch 43, batch 1100, loss[loss=0.226, ctc_loss=0.107, cr_loss=0.3338, attn_decoder_loss=0.2318, over 29456.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1111, cr_loss=0.3496, attn_decoder_loss=0.2379, over 5757890.24 frames. ], batch size: 78, lr: 2.57e-03, grad_scale: 8.0 2024-09-19 23:11:15,226 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:11:16,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=764600.0, ans=0.0 2024-09-19 23:11:33,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=764640.0, ans=0.1 2024-09-19 23:11:41,399 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.48 vs. 
limit=15.0 2024-09-19 23:11:48,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=764680.0, ans=0.0 2024-09-19 23:11:54,302 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.200e+01 8.378e+01 8.891e+01 9.353e+01 1.322e+02, threshold=1.778e+02, percent-clipped=0.0 2024-09-19 23:12:11,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=764760.0, ans=0.0 2024-09-19 23:12:21,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=764760.0, ans=0.125 2024-09-19 23:12:27,897 INFO [train.py:1198] (1/2) Epoch 43, batch 1150, loss[loss=0.2153, ctc_loss=0.09932, cr_loss=0.3217, attn_decoder_loss=0.221, over 29437.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1104, cr_loss=0.3487, attn_decoder_loss=0.2375, over 5755684.19 frames. ], batch size: 78, lr: 2.57e-03, grad_scale: 8.0 2024-09-19 23:12:32,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=764800.0, ans=0.1 2024-09-19 23:12:36,030 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:12:40,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=764800.0, ans=0.125 2024-09-19 23:12:48,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=764840.0, ans=0.125 2024-09-19 23:13:00,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=764880.0, ans=0.125 2024-09-19 23:13:09,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=764880.0, ans=0.125 2024-09-19 23:13:12,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=764920.0, ans=0.95 2024-09-19 23:13:13,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.95 vs. limit=22.5 2024-09-19 23:13:15,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=764920.0, ans=0.1 2024-09-19 23:13:21,726 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=15.0 2024-09-19 23:13:37,389 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=15.0 2024-09-19 23:13:37,683 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.30 vs. limit=15.0 2024-09-19 23:13:45,768 INFO [train.py:1198] (1/2) Epoch 43, batch 1200, loss[loss=0.2474, ctc_loss=0.1202, cr_loss=0.3618, attn_decoder_loss=0.2535, over 29661.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1111, cr_loss=0.35, attn_decoder_loss=0.2384, over 5748452.26 frames. 
], batch size: 85, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 23:13:49,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=765000.0, ans=0.125 2024-09-19 23:14:28,137 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.366e+01 8.714e+01 9.128e+01 9.687e+01 4.379e+02, threshold=1.826e+02, percent-clipped=2.0 2024-09-19 23:14:34,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=765120.0, ans=0.125 2024-09-19 23:14:56,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=765160.0, ans=0.09899494936611666 2024-09-19 23:15:03,373 INFO [train.py:1198] (1/2) Epoch 43, batch 1250, loss[loss=0.2642, ctc_loss=0.1474, cr_loss=0.4502, attn_decoder_loss=0.2671, over 29539.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1119, cr_loss=0.3525, attn_decoder_loss=0.2393, over 5775370.11 frames. ], batch size: 92, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 23:15:23,406 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:15:27,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=765240.0, ans=0.125 2024-09-19 23:15:35,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=765280.0, ans=0.0 2024-09-19 23:15:41,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=765280.0, ans=0.0 2024-09-19 23:15:48,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=765320.0, ans=0.0 2024-09-19 23:15:57,412 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.26 vs. limit=10.0 2024-09-19 23:15:59,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=765320.0, ans=0.1 2024-09-19 23:16:07,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=765360.0, ans=0.025 2024-09-19 23:16:19,063 INFO [train.py:1198] (1/2) Epoch 43, batch 1300, loss[loss=0.2435, ctc_loss=0.1133, cr_loss=0.3558, attn_decoder_loss=0.2501, over 28348.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1114, cr_loss=0.3512, attn_decoder_loss=0.2386, over 5779502.48 frames. ], batch size: 111, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 23:16:56,499 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0 2024-09-19 23:17:01,387 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.437e+01 8.583e+01 9.046e+01 9.582e+01 1.774e+02, threshold=1.809e+02, percent-clipped=0.0 2024-09-19 23:17:18,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=765560.0, ans=0.035 2024-09-19 23:17:37,164 INFO [train.py:1198] (1/2) Epoch 43, batch 1350, loss[loss=0.2377, ctc_loss=0.1138, cr_loss=0.3519, attn_decoder_loss=0.2436, over 29759.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1111, cr_loss=0.3506, attn_decoder_loss=0.2383, over 5797934.82 frames. 
], batch size: 81, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 23:17:45,471 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.97 vs. limit=22.5 2024-09-19 23:18:34,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=765720.0, ans=0.0 2024-09-19 23:18:35,850 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:18:54,303 INFO [train.py:1198] (1/2) Epoch 43, batch 1400, loss[loss=0.2065, ctc_loss=0.09014, cr_loss=0.3006, attn_decoder_loss=0.2128, over 29588.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1105, cr_loss=0.35, attn_decoder_loss=0.2379, over 5808784.94 frames. ], batch size: 69, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 23:19:15,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=765840.0, ans=0.0 2024-09-19 23:19:22,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=765880.0, ans=0.125 2024-09-19 23:19:33,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=765880.0, ans=0.1 2024-09-19 23:19:36,333 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 8.462e+01 9.127e+01 9.642e+01 1.340e+02, threshold=1.825e+02, percent-clipped=0.0 2024-09-19 23:19:39,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=765920.0, ans=0.125 2024-09-19 23:19:41,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=765920.0, ans=0.125 2024-09-19 23:19:56,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=765960.0, ans=0.2 2024-09-19 23:20:02,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=765960.0, ans=0.0 2024-09-19 23:20:09,473 INFO [train.py:1198] (1/2) Epoch 43, batch 1450, loss[loss=0.2403, ctc_loss=0.1198, cr_loss=0.3761, attn_decoder_loss=0.2454, over 29437.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1106, cr_loss=0.3503, attn_decoder_loss=0.238, over 5805816.72 frames. ], batch size: 94, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 23:20:21,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=766000.0, ans=0.025 2024-09-19 23:20:33,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=766040.0, ans=0.2 2024-09-19 23:20:48,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=766080.0, ans=0.125 2024-09-19 23:20:52,575 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.52 vs. 
limit=15.0 2024-09-19 23:20:56,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=766120.0, ans=0.125 2024-09-19 23:21:06,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=766120.0, ans=0.2 2024-09-19 23:21:14,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=766160.0, ans=0.0 2024-09-19 23:21:26,805 INFO [train.py:1198] (1/2) Epoch 43, batch 1500, loss[loss=0.2433, ctc_loss=0.1141, cr_loss=0.3603, attn_decoder_loss=0.2497, over 29635.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1105, cr_loss=0.3497, attn_decoder_loss=0.2383, over 5807238.34 frames. ], batch size: 86, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 23:21:30,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=766200.0, ans=0.1 2024-09-19 23:21:40,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=766240.0, ans=0.0 2024-09-19 23:22:09,440 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.602e+01 9.131e+01 9.560e+01 1.543e+02, threshold=1.826e+02, percent-clipped=0.0 2024-09-19 23:22:25,483 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.72 vs. limit=22.5 2024-09-19 23:22:29,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=766360.0, ans=0.0 2024-09-19 23:22:30,115 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.22 vs. limit=22.5 2024-09-19 23:22:30,314 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.36 vs. limit=15.0 2024-09-19 23:22:38,394 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.24 vs. limit=15.0 2024-09-19 23:22:39,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=766360.0, ans=0.1 2024-09-19 23:22:45,498 INFO [train.py:1198] (1/2) Epoch 43, batch 1550, loss[loss=0.2539, ctc_loss=0.1239, cr_loss=0.3681, attn_decoder_loss=0.2602, over 29510.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1108, cr_loss=0.3503, attn_decoder_loss=0.2385, over 5782940.70 frames. ], batch size: 90, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:23:00,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=766440.0, ans=0.1 2024-09-19 23:23:17,633 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.08 vs. limit=15.0 2024-09-19 23:23:24,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=766480.0, ans=0.125 2024-09-19 23:24:00,259 INFO [train.py:1198] (1/2) Epoch 43, batch 1600, loss[loss=0.2392, ctc_loss=0.1152, cr_loss=0.3697, attn_decoder_loss=0.2447, over 29692.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1107, cr_loss=0.3495, attn_decoder_loss=0.2382, over 5763852.38 frames. 
], batch size: 85, lr: 2.56e-03, grad_scale: 32.0 2024-09-19 23:24:03,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=766600.0, ans=0.125 2024-09-19 23:24:44,022 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.796e+01 8.578e+01 9.126e+01 9.935e+01 1.775e+02, threshold=1.825e+02, percent-clipped=0.0 2024-09-19 23:24:45,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=766720.0, ans=0.025 2024-09-19 23:25:09,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=766760.0, ans=0.025 2024-09-19 23:25:10,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=766760.0, ans=0.125 2024-09-19 23:25:17,747 INFO [train.py:1198] (1/2) Epoch 43, batch 1650, loss[loss=0.2462, ctc_loss=0.1192, cr_loss=0.3751, attn_decoder_loss=0.252, over 29739.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1107, cr_loss=0.3494, attn_decoder_loss=0.2381, over 5759616.47 frames. ], batch size: 89, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:25:22,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=766800.0, ans=0.2 2024-09-19 23:25:31,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=766840.0, ans=0.025 2024-09-19 23:25:52,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=766880.0, ans=0.2 2024-09-19 23:26:10,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=766920.0, ans=0.09899494936611666 2024-09-19 23:26:10,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=766920.0, ans=0.2 2024-09-19 23:26:11,206 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.79 vs. limit=22.5 2024-09-19 23:26:20,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=766960.0, ans=0.1 2024-09-19 23:26:34,714 INFO [train.py:1198] (1/2) Epoch 43, batch 1700, loss[loss=0.206, ctc_loss=0.09917, cr_loss=0.335, attn_decoder_loss=0.2104, over 29586.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1104, cr_loss=0.3492, attn_decoder_loss=0.2378, over 5780312.30 frames. 
], batch size: 69, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:26:38,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=767000.0, ans=0.2 2024-09-19 23:26:57,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer_ff2.min_abs, batch_count=767040.0, ans=0.1 2024-09-19 23:27:00,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=767040.0, ans=0.125 2024-09-19 23:27:08,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=767080.0, ans=0.125 2024-09-19 23:27:17,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=767080.0, ans=0.125 2024-09-19 23:27:18,340 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 8.484e+01 9.017e+01 9.514e+01 1.146e+02, threshold=1.803e+02, percent-clipped=0.0 2024-09-19 23:27:39,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=767160.0, ans=0.0 2024-09-19 23:27:44,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=767160.0, ans=0.0 2024-09-19 23:27:50,590 INFO [train.py:1198] (1/2) Epoch 43, batch 1750, loss[loss=0.2024, ctc_loss=0.09261, cr_loss=0.3115, attn_decoder_loss=0.2077, over 29381.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1099, cr_loss=0.3485, attn_decoder_loss=0.2372, over 5787517.25 frames. ], batch size: 67, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:28:09,669 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2024-09-19 23:28:25,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=767280.0, ans=0.125 2024-09-19 23:28:46,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=767320.0, ans=0.125 2024-09-19 23:29:06,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=767400.0, ans=0.2 2024-09-19 23:29:07,598 INFO [train.py:1198] (1/2) Epoch 43, batch 1800, loss[loss=0.2441, ctc_loss=0.1203, cr_loss=0.3678, attn_decoder_loss=0.2497, over 29696.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1101, cr_loss=0.3487, attn_decoder_loss=0.2376, over 5789704.19 frames. 
], batch size: 83, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:29:38,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=767480.0, ans=0.125 2024-09-19 23:29:45,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=767480.0, ans=0.125 2024-09-19 23:29:47,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=767480.0, ans=0.0 2024-09-19 23:29:50,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=767480.0, ans=0.125 2024-09-19 23:29:51,304 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.169e+01 8.348e+01 8.939e+01 9.600e+01 1.459e+02, threshold=1.788e+02, percent-clipped=0.0 2024-09-19 23:30:12,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=767560.0, ans=0.0 2024-09-19 23:30:12,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=767560.0, ans=0.025 2024-09-19 23:30:15,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=767560.0, ans=0.125 2024-09-19 23:30:20,604 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:30:23,258 INFO [train.py:1198] (1/2) Epoch 43, batch 1850, loss[loss=0.2485, ctc_loss=0.1216, cr_loss=0.3692, attn_decoder_loss=0.2544, over 29611.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1101, cr_loss=0.349, attn_decoder_loss=0.2375, over 5795027.02 frames. ], batch size: 86, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:30:36,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=767600.0, ans=0.09899494936611666 2024-09-19 23:30:41,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=767640.0, ans=0.1 2024-09-19 23:31:06,035 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:31:08,037 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.96 vs. limit=10.0 2024-09-19 23:31:21,455 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2024-09-19 23:31:40,272 INFO [train.py:1198] (1/2) Epoch 43, batch 1900, loss[loss=0.2471, ctc_loss=0.1187, cr_loss=0.3766, attn_decoder_loss=0.253, over 29708.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1108, cr_loss=0.3498, attn_decoder_loss=0.2382, over 5804335.63 frames. 
], batch size: 89, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:31:43,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=767800.0, ans=0.04949747468305833 2024-09-19 23:31:45,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=767800.0, ans=0.1 2024-09-19 23:31:55,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=767840.0, ans=0.125 2024-09-19 23:32:06,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=767840.0, ans=0.125 2024-09-19 23:32:07,945 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:32:24,371 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.677e+01 8.779e+01 9.176e+01 9.742e+01 1.549e+02, threshold=1.835e+02, percent-clipped=0.0 2024-09-19 23:32:39,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=767960.0, ans=0.1 2024-09-19 23:32:41,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=767960.0, ans=0.125 2024-09-19 23:32:42,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=767960.0, ans=0.125 2024-09-19 23:33:04,950 INFO [train.py:1198] (1/2) Epoch 43, batch 1950, loss[loss=0.229, ctc_loss=0.1128, cr_loss=0.356, attn_decoder_loss=0.234, over 29451.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1115, cr_loss=0.3512, attn_decoder_loss=0.2394, over 5819362.59 frames. ], batch size: 78, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:33:14,456 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:33:20,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=768040.0, ans=0.2 2024-09-19 23:33:31,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=768040.0, ans=0.125 2024-09-19 23:33:37,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=768080.0, ans=0.1 2024-09-19 23:33:37,594 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.91 vs. 
limit=10.0 2024-09-19 23:33:55,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=768120.0, ans=0.1 2024-09-19 23:33:55,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=768120.0, ans=0.0 2024-09-19 23:34:01,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=768120.0, ans=0.09899494936611666 2024-09-19 23:34:01,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=768120.0, ans=0.0 2024-09-19 23:34:07,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=768160.0, ans=0.125 2024-09-19 23:34:20,488 INFO [train.py:1198] (1/2) Epoch 43, batch 2000, loss[loss=0.2077, ctc_loss=0.09201, cr_loss=0.3097, attn_decoder_loss=0.2136, over 29346.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1118, cr_loss=0.3516, attn_decoder_loss=0.2394, over 5796871.44 frames. ], batch size: 67, lr: 2.56e-03, grad_scale: 32.0 2024-09-19 23:34:39,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=768240.0, ans=0.125 2024-09-19 23:34:51,934 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.90 vs. limit=15.0 2024-09-19 23:35:00,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=768280.0, ans=0.0 2024-09-19 23:35:07,937 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.936e+01 8.662e+01 9.256e+01 9.828e+01 2.553e+02, threshold=1.851e+02, percent-clipped=3.0 2024-09-19 23:35:19,529 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.58 vs. limit=22.5 2024-09-19 23:35:20,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=768320.0, ans=0.1 2024-09-19 23:35:25,555 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.06 vs. limit=15.0 2024-09-19 23:35:28,753 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.45 vs. limit=6.0 2024-09-19 23:35:38,230 INFO [train.py:1198] (1/2) Epoch 43, batch 2050, loss[loss=0.2046, ctc_loss=0.08885, cr_loss=0.2911, attn_decoder_loss=0.211, over 29426.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1111, cr_loss=0.35, attn_decoder_loss=0.2384, over 5788160.41 frames. 
], batch size: 70, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:35:46,102 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:35:59,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=768440.0, ans=0.0 2024-09-19 23:36:35,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=768520.0, ans=0.125 2024-09-19 23:36:35,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=768520.0, ans=0.025 2024-09-19 23:36:55,474 INFO [train.py:1198] (1/2) Epoch 43, batch 2100, loss[loss=0.2307, ctc_loss=0.1045, cr_loss=0.3486, attn_decoder_loss=0.237, over 29751.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1103, cr_loss=0.349, attn_decoder_loss=0.2379, over 5799749.63 frames. ], batch size: 81, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:37:04,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=768600.0, ans=0.125 2024-09-19 23:37:19,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=768640.0, ans=0.125 2024-09-19 23:37:34,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=768680.0, ans=0.1 2024-09-19 23:37:34,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=768680.0, ans=0.125 2024-09-19 23:37:41,777 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.175e+01 8.395e+01 8.911e+01 9.448e+01 1.160e+02, threshold=1.782e+02, percent-clipped=0.0 2024-09-19 23:38:01,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=768760.0, ans=0.1 2024-09-19 23:38:10,689 INFO [train.py:1198] (1/2) Epoch 43, batch 2150, loss[loss=0.227, ctc_loss=0.1087, cr_loss=0.3452, attn_decoder_loss=0.2324, over 29442.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1101, cr_loss=0.3486, attn_decoder_loss=0.2373, over 5814097.03 frames. ], batch size: 78, lr: 2.56e-03, grad_scale: 8.0 2024-09-19 23:38:14,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=768800.0, ans=0.125 2024-09-19 23:38:32,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=768840.0, ans=0.125 2024-09-19 23:38:44,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=768880.0, ans=0.2 2024-09-19 23:38:47,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=768880.0, ans=0.025 2024-09-19 23:39:04,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=768920.0, ans=0.125 2024-09-19 23:39:14,420 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.39 vs. 
limit=15.0 2024-09-19 23:39:28,381 INFO [train.py:1198] (1/2) Epoch 43, batch 2200, loss[loss=0.2386, ctc_loss=0.1191, cr_loss=0.3661, attn_decoder_loss=0.2438, over 29639.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1105, cr_loss=0.3493, attn_decoder_loss=0.2377, over 5810609.59 frames. ], batch size: 86, lr: 2.56e-03, grad_scale: 8.0 2024-09-19 23:39:28,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=769000.0, ans=0.0 2024-09-19 23:39:36,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=769000.0, ans=0.0 2024-09-19 23:39:45,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=769040.0, ans=10.0 2024-09-19 23:39:55,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=769040.0, ans=0.0 2024-09-19 23:39:57,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.74 vs. limit=15.0 2024-09-19 23:40:04,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=769080.0, ans=0.2 2024-09-19 23:40:13,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=769120.0, ans=0.125 2024-09-19 23:40:14,959 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.608e+01 8.529e+01 9.034e+01 9.598e+01 1.063e+03, threshold=1.807e+02, percent-clipped=3.0 2024-09-19 23:40:46,043 INFO [train.py:1198] (1/2) Epoch 43, batch 2250, loss[loss=0.2355, ctc_loss=0.1051, cr_loss=0.3241, attn_decoder_loss=0.2428, over 29723.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1103, cr_loss=0.349, attn_decoder_loss=0.2378, over 5809686.13 frames. ], batch size: 82, lr: 2.56e-03, grad_scale: 8.0 2024-09-19 23:41:07,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=769240.0, ans=0.125 2024-09-19 23:41:33,551 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.63 vs. limit=15.0 2024-09-19 23:41:37,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=769320.0, ans=0.125 2024-09-19 23:42:01,234 INFO [train.py:1198] (1/2) Epoch 43, batch 2300, loss[loss=0.2106, ctc_loss=0.09194, cr_loss=0.3037, attn_decoder_loss=0.217, over 29328.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1098, cr_loss=0.3482, attn_decoder_loss=0.237, over 5797930.02 frames. 
], batch size: 71, lr: 2.56e-03, grad_scale: 8.0 2024-09-19 23:42:30,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=769440.0, ans=0.125 2024-09-19 23:42:39,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=769480.0, ans=0.0 2024-09-19 23:42:42,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=769480.0, ans=0.1 2024-09-19 23:42:49,953 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.561e+01 8.425e+01 9.007e+01 9.590e+01 1.483e+02, threshold=1.801e+02, percent-clipped=0.0 2024-09-19 23:43:03,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=769560.0, ans=0.0 2024-09-19 23:43:03,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=769560.0, ans=0.125 2024-09-19 23:43:05,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=769560.0, ans=0.125 2024-09-19 23:43:13,376 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2024-09-19 23:43:13,465 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.82 vs. limit=12.0 2024-09-19 23:43:19,010 INFO [train.py:1198] (1/2) Epoch 43, batch 2350, loss[loss=0.2514, ctc_loss=0.1315, cr_loss=0.3989, attn_decoder_loss=0.2558, over 29697.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1097, cr_loss=0.3481, attn_decoder_loss=0.2371, over 5804068.08 frames. ], batch size: 83, lr: 2.56e-03, grad_scale: 8.0 2024-09-19 23:43:22,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=769600.0, ans=0.1 2024-09-19 23:43:31,710 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.86 vs. limit=15.0 2024-09-19 23:43:47,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=769680.0, ans=0.125 2024-09-19 23:43:49,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=769680.0, ans=0.125 2024-09-19 23:44:05,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=769720.0, ans=0.125 2024-09-19 23:44:18,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=769760.0, ans=0.2 2024-09-19 23:44:36,555 INFO [train.py:1198] (1/2) Epoch 43, batch 2400, loss[loss=0.2277, ctc_loss=0.1104, cr_loss=0.3471, attn_decoder_loss=0.2331, over 29539.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1101, cr_loss=0.3484, attn_decoder_loss=0.2374, over 5807063.07 frames. 
], batch size: 76, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:44:42,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=769800.0, ans=0.0 2024-09-19 23:44:51,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=769840.0, ans=0.0 2024-09-19 23:45:05,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=769880.0, ans=0.0 2024-09-19 23:45:14,873 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2024-09-19 23:45:16,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=769880.0, ans=0.1 2024-09-19 23:45:22,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=769920.0, ans=0.0 2024-09-19 23:45:22,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=769920.0, ans=0.125 2024-09-19 23:45:23,259 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.858e+01 8.868e+01 9.245e+01 1.005e+02 2.989e+02, threshold=1.849e+02, percent-clipped=3.0 2024-09-19 23:45:40,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=769960.0, ans=0.0 2024-09-19 23:45:46,966 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.91 vs. limit=22.5 2024-09-19 23:45:52,121 INFO [train.py:1198] (1/2) Epoch 43, batch 2450, loss[loss=0.2437, ctc_loss=0.1227, cr_loss=0.3812, attn_decoder_loss=0.2487, over 29719.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.111, cr_loss=0.3502, attn_decoder_loss=0.2383, over 5784598.26 frames. ], batch size: 82, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:45:53,255 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.97 vs. limit=12.0 2024-09-19 23:45:56,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=770000.0, ans=0.125 2024-09-19 23:46:23,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=770080.0, ans=0.125 2024-09-19 23:46:29,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=770080.0, ans=0.1 2024-09-19 23:46:41,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=770120.0, ans=0.125 2024-09-19 23:46:41,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=770120.0, ans=0.0 2024-09-19 23:47:02,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=770160.0, ans=0.125 2024-09-19 23:47:07,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=770160.0, ans=0.0 2024-09-19 23:47:09,624 INFO [train.py:1198] (1/2) Epoch 43, batch 2500, loss[loss=0.2448, ctc_loss=0.1121, cr_loss=0.3554, attn_decoder_loss=0.2516, over 29641.00 frames. 
], tot_loss[loss=0.2329, ctc_loss=0.1114, cr_loss=0.3515, attn_decoder_loss=0.2386, over 5794424.45 frames. ], batch size: 86, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:47:35,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=770240.0, ans=0.0 2024-09-19 23:47:37,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=770240.0, ans=0.125 2024-09-19 23:47:56,926 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.730e+01 9.095e+01 9.659e+01 1.544e+02, threshold=1.819e+02, percent-clipped=0.0 2024-09-19 23:48:03,411 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:48:18,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=770360.0, ans=0.1 2024-09-19 23:48:28,151 INFO [train.py:1198] (1/2) Epoch 43, batch 2550, loss[loss=0.2094, ctc_loss=0.09626, cr_loss=0.3249, attn_decoder_loss=0.2148, over 29328.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1115, cr_loss=0.3517, attn_decoder_loss=0.2387, over 5798073.02 frames. ], batch size: 67, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:48:36,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.85 vs. limit=15.0 2024-09-19 23:48:46,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=770440.0, ans=0.125 2024-09-19 23:49:09,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=770480.0, ans=0.125 2024-09-19 23:49:15,455 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=15.0 2024-09-19 23:49:43,745 INFO [train.py:1198] (1/2) Epoch 43, batch 2600, loss[loss=0.2295, ctc_loss=0.1141, cr_loss=0.3708, attn_decoder_loss=0.2341, over 29465.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1117, cr_loss=0.3519, attn_decoder_loss=0.2389, over 5794684.44 frames. 
], batch size: 78, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:49:54,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=770600.0, ans=0.125 2024-09-19 23:49:56,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=770600.0, ans=0.125 2024-09-19 23:50:06,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=770640.0, ans=0.125 2024-09-19 23:50:07,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=770640.0, ans=0.0 2024-09-19 23:50:28,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=770680.0, ans=0.0 2024-09-19 23:50:32,553 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.284e+01 8.619e+01 9.177e+01 9.694e+01 1.714e+02, threshold=1.835e+02, percent-clipped=0.0 2024-09-19 23:50:57,094 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:51:01,476 INFO [train.py:1198] (1/2) Epoch 43, batch 2650, loss[loss=0.2397, ctc_loss=0.1122, cr_loss=0.359, attn_decoder_loss=0.2459, over 29257.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1116, cr_loss=0.3521, attn_decoder_loss=0.239, over 5801868.68 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:51:15,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=770840.0, ans=0.125 2024-09-19 23:51:33,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=770880.0, ans=0.0 2024-09-19 23:51:52,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=770920.0, ans=0.025 2024-09-19 23:52:18,213 INFO [train.py:1198] (1/2) Epoch 43, batch 2700, loss[loss=0.2425, ctc_loss=0.1116, cr_loss=0.345, attn_decoder_loss=0.2493, over 29529.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1115, cr_loss=0.3519, attn_decoder_loss=0.2394, over 5798106.82 frames. ], batch size: 87, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:52:37,516 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.03 vs. limit=15.0 2024-09-19 23:52:47,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=771080.0, ans=0.1 2024-09-19 23:52:53,854 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.24 vs. 
limit=6.0 2024-09-19 23:52:59,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=771080.0, ans=0.1 2024-09-19 23:53:05,377 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.491e+01 8.492e+01 9.068e+01 9.521e+01 1.768e+02, threshold=1.814e+02, percent-clipped=0.0 2024-09-19 23:53:16,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=771120.0, ans=0.125 2024-09-19 23:53:34,513 INFO [train.py:1198] (1/2) Epoch 43, batch 2750, loss[loss=0.2276, ctc_loss=0.1119, cr_loss=0.3586, attn_decoder_loss=0.2325, over 29531.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1105, cr_loss=0.3502, attn_decoder_loss=0.2382, over 5797455.83 frames. ], batch size: 75, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:53:37,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=771200.0, ans=0.125 2024-09-19 23:53:46,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=771200.0, ans=0.0 2024-09-19 23:53:54,921 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.44 vs. limit=15.0 2024-09-19 23:53:55,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=771240.0, ans=0.125 2024-09-19 23:54:11,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=771280.0, ans=0.125 2024-09-19 23:54:34,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=771320.0, ans=0.04949747468305833 2024-09-19 23:54:48,047 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:54:52,207 INFO [train.py:1198] (1/2) Epoch 43, batch 2800, loss[loss=0.2472, ctc_loss=0.1324, cr_loss=0.3653, attn_decoder_loss=0.2518, over 20053.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1107, cr_loss=0.3501, attn_decoder_loss=0.2383, over 5778327.63 frames. ], batch size: 211, lr: 2.56e-03, grad_scale: 32.0 2024-09-19 23:55:07,910 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.13 vs. limit=15.0 2024-09-19 23:55:19,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=771440.0, ans=0.125 2024-09-19 23:55:28,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=771480.0, ans=0.125 2024-09-19 23:55:40,236 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.588e+01 8.712e+01 9.201e+01 9.753e+01 5.037e+02, threshold=1.840e+02, percent-clipped=2.0 2024-09-19 23:55:59,101 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.31 vs. 
limit=22.5 2024-09-19 23:56:03,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=771560.0, ans=0.125 2024-09-19 23:56:07,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=771600.0, ans=0.125 2024-09-19 23:56:08,851 INFO [train.py:1198] (1/2) Epoch 43, batch 2850, loss[loss=0.2284, ctc_loss=0.1065, cr_loss=0.3426, attn_decoder_loss=0.2343, over 29484.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.111, cr_loss=0.3505, attn_decoder_loss=0.2385, over 5762972.53 frames. ], batch size: 77, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:56:13,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=771600.0, ans=0.0 2024-09-19 23:56:15,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=771600.0, ans=0.125 2024-09-19 23:56:19,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=771600.0, ans=0.0 2024-09-19 23:56:35,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=771640.0, ans=0.0 2024-09-19 23:56:50,793 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2024-09-19 23:56:54,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=771720.0, ans=0.125 2024-09-19 23:56:59,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=771720.0, ans=0.2 2024-09-19 23:57:06,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=771720.0, ans=0.05 2024-09-19 23:57:08,914 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.37 vs. limit=10.0 2024-09-19 23:57:17,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=771760.0, ans=0.1 2024-09-19 23:57:22,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=771760.0, ans=0.125 2024-09-19 23:57:24,705 INFO [train.py:1198] (1/2) Epoch 43, batch 2900, loss[loss=0.2307, ctc_loss=0.1099, cr_loss=0.3549, attn_decoder_loss=0.2362, over 29406.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1117, cr_loss=0.3523, attn_decoder_loss=0.2397, over 5788103.58 frames. ], batch size: 79, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:57:27,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.05 vs. 
limit=12.0 2024-09-19 23:57:53,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=771880.0, ans=0.125 2024-09-19 23:58:14,811 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.451e+01 8.980e+01 9.523e+01 1.534e+02, threshold=1.796e+02, percent-clipped=0.0 2024-09-19 23:58:20,051 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.48 vs. limit=15.0 2024-09-19 23:58:42,034 INFO [train.py:1198] (1/2) Epoch 43, batch 2950, loss[loss=0.2237, ctc_loss=0.1083, cr_loss=0.3523, attn_decoder_loss=0.2287, over 29510.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1104, cr_loss=0.3498, attn_decoder_loss=0.2384, over 5782224.05 frames. ], batch size: 75, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:58:48,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=772000.0, ans=0.125 2024-09-19 23:59:16,037 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.81 vs. limit=22.5 2024-09-19 23:59:43,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=772160.0, ans=0.0 2024-09-19 23:59:55,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=772160.0, ans=0.0 2024-09-19 23:59:57,362 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.85 vs. limit=12.0 2024-09-19 23:59:59,894 INFO [train.py:1198] (1/2) Epoch 43, batch 3000, loss[loss=0.2369, ctc_loss=0.1081, cr_loss=0.3579, attn_decoder_loss=0.2432, over 29755.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1099, cr_loss=0.3487, attn_decoder_loss=0.2381, over 5783415.84 frames. ], batch size: 81, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:59:59,894 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 00:00:18,201 INFO [train.py:1230] (1/2) Epoch 43, validation: loss=0.2118, ctc_loss=0.03672, cr_loss=6.551e-15, attn_decoder_loss=0.2313, over 944034.00 frames. 
2024-09-20 00:00:18,201 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-20 00:00:29,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=772200.0, ans=0.1 2024-09-20 00:00:37,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=772240.0, ans=0.125 2024-09-20 00:00:37,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=772240.0, ans=0.0 2024-09-20 00:01:06,788 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.607e+01 9.085e+01 9.850e+01 2.122e+02, threshold=1.817e+02, percent-clipped=1.0 2024-09-20 00:01:11,872 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 00:01:17,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=772360.0, ans=0.0 2024-09-20 00:01:17,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=772360.0, ans=0.125 2024-09-20 00:01:34,010 INFO [train.py:1198] (1/2) Epoch 43, batch 3050, loss[loss=0.2262, ctc_loss=0.1131, cr_loss=0.3512, attn_decoder_loss=0.231, over 29539.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.111, cr_loss=0.3505, attn_decoder_loss=0.2392, over 5776778.23 frames. ], batch size: 76, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:01:54,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=772440.0, ans=0.2 2024-09-20 00:01:54,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=772440.0, ans=0.1 2024-09-20 00:01:56,211 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 00:01:57,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=772440.0, ans=0.0 2024-09-20 00:02:12,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=772480.0, ans=0.05 2024-09-20 00:02:14,622 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.84 vs. limit=15.0 2024-09-20 00:02:51,502 INFO [train.py:1198] (1/2) Epoch 43, batch 3100, loss[loss=0.2445, ctc_loss=0.1216, cr_loss=0.3538, attn_decoder_loss=0.2503, over 29216.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1111, cr_loss=0.3498, attn_decoder_loss=0.2388, over 5778106.95 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:02:52,727 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.69 vs. limit=15.0 2024-09-20 00:03:00,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=772600.0, ans=0.1 2024-09-20 00:03:09,151 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.97 vs. 
limit=22.5 2024-09-20 00:03:13,638 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.17 vs. limit=10.0 2024-09-20 00:03:41,811 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.560e+01 8.944e+01 9.719e+01 1.343e+02, threshold=1.789e+02, percent-clipped=0.0 2024-09-20 00:03:54,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=772760.0, ans=0.125 2024-09-20 00:04:02,546 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.95 vs. limit=22.5 2024-09-20 00:04:09,807 INFO [train.py:1198] (1/2) Epoch 43, batch 3150, loss[loss=0.2381, ctc_loss=0.112, cr_loss=0.3575, attn_decoder_loss=0.2442, over 28902.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1108, cr_loss=0.3496, attn_decoder_loss=0.2386, over 5783711.43 frames. ], batch size: 104, lr: 2.55e-03, grad_scale: 8.0 2024-09-20 00:04:12,052 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.42 vs. limit=15.0 2024-09-20 00:04:15,284 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2024-09-20 00:04:19,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=772800.0, ans=0.025 2024-09-20 00:04:29,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=772840.0, ans=0.125 2024-09-20 00:04:35,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=772840.0, ans=0.1 2024-09-20 00:05:07,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=772920.0, ans=0.2 2024-09-20 00:05:14,105 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.84 vs. limit=15.0 2024-09-20 00:05:25,183 INFO [train.py:1198] (1/2) Epoch 43, batch 3200, loss[loss=0.236, ctc_loss=0.1192, cr_loss=0.3771, attn_decoder_loss=0.2406, over 29413.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1109, cr_loss=0.3495, attn_decoder_loss=0.2383, over 5794489.62 frames. ], batch size: 79, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:05:25,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=773000.0, ans=0.0 2024-09-20 00:05:41,985 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 00:05:50,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=773040.0, ans=0.125 2024-09-20 00:05:55,035 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 00:05:55,666 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.94 vs. 
limit=15.0 2024-09-20 00:06:07,098 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 00:06:17,438 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.172e+01 8.459e+01 9.068e+01 9.712e+01 1.068e+02, threshold=1.814e+02, percent-clipped=0.0 2024-09-20 00:06:43,195 INFO [train.py:1198] (1/2) Epoch 43, batch 3250, loss[loss=0.2456, ctc_loss=0.1246, cr_loss=0.3949, attn_decoder_loss=0.2503, over 29715.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1112, cr_loss=0.3506, attn_decoder_loss=0.2389, over 5800841.85 frames. ], batch size: 84, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:06:45,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=773200.0, ans=0.1 2024-09-20 00:07:01,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=773240.0, ans=0.5 2024-09-20 00:07:21,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=773280.0, ans=0.125 2024-09-20 00:07:33,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=773320.0, ans=0.125 2024-09-20 00:07:41,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=773320.0, ans=0.125 2024-09-20 00:07:50,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=773360.0, ans=0.0 2024-09-20 00:07:50,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=773360.0, ans=0.125 2024-09-20 00:07:52,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=773360.0, ans=0.125 2024-09-20 00:08:00,883 INFO [train.py:1198] (1/2) Epoch 43, batch 3300, loss[loss=0.236, ctc_loss=0.1097, cr_loss=0.3362, attn_decoder_loss=0.2425, over 28277.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1103, cr_loss=0.3484, attn_decoder_loss=0.2374, over 5798078.31 frames. 
], batch size: 111, lr: 2.55e-03, grad_scale: 8.0 2024-09-20 00:08:01,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=773400.0, ans=0.1 2024-09-20 00:08:21,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=773440.0, ans=0.125 2024-09-20 00:08:40,636 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 00:08:41,929 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 00:08:43,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=773480.0, ans=0.0 2024-09-20 00:08:46,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=773520.0, ans=0.125 2024-09-20 00:08:47,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=773520.0, ans=0.0 2024-09-20 00:08:47,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=773520.0, ans=0.0 2024-09-20 00:08:52,111 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 8.624e+01 9.248e+01 9.741e+01 2.844e+02, threshold=1.850e+02, percent-clipped=2.0 2024-09-20 00:09:08,606 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 00:09:10,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=773560.0, ans=0.125 2024-09-20 00:09:13,784 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.08 vs. limit=15.0 2024-09-20 00:09:16,242 INFO [train.py:1198] (1/2) Epoch 43, batch 3350, loss[loss=0.2498, ctc_loss=0.1216, cr_loss=0.3802, attn_decoder_loss=0.2556, over 29014.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1106, cr_loss=0.349, attn_decoder_loss=0.2381, over 5773758.72 frames. ], batch size: 104, lr: 2.55e-03, grad_scale: 8.0 2024-09-20 00:09:28,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=773600.0, ans=0.2 2024-09-20 00:09:37,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=773640.0, ans=0.1 2024-09-20 00:09:40,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=773640.0, ans=0.0 2024-09-20 00:09:50,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=773680.0, ans=0.125 2024-09-20 00:10:04,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=773720.0, ans=0.1 2024-09-20 00:10:34,069 INFO [train.py:1198] (1/2) Epoch 43, batch 3400, loss[loss=0.2058, ctc_loss=0.09137, cr_loss=0.3018, attn_decoder_loss=0.2118, over 29328.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1107, cr_loss=0.349, attn_decoder_loss=0.238, over 5765795.04 frames. 
], batch size: 67, lr: 2.55e-03, grad_scale: 8.0 2024-09-20 00:10:40,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=773800.0, ans=0.0 2024-09-20 00:10:44,187 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=12.0 2024-09-20 00:10:46,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=773800.0, ans=0.125 2024-09-20 00:10:49,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=773840.0, ans=0.125 2024-09-20 00:10:50,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=773840.0, ans=0.1 2024-09-20 00:11:01,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0 2024-09-20 00:11:03,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=773840.0, ans=0.125 2024-09-20 00:11:11,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=773880.0, ans=0.125 2024-09-20 00:11:15,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=773880.0, ans=0.2 2024-09-20 00:11:27,406 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.739e+01 8.527e+01 9.240e+01 9.845e+01 1.909e+02, threshold=1.848e+02, percent-clipped=1.0 2024-09-20 00:11:41,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=773960.0, ans=0.04949747468305833 2024-09-20 00:11:47,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=773960.0, ans=0.1 2024-09-20 00:11:47,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=773960.0, ans=0.0 2024-09-20 00:11:51,437 INFO [train.py:1198] (1/2) Epoch 43, batch 3450, loss[loss=0.2259, ctc_loss=0.09544, cr_loss=0.3142, attn_decoder_loss=0.2334, over 28213.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1109, cr_loss=0.3501, attn_decoder_loss=0.2384, over 5774264.13 frames. ], batch size: 111, lr: 2.55e-03, grad_scale: 8.0 2024-09-20 00:12:13,198 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 00:12:18,101 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.21 vs. 
limit=12.0 2024-09-20 00:12:43,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=774120.0, ans=0.1 2024-09-20 00:12:52,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=774160.0, ans=0.125 2024-09-20 00:13:01,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=774160.0, ans=0.0 2024-09-20 00:13:06,960 INFO [train.py:1198] (1/2) Epoch 43, batch 3500, loss[loss=0.2105, ctc_loss=0.09298, cr_loss=0.3048, attn_decoder_loss=0.2168, over 29315.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1104, cr_loss=0.3494, attn_decoder_loss=0.2378, over 5776231.11 frames. ], batch size: 71, lr: 2.55e-03, grad_scale: 8.0 2024-09-20 00:13:08,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=774200.0, ans=0.125 2024-09-20 00:13:12,323 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.34 vs. limit=22.5 2024-09-20 00:13:19,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=774200.0, ans=0.125 2024-09-20 00:13:22,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=774240.0, ans=0.0 2024-09-20 00:13:38,446 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.10 vs. limit=15.0 2024-09-20 00:13:59,841 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.916e+01 8.502e+01 8.947e+01 9.671e+01 2.846e+02, threshold=1.789e+02, percent-clipped=1.0 2024-09-20 00:14:22,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=774400.0, ans=0.025 2024-09-20 00:14:23,849 INFO [train.py:1198] (1/2) Epoch 43, batch 3550, loss[loss=0.2428, ctc_loss=0.1033, cr_loss=0.3393, attn_decoder_loss=0.2508, over 29690.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1104, cr_loss=0.3495, attn_decoder_loss=0.2379, over 5781522.46 frames. ], batch size: 89, lr: 2.55e-03, grad_scale: 8.0 2024-09-20 00:14:35,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=774400.0, ans=0.125 2024-09-20 00:14:54,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=774480.0, ans=0.0 2024-09-20 00:15:01,461 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.10 vs. 
limit=22.5 2024-09-20 00:15:03,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=774480.0, ans=0.0 2024-09-20 00:15:06,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=774520.0, ans=0.2 2024-09-20 00:15:09,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=774520.0, ans=0.0 2024-09-20 00:15:11,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=774520.0, ans=0.2 2024-09-20 00:15:19,231 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.70 vs. limit=15.0 2024-09-20 00:15:34,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=774560.0, ans=0.1 2024-09-20 00:15:35,398 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.38 vs. limit=15.0 2024-09-20 00:15:37,572 INFO [train.py:1198] (1/2) Epoch 43, batch 3600, loss[loss=0.2292, ctc_loss=0.111, cr_loss=0.347, attn_decoder_loss=0.2346, over 29494.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1103, cr_loss=0.349, attn_decoder_loss=0.238, over 5790633.41 frames. ], batch size: 77, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:15:45,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=774600.0, ans=0.0 2024-09-20 00:15:57,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=774640.0, ans=0.0 2024-09-20 00:16:06,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=774640.0, ans=0.125 2024-09-20 00:16:24,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=774720.0, ans=0.125 2024-09-20 00:16:30,157 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.605e+01 8.541e+01 9.175e+01 9.569e+01 2.464e+02, threshold=1.835e+02, percent-clipped=1.0 2024-09-20 00:16:30,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=774720.0, ans=0.04949747468305833 2024-09-20 00:16:36,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=774720.0, ans=0.0 2024-09-20 00:16:40,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=774760.0, ans=0.125 2024-09-20 00:16:49,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=774760.0, ans=0.1 2024-09-20 00:16:52,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=774800.0, ans=0.5 2024-09-20 00:16:54,083 INFO [train.py:1198] (1/2) Epoch 43, batch 3650, loss[loss=0.2424, ctc_loss=0.1243, cr_loss=0.3826, attn_decoder_loss=0.247, over 29485.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1098, cr_loss=0.3481, attn_decoder_loss=0.2373, over 5793112.75 frames. 
], batch size: 90, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:17:03,914 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.94 vs. limit=10.0 2024-09-20 00:17:07,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=774840.0, ans=0.0 2024-09-20 00:17:13,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=774840.0, ans=0.125 2024-09-20 00:17:15,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=774840.0, ans=0.125 2024-09-20 00:17:31,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=774880.0, ans=0.0 2024-09-20 00:17:43,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=774920.0, ans=0.125 2024-09-20 00:17:46,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=774920.0, ans=0.0 2024-09-20 00:17:53,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=774960.0, ans=0.125 2024-09-20 00:18:06,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=774960.0, ans=0.0 2024-09-20 00:18:08,758 INFO [train.py:1198] (1/2) Epoch 43, batch 3700, loss[loss=0.2422, ctc_loss=0.1069, cr_loss=0.3568, attn_decoder_loss=0.2494, over 29706.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1096, cr_loss=0.3477, attn_decoder_loss=0.2374, over 5803602.83 frames. ], batch size: 84, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:18:10,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=775000.0, ans=0.125 2024-09-20 00:18:40,510 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.81 vs. 
limit=15.0 2024-09-20 00:18:42,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=775080.0, ans=0.125 2024-09-20 00:18:42,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=775080.0, ans=0.125 2024-09-20 00:18:45,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=775080.0, ans=0.125 2024-09-20 00:18:58,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=775120.0, ans=0.2 2024-09-20 00:18:59,087 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.832e+01 8.457e+01 9.128e+01 9.477e+01 6.609e+02, threshold=1.826e+02, percent-clipped=1.0 2024-09-20 00:19:00,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=775120.0, ans=0.125 2024-09-20 00:19:11,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=775160.0, ans=0.0 2024-09-20 00:19:18,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=775160.0, ans=0.125 2024-09-20 00:19:23,195 INFO [train.py:1198] (1/2) Epoch 43, batch 3750, loss[loss=0.2063, ctc_loss=0.09031, cr_loss=0.3072, attn_decoder_loss=0.2124, over 29344.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1098, cr_loss=0.3481, attn_decoder_loss=0.2373, over 5807304.54 frames. ], batch size: 67, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:19:38,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=775240.0, ans=0.1 2024-09-20 00:19:44,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0 2024-09-20 00:19:53,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=775280.0, ans=0.1 2024-09-20 00:20:06,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=775280.0, ans=0.1 2024-09-20 00:20:06,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=775280.0, ans=0.125 2024-09-20 00:20:12,928 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 00:20:26,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=775360.0, ans=0.125 2024-09-20 00:20:39,071 INFO [train.py:1198] (1/2) Epoch 43, batch 3800, loss[loss=0.2337, ctc_loss=0.1162, cr_loss=0.3477, attn_decoder_loss=0.2391, over 29618.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1094, cr_loss=0.3468, attn_decoder_loss=0.237, over 5797917.32 frames. ], batch size: 86, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:20:46,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=775400.0, ans=0.2 2024-09-20 00:21:03,729 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.78 vs. 
limit=22.5 2024-09-20 00:21:28,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=775520.0, ans=0.125 2024-09-20 00:21:29,526 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.047e+01 8.559e+01 9.199e+01 9.773e+01 2.259e+02, threshold=1.840e+02, percent-clipped=2.0 2024-09-20 00:21:31,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=775520.0, ans=0.125 2024-09-20 00:21:40,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=775560.0, ans=0.2 2024-09-20 00:21:43,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=775560.0, ans=0.125 2024-09-20 00:21:55,046 INFO [train.py:1198] (1/2) Epoch 43, batch 3850, loss[loss=0.2582, ctc_loss=0.1266, cr_loss=0.383, attn_decoder_loss=0.2643, over 29199.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1094, cr_loss=0.3471, attn_decoder_loss=0.2369, over 5811825.43 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:22:07,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=775600.0, ans=0.1 2024-09-20 00:22:10,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=775640.0, ans=0.125 2024-09-20 00:22:26,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=775680.0, ans=0.125 2024-09-20 00:22:44,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=775720.0, ans=0.025 2024-09-20 00:22:44,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=775720.0, ans=0.0 2024-09-20 00:22:48,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=775720.0, ans=0.0 2024-09-20 00:22:59,745 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.62 vs. limit=22.5 2024-09-20 00:23:09,194 INFO [train.py:1198] (1/2) Epoch 43, batch 3900, loss[loss=0.2483, ctc_loss=0.1272, cr_loss=0.3921, attn_decoder_loss=0.253, over 29643.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1104, cr_loss=0.3499, attn_decoder_loss=0.2379, over 5816700.23 frames. 
], batch size: 86, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:23:09,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=775800.0, ans=0.125 2024-09-20 00:23:24,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=775840.0, ans=0.125 2024-09-20 00:23:50,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=775880.0, ans=0.025 2024-09-20 00:23:59,430 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.563e+01 8.663e+01 9.061e+01 9.537e+01 1.215e+02, threshold=1.812e+02, percent-clipped=0.0 2024-09-20 00:24:11,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=775960.0, ans=0.125 2024-09-20 00:24:18,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=775960.0, ans=0.0 2024-09-20 00:24:23,505 INFO [train.py:1198] (1/2) Epoch 43, batch 3950, loss[loss=0.2512, ctc_loss=0.1272, cr_loss=0.4017, attn_decoder_loss=0.2561, over 29496.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1106, cr_loss=0.3506, attn_decoder_loss=0.2382, over 5835989.40 frames. ], batch size: 97, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:24:32,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=776000.0, ans=0.125 2024-09-20 00:24:51,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=776080.0, ans=0.125 2024-09-20 00:25:01,235 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.46 vs. limit=12.0 2024-09-20 00:25:04,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=776080.0, ans=0.025 2024-09-20 00:25:10,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=776120.0, ans=0.2 2024-09-20 00:25:30,191 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.76 vs. limit=15.0 2024-09-20 00:25:38,031 INFO [train.py:1198] (1/2) Epoch 43, batch 4000, loss[loss=0.2217, ctc_loss=0.1109, cr_loss=0.3472, attn_decoder_loss=0.2263, over 29490.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1108, cr_loss=0.3507, attn_decoder_loss=0.2383, over 5812831.16 frames. ], batch size: 74, lr: 2.55e-03, grad_scale: 32.0 2024-09-20 00:25:38,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=776200.0, ans=0.025 2024-09-20 00:25:44,976 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.86 vs. 
limit=12.0 2024-09-20 00:25:51,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=776240.0, ans=0.125 2024-09-20 00:26:03,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=776240.0, ans=0.125 2024-09-20 00:26:10,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2024-09-20 00:26:25,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=776320.0, ans=0.0 2024-09-20 00:26:26,281 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.95 vs. limit=12.0 2024-09-20 00:26:29,832 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 8.878e+01 9.363e+01 9.780e+01 3.308e+02, threshold=1.873e+02, percent-clipped=2.0 2024-09-20 00:26:53,237 INFO [train.py:1198] (1/2) Epoch 43, batch 4050, loss[loss=0.2467, ctc_loss=0.1354, cr_loss=0.3734, attn_decoder_loss=0.2507, over 20491.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1106, cr_loss=0.3504, attn_decoder_loss=0.2381, over 5797468.92 frames. ], batch size: 209, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:27:08,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=776440.0, ans=0.025 2024-09-20 00:27:15,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=776440.0, ans=22.5 2024-09-20 00:28:00,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.75 vs. limit=15.0 2024-09-20 00:28:06,605 INFO [train.py:1198] (1/2) Epoch 43, batch 4100, loss[loss=0.2525, ctc_loss=0.1275, cr_loss=0.3806, attn_decoder_loss=0.2579, over 29469.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.111, cr_loss=0.3513, attn_decoder_loss=0.2384, over 5792826.62 frames. ], batch size: 90, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:28:11,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=776600.0, ans=0.125 2024-09-20 00:28:17,403 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.84 vs. limit=15.0 2024-09-20 00:28:21,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=776640.0, ans=0.1 2024-09-20 00:28:57,813 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 8.624e+01 9.289e+01 9.929e+01 2.714e+02, threshold=1.858e+02, percent-clipped=2.0 2024-09-20 00:29:05,960 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.35 vs. limit=15.0 2024-09-20 00:29:15,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=776760.0, ans=0.025 2024-09-20 00:29:20,438 INFO [train.py:1198] (1/2) Epoch 43, batch 4150, loss[loss=0.2208, ctc_loss=0.09867, cr_loss=0.3231, attn_decoder_loss=0.2272, over 29496.00 frames. 
], tot_loss[loss=0.232, ctc_loss=0.1105, cr_loss=0.3499, attn_decoder_loss=0.2378, over 5798167.59 frames. ], batch size: 77, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:29:32,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=776800.0, ans=0.1 2024-09-20 00:29:35,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=776840.0, ans=0.07 2024-09-20 00:29:48,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=776840.0, ans=0.125 2024-09-20 00:29:53,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=776880.0, ans=0.125 2024-09-20 00:30:02,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=776880.0, ans=0.0 2024-09-20 00:30:10,941 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.17 vs. limit=15.0 2024-09-20 00:30:20,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=776960.0, ans=0.125 2024-09-20 00:30:36,201 INFO [train.py:1198] (1/2) Epoch 43, batch 4200, loss[loss=0.2527, ctc_loss=0.13, cr_loss=0.3892, attn_decoder_loss=0.2577, over 29470.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1105, cr_loss=0.3498, attn_decoder_loss=0.2379, over 5800030.74 frames. ], batch size: 90, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:30:45,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=777000.0, ans=0.125 2024-09-20 00:31:16,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=777080.0, ans=0.125 2024-09-20 00:31:29,059 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.571e+01 8.984e+01 9.502e+01 1.265e+02, threshold=1.797e+02, percent-clipped=0.0 2024-09-20 00:31:36,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=777160.0, ans=0.125 2024-09-20 00:31:49,479 INFO [train.py:1198] (1/2) Epoch 43, batch 4250, loss[loss=0.2158, ctc_loss=0.1061, cr_loss=0.328, attn_decoder_loss=0.2207, over 29536.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1107, cr_loss=0.3503, attn_decoder_loss=0.2383, over 5806456.85 frames. ], batch size: 74, lr: 2.55e-03, grad_scale: 8.0 2024-09-20 00:31:50,170 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.90 vs. limit=12.0 2024-09-20 00:33:01,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=777400.0, ans=0.125 2024-09-20 00:33:02,930 INFO [train.py:1198] (1/2) Epoch 43, batch 4300, loss[loss=0.2342, ctc_loss=0.1117, cr_loss=0.3536, attn_decoder_loss=0.2399, over 29511.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1104, cr_loss=0.3496, attn_decoder_loss=0.2385, over 5795700.43 frames. 
], batch size: 87, lr: 2.55e-03, grad_scale: 8.0 2024-09-20 00:33:07,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=777400.0, ans=0.125 2024-09-20 00:33:07,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=777400.0, ans=0.2 2024-09-20 00:33:12,561 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.48 vs. limit=15.0 2024-09-20 00:33:27,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=777440.0, ans=0.1 2024-09-20 00:33:33,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=777480.0, ans=0.125 2024-09-20 00:33:36,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=777480.0, ans=0.1 2024-09-20 00:33:46,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=777520.0, ans=0.0 2024-09-20 00:33:49,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=777520.0, ans=0.1 2024-09-20 00:33:56,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=777520.0, ans=0.0 2024-09-20 00:33:57,765 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.710e+01 8.855e+01 9.292e+01 9.899e+01 2.383e+02, threshold=1.858e+02, percent-clipped=1.0 2024-09-20 00:33:59,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=777520.0, ans=0.125 2024-09-20 00:34:18,757 INFO [train.py:1198] (1/2) Epoch 43, batch 4350, loss[loss=0.2532, ctc_loss=0.1253, cr_loss=0.3822, attn_decoder_loss=0.2589, over 29474.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1128, cr_loss=0.3552, attn_decoder_loss=0.2413, over 5797273.86 frames. ], batch size: 97, lr: 2.55e-03, grad_scale: 8.0 2024-09-20 00:34:30,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=777600.0, ans=0.125 2024-09-20 00:34:42,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=777640.0, ans=0.125 2024-09-20 00:34:54,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=777680.0, ans=0.0 2024-09-20 00:34:54,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=777680.0, ans=0.0 2024-09-20 00:34:58,916 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.70 vs. 
limit=22.5 2024-09-20 00:35:07,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=777720.0, ans=0.0 2024-09-20 00:35:13,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=777720.0, ans=0.0 2024-09-20 00:35:13,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=777720.0, ans=0.1 2024-09-20 00:35:21,133 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.84 vs. limit=10.0 2024-09-20 00:35:21,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=777760.0, ans=0.0 2024-09-20 00:35:31,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=777800.0, ans=15.0 2024-09-20 00:35:31,784 INFO [train.py:1198] (1/2) Epoch 43, batch 4400, loss[loss=0.2468, ctc_loss=0.1236, cr_loss=0.3771, attn_decoder_loss=0.2521, over 27151.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1139, cr_loss=0.3577, attn_decoder_loss=0.2433, over 5764863.86 frames. ], batch size: 124, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:35:42,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=777800.0, ans=0.1 2024-09-20 00:35:44,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=777840.0, ans=0.125 2024-09-20 00:36:24,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=777920.0, ans=0.025 2024-09-20 00:36:25,880 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.138e+01 9.169e+01 9.548e+01 1.005e+02 2.703e+02, threshold=1.910e+02, percent-clipped=1.0 2024-09-20 00:36:27,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=777920.0, ans=0.125 2024-09-20 00:36:46,775 INFO [train.py:1198] (1/2) Epoch 43, batch 4450, loss[loss=0.2488, ctc_loss=0.1239, cr_loss=0.3659, attn_decoder_loss=0.2545, over 20846.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1175, cr_loss=0.3634, attn_decoder_loss=0.2454, over 5572209.66 frames. 
], batch size: 209, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:37:02,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=778040.0, ans=0.125 2024-09-20 00:37:03,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=778040.0, ans=0.125 2024-09-20 00:37:09,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=778040.0, ans=0.1 2024-09-20 00:37:24,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=778080.0, ans=0.0 2024-09-20 00:37:27,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=778080.0, ans=0.1 2024-09-20 00:37:30,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=778120.0, ans=0.125 2024-09-20 00:37:48,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=778160.0, ans=0.125 2024-09-20 00:37:48,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=778160.0, ans=0.125 2024-09-20 00:38:01,715 INFO [train.py:1198] (1/2) Epoch 43, batch 4500, loss[loss=0.2444, ctc_loss=0.1242, cr_loss=0.3429, attn_decoder_loss=0.2502, over 20254.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1203, cr_loss=0.3656, attn_decoder_loss=0.2473, over 5229674.72 frames. ], batch size: 209, lr: 2.55e-03, grad_scale: 8.0 2024-09-20 00:38:04,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0 2024-09-20 00:38:27,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=778240.0, ans=0.0 2024-09-20 00:39:29,388 INFO [train.py:1198] (1/2) Epoch 44, batch 0, loss[loss=0.2158, ctc_loss=0.1008, cr_loss=0.3388, attn_decoder_loss=0.221, over 29591.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1008, cr_loss=0.3388, attn_decoder_loss=0.221, over 29591.00 frames. ], batch size: 73, lr: 2.52e-03, grad_scale: 16.0 2024-09-20 00:39:29,388 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 00:39:43,183 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3692, 5.4703, 5.0992, 3.1988], device='cuda:1') 2024-09-20 00:39:47,832 INFO [train.py:1230] (1/2) Epoch 44, validation: loss=0.2131, ctc_loss=0.03639, cr_loss=8.375e-15, attn_decoder_loss=0.2327, over 944034.00 frames. 2024-09-20 00:39:47,833 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-20 00:39:52,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=778300.0, ans=0.2 2024-09-20 00:39:54,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.01 vs. 
limit=12.0 2024-09-20 00:40:05,917 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.560e+01 1.073e+02 1.152e+02 1.272e+02 3.214e+02, threshold=2.305e+02, percent-clipped=2.0 2024-09-20 00:40:09,998 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.29 vs. limit=15.0 2024-09-20 00:40:32,903 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.43 vs. limit=15.0 2024-09-20 00:40:33,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=778420.0, ans=0.0 2024-09-20 00:40:39,178 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.50 vs. limit=22.5 2024-09-20 00:40:41,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=778420.0, ans=0.0 2024-09-20 00:41:03,896 INFO [train.py:1198] (1/2) Epoch 44, batch 50, loss[loss=0.208, ctc_loss=0.09482, cr_loss=0.3227, attn_decoder_loss=0.2134, over 29479.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1125, cr_loss=0.3533, attn_decoder_loss=0.2394, over 1265257.35 frames. ], batch size: 70, lr: 2.52e-03, grad_scale: 16.0 2024-09-20 00:41:08,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=778500.0, ans=0.125 2024-09-20 00:41:18,954 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.44 vs. limit=22.5 2024-09-20 00:41:29,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.33 vs. limit=15.0 2024-09-20 00:41:42,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=778580.0, ans=0.0 2024-09-20 00:41:58,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=778620.0, ans=0.125 2024-09-20 00:41:58,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=778620.0, ans=0.125 2024-09-20 00:42:10,713 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.76 vs. limit=22.5 2024-09-20 00:42:13,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=778660.0, ans=0.07 2024-09-20 00:42:14,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=778660.0, ans=0.1 2024-09-20 00:42:23,332 INFO [train.py:1198] (1/2) Epoch 44, batch 100, loss[loss=0.2302, ctc_loss=0.1168, cr_loss=0.3607, attn_decoder_loss=0.2347, over 29543.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1137, cr_loss=0.357, attn_decoder_loss=0.2416, over 2250498.94 frames. 
], batch size: 76, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 00:42:41,356 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 8.747e+01 9.046e+01 9.804e+01 1.542e+02, threshold=1.809e+02, percent-clipped=0.0 2024-09-20 00:42:41,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=778740.0, ans=0.2 2024-09-20 00:43:11,619 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 00:43:22,493 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.48 vs. limit=15.0 2024-09-20 00:43:37,842 INFO [train.py:1198] (1/2) Epoch 44, batch 150, loss[loss=0.207, ctc_loss=0.09263, cr_loss=0.3099, attn_decoder_loss=0.2129, over 29430.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1115, cr_loss=0.3512, attn_decoder_loss=0.239, over 3044969.81 frames. ], batch size: 70, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 00:43:44,495 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. limit=6.0 2024-09-20 00:43:49,252 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.29 vs. limit=10.0 2024-09-20 00:43:51,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=778940.0, ans=0.0 2024-09-20 00:44:16,534 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.18 vs. limit=15.0 2024-09-20 00:44:38,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=779060.0, ans=0.025 2024-09-20 00:44:52,617 INFO [train.py:1198] (1/2) Epoch 44, batch 200, loss[loss=0.2462, ctc_loss=0.1245, cr_loss=0.3783, attn_decoder_loss=0.2513, over 27141.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.111, cr_loss=0.3509, attn_decoder_loss=0.2382, over 3657293.12 frames. ], batch size: 124, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 00:45:13,157 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.429e+01 8.994e+01 9.673e+01 1.827e+02, threshold=1.799e+02, percent-clipped=1.0 2024-09-20 00:45:14,034 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.58 vs. limit=15.0 2024-09-20 00:45:30,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=779180.0, ans=0.1 2024-09-20 00:45:59,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=779260.0, ans=0.125 2024-09-20 00:46:04,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=779260.0, ans=0.0 2024-09-20 00:46:08,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=779260.0, ans=0.1 2024-09-20 00:46:09,435 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.81 vs. 
limit=15.0 2024-09-20 00:46:11,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=779300.0, ans=0.2 2024-09-20 00:46:12,945 INFO [train.py:1198] (1/2) Epoch 44, batch 250, loss[loss=0.2473, ctc_loss=0.1179, cr_loss=0.3689, attn_decoder_loss=0.2535, over 29258.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1107, cr_loss=0.3503, attn_decoder_loss=0.2381, over 4140108.58 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 00:46:34,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=779340.0, ans=0.125 2024-09-20 00:46:46,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=779380.0, ans=0.125 2024-09-20 00:47:28,081 INFO [train.py:1198] (1/2) Epoch 44, batch 300, loss[loss=0.2441, ctc_loss=0.1218, cr_loss=0.3619, attn_decoder_loss=0.2497, over 29534.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1105, cr_loss=0.3504, attn_decoder_loss=0.2379, over 4508263.72 frames. ], batch size: 92, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 00:47:28,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=779500.0, ans=0.125 2024-09-20 00:47:31,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=779500.0, ans=0.125 2024-09-20 00:47:37,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=779500.0, ans=0.1 2024-09-20 00:47:37,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=779500.0, ans=0.025 2024-09-20 00:47:47,520 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.408e+01 8.498e+01 8.969e+01 9.392e+01 3.050e+02, threshold=1.794e+02, percent-clipped=1.0 2024-09-20 00:47:47,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=779540.0, ans=0.125 2024-09-20 00:47:56,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=779580.0, ans=0.125 2024-09-20 00:48:04,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=779580.0, ans=0.125 2024-09-20 00:48:18,889 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.90 vs. limit=22.5 2024-09-20 00:48:27,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=779660.0, ans=0.2 2024-09-20 00:48:36,859 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.48 vs. limit=15.0 2024-09-20 00:48:43,176 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.63 vs. limit=22.5 2024-09-20 00:48:43,471 INFO [train.py:1198] (1/2) Epoch 44, batch 350, loss[loss=0.2162, ctc_loss=0.09987, cr_loss=0.3255, attn_decoder_loss=0.2219, over 29315.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1111, cr_loss=0.3514, attn_decoder_loss=0.2385, over 4794301.74 frames. 
], batch size: 71, lr: 2.51e-03, grad_scale: 8.0 2024-09-20 00:48:56,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=779700.0, ans=0.125 2024-09-20 00:48:59,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=779740.0, ans=0.125 2024-09-20 00:49:10,322 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 00:49:29,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=779820.0, ans=0.0 2024-09-20 00:49:56,594 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.27 vs. limit=15.0 2024-09-20 00:50:00,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=779860.0, ans=0.1 2024-09-20 00:50:03,129 INFO [train.py:1198] (1/2) Epoch 44, batch 400, loss[loss=0.2338, ctc_loss=0.1147, cr_loss=0.3502, attn_decoder_loss=0.2393, over 29715.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1107, cr_loss=0.3509, attn_decoder_loss=0.2381, over 5024348.33 frames. ], batch size: 82, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 00:50:22,883 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.675e+01 8.550e+01 9.066e+01 9.796e+01 2.019e+02, threshold=1.813e+02, percent-clipped=1.0 2024-09-20 00:50:38,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=779980.0, ans=0.125 2024-09-20 00:50:40,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=779980.0, ans=0.125 2024-09-20 00:50:54,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=780020.0, ans=0.1 2024-09-20 00:50:58,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=780020.0, ans=0.125 2024-09-20 00:50:58,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=780020.0, ans=0.125 2024-09-20 00:51:17,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=780060.0, ans=0.2 2024-09-20 00:51:19,797 INFO [train.py:1198] (1/2) Epoch 44, batch 450, loss[loss=0.2485, ctc_loss=0.1184, cr_loss=0.3593, attn_decoder_loss=0.255, over 29714.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1112, cr_loss=0.3516, attn_decoder_loss=0.2386, over 5185219.60 frames. ], batch size: 83, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 00:51:33,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=780140.0, ans=0.0 2024-09-20 00:51:42,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=780140.0, ans=0.035 2024-09-20 00:51:44,791 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.02 vs. 
limit=12.0 2024-09-20 00:52:04,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=780220.0, ans=0.125 2024-09-20 00:52:33,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=780260.0, ans=0.1 2024-09-20 00:52:35,604 INFO [train.py:1198] (1/2) Epoch 44, batch 500, loss[loss=0.24, ctc_loss=0.1121, cr_loss=0.3644, attn_decoder_loss=0.2461, over 29435.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1105, cr_loss=0.3501, attn_decoder_loss=0.2375, over 5328994.15 frames. ], batch size: 94, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 00:52:35,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=780300.0, ans=0.125 2024-09-20 00:52:46,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=780300.0, ans=0.125 2024-09-20 00:52:57,327 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.573e+01 8.977e+01 9.726e+01 1.793e+02, threshold=1.795e+02, percent-clipped=0.0 2024-09-20 00:53:32,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=780420.0, ans=10.0 2024-09-20 00:53:48,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=780460.0, ans=0.0 2024-09-20 00:53:55,482 INFO [train.py:1198] (1/2) Epoch 44, batch 550, loss[loss=0.2401, ctc_loss=0.1102, cr_loss=0.3436, attn_decoder_loss=0.2469, over 28770.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1104, cr_loss=0.3499, attn_decoder_loss=0.2376, over 5421725.11 frames. ], batch size: 104, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 00:54:03,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=780500.0, ans=0.2 2024-09-20 00:54:07,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=780500.0, ans=0.125 2024-09-20 00:54:24,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=780580.0, ans=0.125 2024-09-20 00:54:27,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=780580.0, ans=0.0 2024-09-20 00:54:27,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=780580.0, ans=0.125 2024-09-20 00:54:37,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=780580.0, ans=0.1 2024-09-20 00:54:39,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=780620.0, ans=10.0 2024-09-20 00:54:56,642 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.94 vs. 
limit=15.0 2024-09-20 00:54:59,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=780660.0, ans=0.125 2024-09-20 00:55:10,821 INFO [train.py:1198] (1/2) Epoch 44, batch 600, loss[loss=0.2453, ctc_loss=0.12, cr_loss=0.386, attn_decoder_loss=0.2506, over 29242.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1101, cr_loss=0.3499, attn_decoder_loss=0.2377, over 5508764.59 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 00:55:30,172 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 8.511e+01 9.110e+01 9.777e+01 1.650e+02, threshold=1.822e+02, percent-clipped=0.0 2024-09-20 00:55:44,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=780780.0, ans=0.125 2024-09-20 00:56:00,048 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.46 vs. limit=15.0 2024-09-20 00:56:08,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=780820.0, ans=0.125 2024-09-20 00:56:13,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=780860.0, ans=0.1 2024-09-20 00:56:26,263 INFO [train.py:1198] (1/2) Epoch 44, batch 650, loss[loss=0.2339, ctc_loss=0.1091, cr_loss=0.3461, attn_decoder_loss=0.2401, over 29777.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1091, cr_loss=0.3477, attn_decoder_loss=0.2371, over 5586566.65 frames. ], batch size: 81, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 00:56:27,019 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.94 vs. limit=15.0 2024-09-20 00:56:38,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=780900.0, ans=0.95 2024-09-20 00:56:43,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=780940.0, ans=0.0 2024-09-20 00:57:07,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=780980.0, ans=15.0 2024-09-20 00:57:20,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=781020.0, ans=0.125 2024-09-20 00:57:39,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=781060.0, ans=0.125 2024-09-20 00:57:46,187 INFO [train.py:1198] (1/2) Epoch 44, batch 700, loss[loss=0.228, ctc_loss=0.1024, cr_loss=0.3487, attn_decoder_loss=0.2342, over 29503.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1096, cr_loss=0.3486, attn_decoder_loss=0.2376, over 5638444.53 frames. ], batch size: 76, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 00:57:48,386 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.68 vs. 
limit=22.5 2024-09-20 00:58:02,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=781140.0, ans=0.0 2024-09-20 00:58:05,626 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.694e+01 8.523e+01 8.995e+01 9.436e+01 1.726e+02, threshold=1.799e+02, percent-clipped=0.0 2024-09-20 00:58:12,539 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.99 vs. limit=10.0 2024-09-20 00:58:31,013 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.33 vs. limit=12.0 2024-09-20 00:58:51,839 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.43 vs. limit=6.0 2024-09-20 00:59:00,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=781300.0, ans=0.2 2024-09-20 00:59:01,550 INFO [train.py:1198] (1/2) Epoch 44, batch 750, loss[loss=0.2337, ctc_loss=0.1052, cr_loss=0.3392, attn_decoder_loss=0.2404, over 29681.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1094, cr_loss=0.3477, attn_decoder_loss=0.2374, over 5676215.28 frames. ], batch size: 82, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 00:59:07,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=781300.0, ans=0.0 2024-09-20 00:59:10,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=781300.0, ans=0.0 2024-09-20 00:59:25,918 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 00:59:27,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=781340.0, ans=0.0 2024-09-20 01:00:03,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=781460.0, ans=0.0 2024-09-20 01:00:16,838 INFO [train.py:1198] (1/2) Epoch 44, batch 800, loss[loss=0.2118, ctc_loss=0.09986, cr_loss=0.3337, attn_decoder_loss=0.2168, over 29619.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1094, cr_loss=0.3478, attn_decoder_loss=0.2373, over 5707831.19 frames. ], batch size: 73, lr: 2.51e-03, grad_scale: 32.0 2024-09-20 01:00:20,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=781500.0, ans=0.0 2024-09-20 01:00:37,969 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.541e+01 8.478e+01 8.977e+01 9.680e+01 1.726e+02, threshold=1.795e+02, percent-clipped=0.0 2024-09-20 01:00:51,946 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.78 vs. 
limit=8.0 2024-09-20 01:00:55,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=781580.0, ans=0.125 2024-09-20 01:00:58,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=781580.0, ans=0.025 2024-09-20 01:01:17,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=781620.0, ans=0.2 2024-09-20 01:01:34,654 INFO [train.py:1198] (1/2) Epoch 44, batch 850, loss[loss=0.2458, ctc_loss=0.121, cr_loss=0.3785, attn_decoder_loss=0.2513, over 29714.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1091, cr_loss=0.3471, attn_decoder_loss=0.2371, over 5737127.43 frames. ], batch size: 89, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 01:01:56,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=781740.0, ans=0.125 2024-09-20 01:01:56,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=781740.0, ans=0.125 2024-09-20 01:02:05,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=781780.0, ans=0.0 2024-09-20 01:02:19,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=781780.0, ans=0.025 2024-09-20 01:02:32,030 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.46 vs. limit=15.0 2024-09-20 01:02:39,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=781860.0, ans=0.2 2024-09-20 01:02:49,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=781860.0, ans=0.125 2024-09-20 01:02:52,368 INFO [train.py:1198] (1/2) Epoch 44, batch 900, loss[loss=0.2169, ctc_loss=0.09752, cr_loss=0.3264, attn_decoder_loss=0.2229, over 29602.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1095, cr_loss=0.3476, attn_decoder_loss=0.2373, over 5741884.04 frames. ], batch size: 73, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 01:03:10,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=781940.0, ans=0.125 2024-09-20 01:03:15,005 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.724e+01 8.619e+01 9.074e+01 9.618e+01 1.505e+02, threshold=1.815e+02, percent-clipped=0.0 2024-09-20 01:03:15,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=781940.0, ans=0.5 2024-09-20 01:03:25,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=781980.0, ans=0.5 2024-09-20 01:03:27,804 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.33 vs. 
limit=22.5 2024-09-20 01:03:31,791 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:03:39,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=782020.0, ans=0.2 2024-09-20 01:03:41,657 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.64 vs. limit=15.0 2024-09-20 01:03:45,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=782020.0, ans=0.125 2024-09-20 01:03:52,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=782060.0, ans=0.1 2024-09-20 01:04:07,360 INFO [train.py:1198] (1/2) Epoch 44, batch 950, loss[loss=0.2123, ctc_loss=0.08919, cr_loss=0.2939, attn_decoder_loss=0.2195, over 29494.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1097, cr_loss=0.3478, attn_decoder_loss=0.2375, over 5745363.79 frames. ], batch size: 74, lr: 2.51e-03, grad_scale: 8.0 2024-09-20 01:04:08,096 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.33 vs. limit=15.0 2024-09-20 01:04:16,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=782100.0, ans=0.125 2024-09-20 01:04:28,630 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.14 vs. limit=15.0 2024-09-20 01:04:37,289 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.62 vs. limit=15.0 2024-09-20 01:04:39,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=782180.0, ans=0.125 2024-09-20 01:04:47,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=782180.0, ans=0.0 2024-09-20 01:04:47,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=782180.0, ans=0.1 2024-09-20 01:04:52,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=782180.0, ans=0.2 2024-09-20 01:05:11,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=782260.0, ans=0.2 2024-09-20 01:05:21,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=782260.0, ans=0.125 2024-09-20 01:05:24,399 INFO [train.py:1198] (1/2) Epoch 44, batch 1000, loss[loss=0.2208, ctc_loss=0.1045, cr_loss=0.3347, attn_decoder_loss=0.2262, over 29510.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1105, cr_loss=0.3493, attn_decoder_loss=0.2383, over 5739810.97 frames. 
], batch size: 77, lr: 2.51e-03, grad_scale: 8.0 2024-09-20 01:05:24,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=782300.0, ans=0.1 2024-09-20 01:05:32,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=782300.0, ans=0.125 2024-09-20 01:05:39,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=782300.0, ans=0.125 2024-09-20 01:05:49,521 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.182e+01 8.837e+01 9.355e+01 1.004e+02 2.810e+02, threshold=1.871e+02, percent-clipped=1.0 2024-09-20 01:06:01,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=782380.0, ans=0.0 2024-09-20 01:06:03,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=782380.0, ans=0.125 2024-09-20 01:06:09,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=782380.0, ans=0.1 2024-09-20 01:06:14,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=782420.0, ans=0.1 2024-09-20 01:06:26,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=782460.0, ans=0.2 2024-09-20 01:06:26,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=782460.0, ans=0.1 2024-09-20 01:06:31,291 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.02 vs. limit=22.5 2024-09-20 01:06:35,350 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:06:40,776 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.87 vs. limit=15.0 2024-09-20 01:06:42,625 INFO [train.py:1198] (1/2) Epoch 44, batch 1050, loss[loss=0.2383, ctc_loss=0.1069, cr_loss=0.3399, attn_decoder_loss=0.2453, over 29677.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1101, cr_loss=0.3484, attn_decoder_loss=0.2376, over 5746852.39 frames. ], batch size: 85, lr: 2.51e-03, grad_scale: 8.0 2024-09-20 01:06:44,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=782500.0, ans=0.125 2024-09-20 01:06:46,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=782500.0, ans=0.2 2024-09-20 01:07:07,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=782540.0, ans=0.025 2024-09-20 01:07:10,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=782540.0, ans=0.125 2024-09-20 01:07:12,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. 
limit=6.0 2024-09-20 01:07:19,488 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:07:20,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=782580.0, ans=0.0 2024-09-20 01:07:44,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0 2024-09-20 01:07:47,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=782660.0, ans=0.0 2024-09-20 01:07:57,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=782700.0, ans=0.125 2024-09-20 01:07:58,413 INFO [train.py:1198] (1/2) Epoch 44, batch 1100, loss[loss=0.2353, ctc_loss=0.1118, cr_loss=0.3498, attn_decoder_loss=0.2413, over 29442.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1096, cr_loss=0.3476, attn_decoder_loss=0.237, over 5757843.35 frames. ], batch size: 78, lr: 2.51e-03, grad_scale: 8.0 2024-09-20 01:08:23,099 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.650e+01 8.487e+01 8.944e+01 9.556e+01 1.706e+02, threshold=1.789e+02, percent-clipped=0.0 2024-09-20 01:08:38,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=782780.0, ans=0.2 2024-09-20 01:09:16,487 INFO [train.py:1198] (1/2) Epoch 44, batch 1150, loss[loss=0.2274, ctc_loss=0.1074, cr_loss=0.355, attn_decoder_loss=0.2329, over 29454.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1096, cr_loss=0.3478, attn_decoder_loss=0.2372, over 5753714.82 frames. ], batch size: 78, lr: 2.51e-03, grad_scale: 8.0 2024-09-20 01:09:28,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=782900.0, ans=0.2 2024-09-20 01:09:49,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=782980.0, ans=0.125 2024-09-20 01:10:33,778 INFO [train.py:1198] (1/2) Epoch 44, batch 1200, loss[loss=0.2405, ctc_loss=0.1119, cr_loss=0.345, attn_decoder_loss=0.2471, over 29661.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.11, cr_loss=0.3483, attn_decoder_loss=0.2378, over 5746868.15 frames. 
], batch size: 85, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 01:10:40,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=783100.0, ans=0.0 2024-09-20 01:10:52,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=783140.0, ans=0.0 2024-09-20 01:10:56,607 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.628e+01 8.611e+01 9.178e+01 9.686e+01 1.323e+02, threshold=1.836e+02, percent-clipped=0.0 2024-09-20 01:11:00,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=783140.0, ans=0.125 2024-09-20 01:11:01,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=783140.0, ans=0.125 2024-09-20 01:11:04,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=783180.0, ans=0.125 2024-09-20 01:11:42,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=783260.0, ans=0.125 2024-09-20 01:11:50,058 INFO [train.py:1198] (1/2) Epoch 44, batch 1250, loss[loss=0.2471, ctc_loss=0.1133, cr_loss=0.3485, attn_decoder_loss=0.2542, over 29513.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1106, cr_loss=0.35, attn_decoder_loss=0.2385, over 5774285.00 frames. ], batch size: 92, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 01:12:01,528 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.18 vs. limit=15.0 2024-09-20 01:12:02,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=783300.0, ans=0.015 2024-09-20 01:12:09,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=783340.0, ans=0.025 2024-09-20 01:12:25,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=783380.0, ans=0.0 2024-09-20 01:12:33,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=783380.0, ans=0.5 2024-09-20 01:12:48,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=783420.0, ans=0.125 2024-09-20 01:12:49,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=783420.0, ans=0.1 2024-09-20 01:13:00,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=783460.0, ans=0.05 2024-09-20 01:13:07,789 INFO [train.py:1198] (1/2) Epoch 44, batch 1300, loss[loss=0.2507, ctc_loss=0.1213, cr_loss=0.3649, attn_decoder_loss=0.257, over 28407.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1102, cr_loss=0.349, attn_decoder_loss=0.2378, over 5779010.53 frames. 
], batch size: 111, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 01:13:09,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=783500.0, ans=0.125 2024-09-20 01:13:23,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=783540.0, ans=0.0 2024-09-20 01:13:23,742 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.81 vs. limit=15.0 2024-09-20 01:13:30,710 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.490e+01 8.901e+01 9.557e+01 1.827e+02, threshold=1.780e+02, percent-clipped=0.0 2024-09-20 01:13:54,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=783620.0, ans=0.0 2024-09-20 01:13:57,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=783620.0, ans=0.125 2024-09-20 01:14:19,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.09 vs. limit=22.5 2024-09-20 01:14:20,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=783660.0, ans=0.125 2024-09-20 01:14:25,847 INFO [train.py:1198] (1/2) Epoch 44, batch 1350, loss[loss=0.2298, ctc_loss=0.1086, cr_loss=0.3415, attn_decoder_loss=0.2356, over 29720.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1099, cr_loss=0.349, attn_decoder_loss=0.2377, over 5796406.59 frames. ], batch size: 81, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 01:14:46,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=783740.0, ans=0.2 2024-09-20 01:14:46,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=783740.0, ans=0.125 2024-09-20 01:14:46,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=783740.0, ans=0.1 2024-09-20 01:14:48,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=783740.0, ans=0.0 2024-09-20 01:14:51,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=783740.0, ans=0.1 2024-09-20 01:15:07,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=783780.0, ans=0.025 2024-09-20 01:15:22,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=783820.0, ans=0.2 2024-09-20 01:15:25,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=783860.0, ans=0.0 2024-09-20 01:15:31,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=783860.0, ans=0.125 2024-09-20 01:15:40,845 INFO [train.py:1198] (1/2) Epoch 44, batch 1400, loss[loss=0.2042, ctc_loss=0.08623, cr_loss=0.2869, attn_decoder_loss=0.2109, over 29567.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1094, cr_loss=0.3483, attn_decoder_loss=0.2372, over 5807602.73 frames. 
], batch size: 69, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 01:15:54,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=783940.0, ans=0.125 2024-09-20 01:16:03,121 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.318e+01 8.567e+01 9.208e+01 9.655e+01 2.033e+02, threshold=1.842e+02, percent-clipped=1.0 2024-09-20 01:16:14,020 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.93 vs. limit=12.0 2024-09-20 01:16:16,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.10 vs. limit=15.0 2024-09-20 01:16:35,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=784020.0, ans=0.1 2024-09-20 01:16:47,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=784020.0, ans=0.1 2024-09-20 01:16:53,067 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.39 vs. limit=15.0 2024-09-20 01:16:56,844 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:16:59,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=784060.0, ans=0.125 2024-09-20 01:17:05,607 INFO [train.py:1198] (1/2) Epoch 44, batch 1450, loss[loss=0.2459, ctc_loss=0.1244, cr_loss=0.3641, attn_decoder_loss=0.2513, over 29484.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1095, cr_loss=0.3482, attn_decoder_loss=0.2375, over 5804508.84 frames. ], batch size: 94, lr: 2.51e-03, grad_scale: 8.0 2024-09-20 01:17:25,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=784140.0, ans=0.0 2024-09-20 01:17:28,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=784140.0, ans=0.125 2024-09-20 01:17:47,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=784180.0, ans=0.0 2024-09-20 01:17:55,278 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.77 vs. limit=15.0 2024-09-20 01:18:06,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=784260.0, ans=0.0 2024-09-20 01:18:17,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=784260.0, ans=0.2 2024-09-20 01:18:20,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=784260.0, ans=0.2 2024-09-20 01:18:23,244 INFO [train.py:1198] (1/2) Epoch 44, batch 1500, loss[loss=0.241, ctc_loss=0.1205, cr_loss=0.378, attn_decoder_loss=0.246, over 29618.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1101, cr_loss=0.3492, attn_decoder_loss=0.2381, over 5804621.35 frames. 
], batch size: 86, lr: 2.51e-03, grad_scale: 8.0 2024-09-20 01:18:28,880 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.65 vs. limit=15.0 2024-09-20 01:18:41,086 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2024-09-20 01:18:42,268 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.67 vs. limit=12.0 2024-09-20 01:18:44,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=784340.0, ans=0.1 2024-09-20 01:18:47,500 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.764e+01 8.610e+01 9.049e+01 9.653e+01 2.114e+02, threshold=1.810e+02, percent-clipped=1.0 2024-09-20 01:18:50,031 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.68 vs. limit=10.0 2024-09-20 01:19:12,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=784420.0, ans=0.0 2024-09-20 01:19:38,998 INFO [train.py:1198] (1/2) Epoch 44, batch 1550, loss[loss=0.2581, ctc_loss=0.1301, cr_loss=0.3947, attn_decoder_loss=0.2635, over 29533.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1106, cr_loss=0.3498, attn_decoder_loss=0.2381, over 5779849.29 frames. ], batch size: 90, lr: 2.51e-03, grad_scale: 8.0 2024-09-20 01:19:40,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=784500.0, ans=0.125 2024-09-20 01:19:52,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=784540.0, ans=0.1 2024-09-20 01:19:53,211 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.84 vs. limit=12.0 2024-09-20 01:20:10,732 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.03 vs. limit=15.0 2024-09-20 01:20:29,952 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.58 vs. limit=15.0 2024-09-20 01:20:49,675 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2024-09-20 01:20:55,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=784700.0, ans=0.125 2024-09-20 01:20:56,201 INFO [train.py:1198] (1/2) Epoch 44, batch 1600, loss[loss=0.2386, ctc_loss=0.1099, cr_loss=0.3469, attn_decoder_loss=0.2452, over 29673.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1104, cr_loss=0.3498, attn_decoder_loss=0.2379, over 5761692.75 frames. 
], batch size: 85, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 01:20:56,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=784700.0, ans=0.125 2024-09-20 01:21:17,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=784740.0, ans=10.0 2024-09-20 01:21:20,302 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.634e+01 8.674e+01 9.197e+01 9.675e+01 9.690e+02, threshold=1.839e+02, percent-clipped=2.0 2024-09-20 01:21:30,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=784780.0, ans=0.125 2024-09-20 01:22:04,514 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.28 vs. limit=15.0 2024-09-20 01:22:14,095 INFO [train.py:1198] (1/2) Epoch 44, batch 1650, loss[loss=0.2373, ctc_loss=0.1039, cr_loss=0.3313, attn_decoder_loss=0.2448, over 29700.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1098, cr_loss=0.3483, attn_decoder_loss=0.2375, over 5756680.97 frames. ], batch size: 89, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:22:24,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=784900.0, ans=0.1 2024-09-20 01:22:42,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=784980.0, ans=0.125 2024-09-20 01:22:50,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=784980.0, ans=0.2 2024-09-20 01:23:29,265 INFO [train.py:1198] (1/2) Epoch 44, batch 1700, loss[loss=0.2114, ctc_loss=0.09354, cr_loss=0.3106, attn_decoder_loss=0.2176, over 29562.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1096, cr_loss=0.3478, attn_decoder_loss=0.2376, over 5778362.82 frames. ], batch size: 69, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:23:29,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=785100.0, ans=0.025 2024-09-20 01:23:34,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=785100.0, ans=0.0 2024-09-20 01:23:41,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=785100.0, ans=0.2 2024-09-20 01:23:52,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=785140.0, ans=0.04949747468305833 2024-09-20 01:23:55,595 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.621e+01 9.129e+01 9.684e+01 1.448e+02, threshold=1.826e+02, percent-clipped=0.0 2024-09-20 01:24:16,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=785220.0, ans=0.05 2024-09-20 01:24:32,907 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.55 vs. 
limit=15.0 2024-09-20 01:24:38,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=785260.0, ans=0.125 2024-09-20 01:24:46,809 INFO [train.py:1198] (1/2) Epoch 44, batch 1750, loss[loss=0.2139, ctc_loss=0.09729, cr_loss=0.3297, attn_decoder_loss=0.2195, over 29352.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1094, cr_loss=0.348, attn_decoder_loss=0.2375, over 5787360.89 frames. ], batch size: 67, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:25:04,608 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0 2024-09-20 01:25:20,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=785380.0, ans=0.125 2024-09-20 01:25:35,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=785420.0, ans=10.0 2024-09-20 01:26:03,910 INFO [train.py:1198] (1/2) Epoch 44, batch 1800, loss[loss=0.2415, ctc_loss=0.1167, cr_loss=0.356, attn_decoder_loss=0.2474, over 29698.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1098, cr_loss=0.3488, attn_decoder_loss=0.2378, over 5789928.54 frames. ], batch size: 83, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:26:08,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=785500.0, ans=0.125 2024-09-20 01:26:27,989 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 8.530e+01 8.993e+01 9.458e+01 1.310e+02, threshold=1.799e+02, percent-clipped=0.0 2024-09-20 01:26:37,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=785580.0, ans=0.2 2024-09-20 01:27:17,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=785660.0, ans=0.5 2024-09-20 01:27:17,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=785660.0, ans=0.0 2024-09-20 01:27:19,811 INFO [train.py:1198] (1/2) Epoch 44, batch 1850, loss[loss=0.2388, ctc_loss=0.1148, cr_loss=0.3545, attn_decoder_loss=0.2447, over 29635.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1097, cr_loss=0.349, attn_decoder_loss=0.2376, over 5795555.77 frames. ], batch size: 86, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:27:26,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=785700.0, ans=0.0 2024-09-20 01:27:36,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=785740.0, ans=0.125 2024-09-20 01:27:49,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=785740.0, ans=0.1 2024-09-20 01:27:50,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=785780.0, ans=0.0 2024-09-20 01:28:19,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=785820.0, ans=0.125 2024-09-20 01:28:37,314 INFO [train.py:1198] (1/2) Epoch 44, batch 1900, loss[loss=0.2522, ctc_loss=0.1203, cr_loss=0.3864, attn_decoder_loss=0.2583, over 29723.00 frames. 
], tot_loss[loss=0.2322, ctc_loss=0.1097, cr_loss=0.349, attn_decoder_loss=0.238, over 5803689.71 frames. ], batch size: 89, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:28:46,936 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.36 vs. limit=10.0 2024-09-20 01:28:55,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=785940.0, ans=0.125 2024-09-20 01:29:01,486 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.304e+01 8.601e+01 9.112e+01 9.762e+01 1.549e+02, threshold=1.822e+02, percent-clipped=0.0 2024-09-20 01:29:03,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=785940.0, ans=0.1 2024-09-20 01:29:04,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=785940.0, ans=0.1 2024-09-20 01:29:09,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=785980.0, ans=0.2 2024-09-20 01:29:15,024 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.73 vs. limit=10.0 2024-09-20 01:29:34,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=786020.0, ans=0.125 2024-09-20 01:29:37,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=786020.0, ans=0.1 2024-09-20 01:29:54,964 INFO [train.py:1198] (1/2) Epoch 44, batch 1950, loss[loss=0.2283, ctc_loss=0.1003, cr_loss=0.3277, attn_decoder_loss=0.2353, over 29442.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1103, cr_loss=0.3501, attn_decoder_loss=0.2391, over 5818241.63 frames. ], batch size: 78, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:30:04,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=786100.0, ans=0.125 2024-09-20 01:30:08,136 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-09-20 01:30:32,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=786180.0, ans=0.09899494936611666 2024-09-20 01:30:50,799 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.12 vs. limit=15.0 2024-09-20 01:31:02,942 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=15.18 vs. limit=15.0 2024-09-20 01:31:08,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=786300.0, ans=0.125 2024-09-20 01:31:09,698 INFO [train.py:1198] (1/2) Epoch 44, batch 2000, loss[loss=0.1971, ctc_loss=0.08472, cr_loss=0.2902, attn_decoder_loss=0.2032, over 29357.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1108, cr_loss=0.3505, attn_decoder_loss=0.2393, over 5796196.93 frames. 
], batch size: 67, lr: 2.50e-03, grad_scale: 32.0 2024-09-20 01:31:36,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=786340.0, ans=0.125 2024-09-20 01:31:37,699 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.322e+01 8.592e+01 9.152e+01 9.700e+01 1.620e+02, threshold=1.830e+02, percent-clipped=0.0 2024-09-20 01:31:37,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=786340.0, ans=0.2 2024-09-20 01:31:45,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=786380.0, ans=0.125 2024-09-20 01:31:54,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=786380.0, ans=0.5 2024-09-20 01:31:56,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=786420.0, ans=0.1 2024-09-20 01:32:05,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=786420.0, ans=0.0 2024-09-20 01:32:08,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=786420.0, ans=0.125 2024-09-20 01:32:26,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=786500.0, ans=0.1 2024-09-20 01:32:27,935 INFO [train.py:1198] (1/2) Epoch 44, batch 2050, loss[loss=0.2008, ctc_loss=0.08845, cr_loss=0.2835, attn_decoder_loss=0.207, over 29453.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1104, cr_loss=0.3497, attn_decoder_loss=0.2386, over 5787404.81 frames. ], batch size: 70, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:32:59,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=786580.0, ans=0.2 2024-09-20 01:33:14,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=786620.0, ans=0.1 2024-09-20 01:33:24,754 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.21 vs. limit=12.0 2024-09-20 01:33:32,402 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.71 vs. limit=15.0 2024-09-20 01:33:44,954 INFO [train.py:1198] (1/2) Epoch 44, batch 2100, loss[loss=0.2325, ctc_loss=0.1082, cr_loss=0.3408, attn_decoder_loss=0.2387, over 29754.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1099, cr_loss=0.349, attn_decoder_loss=0.2379, over 5800337.71 frames. 
], batch size: 81, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:33:49,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=786700.0, ans=0.125 2024-09-20 01:33:58,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=786740.0, ans=0.2 2024-09-20 01:34:10,460 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.592e+01 8.953e+01 9.546e+01 1.075e+02, threshold=1.791e+02, percent-clipped=0.0 2024-09-20 01:34:13,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=786780.0, ans=0.125 2024-09-20 01:34:59,824 INFO [train.py:1198] (1/2) Epoch 44, batch 2150, loss[loss=0.2396, ctc_loss=0.1219, cr_loss=0.3951, attn_decoder_loss=0.2439, over 29433.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1095, cr_loss=0.3488, attn_decoder_loss=0.2375, over 5815014.52 frames. ], batch size: 78, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:35:06,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=786900.0, ans=0.07 2024-09-20 01:35:09,364 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:35:26,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=786940.0, ans=0.0 2024-09-20 01:35:43,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=786980.0, ans=0.125 2024-09-20 01:35:49,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=787020.0, ans=0.125 2024-09-20 01:35:50,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=787020.0, ans=0.1 2024-09-20 01:36:07,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=787060.0, ans=0.0 2024-09-20 01:36:17,940 INFO [train.py:1198] (1/2) Epoch 44, batch 2200, loss[loss=0.2352, ctc_loss=0.1096, cr_loss=0.3628, attn_decoder_loss=0.2411, over 29626.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1095, cr_loss=0.3484, attn_decoder_loss=0.2372, over 5811115.01 frames. 
], batch size: 86, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:36:34,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=787140.0, ans=0.125 2024-09-20 01:36:40,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=787140.0, ans=0.125 2024-09-20 01:36:43,209 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 8.551e+01 8.996e+01 9.508e+01 1.674e+02, threshold=1.799e+02, percent-clipped=0.0 2024-09-20 01:36:46,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=787180.0, ans=0.0 2024-09-20 01:36:49,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=787180.0, ans=0.025 2024-09-20 01:37:12,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=787220.0, ans=0.125 2024-09-20 01:37:12,842 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:37:17,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=787260.0, ans=0.2 2024-09-20 01:37:35,604 INFO [train.py:1198] (1/2) Epoch 44, batch 2250, loss[loss=0.2382, ctc_loss=0.1161, cr_loss=0.3454, attn_decoder_loss=0.2441, over 29714.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1095, cr_loss=0.3481, attn_decoder_loss=0.2373, over 5809877.23 frames. ], batch size: 82, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:37:35,992 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:37:44,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=787300.0, ans=0.0 2024-09-20 01:37:53,138 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.59 vs. limit=15.0 2024-09-20 01:38:17,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=787380.0, ans=0.125 2024-09-20 01:38:50,741 INFO [train.py:1198] (1/2) Epoch 44, batch 2300, loss[loss=0.2058, ctc_loss=0.09286, cr_loss=0.3223, attn_decoder_loss=0.2112, over 29311.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.109, cr_loss=0.3469, attn_decoder_loss=0.2363, over 5796722.15 frames. ], batch size: 71, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:39:18,221 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.593e+01 8.668e+01 9.192e+01 9.767e+01 1.748e+02, threshold=1.838e+02, percent-clipped=0.0 2024-09-20 01:39:20,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=787540.0, ans=0.0 2024-09-20 01:39:23,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=787580.0, ans=0.125 2024-09-20 01:39:32,946 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.46 vs. 
limit=15.0 2024-09-20 01:39:36,160 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2024-09-20 01:39:58,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=787660.0, ans=0.025 2024-09-20 01:40:08,534 INFO [train.py:1198] (1/2) Epoch 44, batch 2350, loss[loss=0.2417, ctc_loss=0.1243, cr_loss=0.3962, attn_decoder_loss=0.246, over 29685.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1095, cr_loss=0.3479, attn_decoder_loss=0.2369, over 5802072.37 frames. ], batch size: 83, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:40:14,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=787700.0, ans=0.125 2024-09-20 01:40:19,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=787700.0, ans=0.125 2024-09-20 01:40:20,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=787700.0, ans=0.125 2024-09-20 01:40:31,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=787740.0, ans=0.1 2024-09-20 01:40:55,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=787820.0, ans=0.125 2024-09-20 01:41:21,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=787860.0, ans=0.2 2024-09-20 01:41:22,656 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.91 vs. limit=6.0 2024-09-20 01:41:26,338 INFO [train.py:1198] (1/2) Epoch 44, batch 2400, loss[loss=0.2229, ctc_loss=0.1003, cr_loss=0.3474, attn_decoder_loss=0.2288, over 29572.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1099, cr_loss=0.3492, attn_decoder_loss=0.2375, over 5806382.58 frames. ], batch size: 76, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:41:40,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=787940.0, ans=0.125 2024-09-20 01:41:52,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=787940.0, ans=0.1 2024-09-20 01:41:53,425 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.381e+01 8.728e+01 9.218e+01 9.758e+01 1.607e+02, threshold=1.844e+02, percent-clipped=0.0 2024-09-20 01:41:55,946 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=15.0 2024-09-20 01:42:42,333 INFO [train.py:1198] (1/2) Epoch 44, batch 2450, loss[loss=0.244, ctc_loss=0.1262, cr_loss=0.4025, attn_decoder_loss=0.2482, over 29720.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1105, cr_loss=0.3504, attn_decoder_loss=0.2385, over 5782909.97 frames. 
], batch size: 82, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:43:04,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=788140.0, ans=0.0 2024-09-20 01:43:10,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=788140.0, ans=0.125 2024-09-20 01:43:13,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=788180.0, ans=0.125 2024-09-20 01:43:40,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=788220.0, ans=0.2 2024-09-20 01:43:58,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=788300.0, ans=0.025 2024-09-20 01:43:59,852 INFO [train.py:1198] (1/2) Epoch 44, batch 2500, loss[loss=0.2334, ctc_loss=0.1027, cr_loss=0.3407, attn_decoder_loss=0.2404, over 29631.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1102, cr_loss=0.3499, attn_decoder_loss=0.2384, over 5792796.82 frames. ], batch size: 86, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:44:22,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=788340.0, ans=0.2 2024-09-20 01:44:26,964 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 8.639e+01 9.215e+01 9.726e+01 1.262e+02, threshold=1.843e+02, percent-clipped=0.0 2024-09-20 01:44:39,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=788380.0, ans=0.2 2024-09-20 01:44:45,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=788420.0, ans=0.015 2024-09-20 01:44:59,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=788460.0, ans=0.125 2024-09-20 01:45:17,643 INFO [train.py:1198] (1/2) Epoch 44, batch 2550, loss[loss=0.2011, ctc_loss=0.08944, cr_loss=0.2988, attn_decoder_loss=0.2068, over 29325.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.11, cr_loss=0.3496, attn_decoder_loss=0.2383, over 5796723.91 frames. ], batch size: 67, lr: 2.50e-03, grad_scale: 8.0 2024-09-20 01:45:31,365 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:46:01,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=788620.0, ans=0.0 2024-09-20 01:46:23,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=788660.0, ans=0.1 2024-09-20 01:46:26,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=788660.0, ans=0.2 2024-09-20 01:46:32,704 INFO [train.py:1198] (1/2) Epoch 44, batch 2600, loss[loss=0.233, ctc_loss=0.1053, cr_loss=0.3393, attn_decoder_loss=0.2397, over 29430.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1098, cr_loss=0.3491, attn_decoder_loss=0.2383, over 5794029.51 frames. 
], batch size: 78, lr: 2.50e-03, grad_scale: 8.0 2024-09-20 01:46:54,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=788740.0, ans=0.125 2024-09-20 01:47:00,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=788740.0, ans=0.125 2024-09-20 01:47:03,305 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.447e+01 8.497e+01 9.008e+01 9.570e+01 2.359e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-20 01:47:09,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=788780.0, ans=0.0 2024-09-20 01:47:14,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=788780.0, ans=0.05 2024-09-20 01:47:17,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=788780.0, ans=0.125 2024-09-20 01:47:32,552 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:47:43,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=788860.0, ans=0.0 2024-09-20 01:47:50,434 INFO [train.py:1198] (1/2) Epoch 44, batch 2650, loss[loss=0.2516, ctc_loss=0.1263, cr_loss=0.3905, attn_decoder_loss=0.2568, over 29305.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.11, cr_loss=0.3497, attn_decoder_loss=0.2385, over 5800891.33 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 8.0 2024-09-20 01:48:08,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=788940.0, ans=0.125 2024-09-20 01:48:54,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=789060.0, ans=0.05 2024-09-20 01:49:07,973 INFO [train.py:1198] (1/2) Epoch 44, batch 2700, loss[loss=0.2548, ctc_loss=0.1171, cr_loss=0.3813, attn_decoder_loss=0.2616, over 29510.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1104, cr_loss=0.3505, attn_decoder_loss=0.2389, over 5796905.50 frames. ], batch size: 87, lr: 2.50e-03, grad_scale: 8.0 2024-09-20 01:49:18,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=789100.0, ans=0.0 2024-09-20 01:49:21,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=789140.0, ans=0.0 2024-09-20 01:49:23,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=789140.0, ans=0.0 2024-09-20 01:49:36,542 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.386e+01 8.638e+01 9.038e+01 9.626e+01 7.105e+02, threshold=1.808e+02, percent-clipped=1.0 2024-09-20 01:49:38,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=789180.0, ans=0.125 2024-09-20 01:49:46,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.34 vs. 
limit=10.0 2024-09-20 01:49:47,352 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:49:47,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=789180.0, ans=0.1 2024-09-20 01:50:04,645 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.22 vs. limit=22.5 2024-09-20 01:50:08,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=789260.0, ans=0.1 2024-09-20 01:50:22,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=789300.0, ans=0.025 2024-09-20 01:50:23,490 INFO [train.py:1198] (1/2) Epoch 44, batch 2750, loss[loss=0.2262, ctc_loss=0.1081, cr_loss=0.3554, attn_decoder_loss=0.2314, over 29503.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1099, cr_loss=0.3489, attn_decoder_loss=0.2379, over 5795460.65 frames. ], batch size: 75, lr: 2.50e-03, grad_scale: 8.0 2024-09-20 01:50:42,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=789340.0, ans=0.0 2024-09-20 01:50:46,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=789340.0, ans=0.0 2024-09-20 01:50:57,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=789380.0, ans=0.125 2024-09-20 01:51:00,919 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.18 vs. limit=15.0 2024-09-20 01:51:10,039 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. limit=6.0 2024-09-20 01:51:15,830 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.06 vs. limit=22.5 2024-09-20 01:51:21,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=789420.0, ans=0.125 2024-09-20 01:51:23,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=789420.0, ans=0.125 2024-09-20 01:51:41,196 INFO [train.py:1198] (1/2) Epoch 44, batch 2800, loss[loss=0.2513, ctc_loss=0.1396, cr_loss=0.3926, attn_decoder_loss=0.255, over 19668.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1102, cr_loss=0.3494, attn_decoder_loss=0.2381, over 5776509.94 frames. ], batch size: 211, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:52:05,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=789540.0, ans=0.0 2024-09-20 01:52:06,125 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.89 vs. 
limit=22.5 2024-09-20 01:52:09,705 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.668e+01 8.779e+01 9.114e+01 9.644e+01 1.703e+02, threshold=1.823e+02, percent-clipped=0.0 2024-09-20 01:52:48,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=789660.0, ans=0.125 2024-09-20 01:52:58,697 INFO [train.py:1198] (1/2) Epoch 44, batch 2850, loss[loss=0.2211, ctc_loss=0.09977, cr_loss=0.3344, attn_decoder_loss=0.2272, over 29507.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1104, cr_loss=0.35, attn_decoder_loss=0.2383, over 5761617.18 frames. ], batch size: 77, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:53:33,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=789780.0, ans=0.125 2024-09-20 01:53:53,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=789820.0, ans=0.125 2024-09-20 01:54:03,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=789860.0, ans=0.0 2024-09-20 01:54:13,952 INFO [train.py:1198] (1/2) Epoch 44, batch 2900, loss[loss=0.2296, ctc_loss=0.1126, cr_loss=0.3644, attn_decoder_loss=0.2345, over 29422.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1108, cr_loss=0.3511, attn_decoder_loss=0.2392, over 5787416.14 frames. ], batch size: 79, lr: 2.50e-03, grad_scale: 8.0 2024-09-20 01:54:30,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=789940.0, ans=0.0 2024-09-20 01:54:45,222 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:54:46,279 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.571e+01 8.849e+01 9.680e+01 1.963e+02, threshold=1.770e+02, percent-clipped=2.0 2024-09-20 01:54:51,702 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.79 vs. limit=12.0 2024-09-20 01:55:18,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=790060.0, ans=0.0 2024-09-20 01:55:31,609 INFO [train.py:1198] (1/2) Epoch 44, batch 2950, loss[loss=0.2296, ctc_loss=0.1057, cr_loss=0.3409, attn_decoder_loss=0.2357, over 29534.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1096, cr_loss=0.3483, attn_decoder_loss=0.2379, over 5784088.59 frames. ], batch size: 75, lr: 2.50e-03, grad_scale: 8.0 2024-09-20 01:55:53,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.64 vs. limit=22.5 2024-09-20 01:55:54,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=790140.0, ans=10.0 2024-09-20 01:56:10,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=790180.0, ans=0.0 2024-09-20 01:56:28,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=790220.0, ans=0.125 2024-09-20 01:56:49,765 INFO [train.py:1198] (1/2) Epoch 44, batch 3000, loss[loss=0.2357, ctc_loss=0.1148, cr_loss=0.3625, attn_decoder_loss=0.2411, over 29763.00 frames. 
], tot_loss[loss=0.2318, ctc_loss=0.1093, cr_loss=0.3477, attn_decoder_loss=0.2377, over 5783879.94 frames. ], batch size: 81, lr: 2.50e-03, grad_scale: 8.0 2024-09-20 01:56:49,765 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 01:57:08,098 INFO [train.py:1230] (1/2) Epoch 44, validation: loss=0.2127, ctc_loss=0.03705, cr_loss=7.369e-15, attn_decoder_loss=0.2322, over 944034.00 frames. 2024-09-20 01:57:08,099 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-20 01:57:17,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=790300.0, ans=0.0 2024-09-20 01:57:26,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=790340.0, ans=0.025 2024-09-20 01:57:29,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=790340.0, ans=0.125 2024-09-20 01:57:30,235 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.37 vs. limit=15.0 2024-09-20 01:57:32,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=790340.0, ans=0.125 2024-09-20 01:57:38,252 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 8.680e+01 9.147e+01 9.757e+01 3.916e+02, threshold=1.829e+02, percent-clipped=1.0 2024-09-20 01:57:40,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=790380.0, ans=0.05 2024-09-20 01:58:00,899 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.07 vs. limit=10.0 2024-09-20 01:58:03,841 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.92 vs. limit=15.0 2024-09-20 01:58:14,775 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:58:26,516 INFO [train.py:1198] (1/2) Epoch 44, batch 3050, loss[loss=0.2327, ctc_loss=0.112, cr_loss=0.3502, attn_decoder_loss=0.2383, over 29525.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1099, cr_loss=0.3491, attn_decoder_loss=0.2384, over 5778749.02 frames. 
], batch size: 76, lr: 2.50e-03, grad_scale: 8.0 2024-09-20 01:58:40,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=790540.0, ans=0.0 2024-09-20 01:58:49,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=790540.0, ans=0.125 2024-09-20 01:58:52,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=790540.0, ans=0.0 2024-09-20 01:59:01,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=790580.0, ans=0.125 2024-09-20 01:59:12,251 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:59:22,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=790620.0, ans=0.0 2024-09-20 01:59:22,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.40 vs. limit=22.5 2024-09-20 01:59:24,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.69 vs. limit=15.0 2024-09-20 01:59:40,807 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:59:41,916 INFO [train.py:1198] (1/2) Epoch 44, batch 3100, loss[loss=0.2428, ctc_loss=0.111, cr_loss=0.3582, attn_decoder_loss=0.2495, over 29261.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.11, cr_loss=0.3494, attn_decoder_loss=0.2384, over 5778818.51 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 8.0 2024-09-20 01:59:43,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=790700.0, ans=0.125 2024-09-20 01:59:46,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=790700.0, ans=0.025 2024-09-20 01:59:50,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.89 vs. limit=15.0 2024-09-20 02:00:11,909 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.289e+01 8.501e+01 8.989e+01 9.639e+01 2.477e+02, threshold=1.798e+02, percent-clipped=1.0 2024-09-20 02:00:19,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=790780.0, ans=0.125 2024-09-20 02:00:47,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=790860.0, ans=0.125 2024-09-20 02:00:51,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=790860.0, ans=0.09899494936611666 2024-09-20 02:00:58,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=790900.0, ans=0.125 2024-09-20 02:00:59,875 INFO [train.py:1198] (1/2) Epoch 44, batch 3150, loss[loss=0.2451, ctc_loss=0.1245, cr_loss=0.3698, attn_decoder_loss=0.2503, over 28826.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1099, cr_loss=0.3485, attn_decoder_loss=0.2382, over 5783795.32 frames. 
2024-09-20 02:01:23,454 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.25 vs. limit=15.0
2024-09-20 02:01:24,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=790940.0, ans=0.1
2024-09-20 02:01:46,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=791020.0, ans=0.015
2024-09-20 02:02:17,227 INFO [train.py:1198] (1/2) Epoch 44, batch 3200, loss[loss=0.2272, ctc_loss=0.106, cr_loss=0.3297, attn_decoder_loss=0.2333, over 29395.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1098, cr_loss=0.3487, attn_decoder_loss=0.2378, over 5792395.42 frames. ], batch size: 79, lr: 2.50e-03, grad_scale: 16.0
2024-09-20 02:02:17,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=791100.0, ans=0.125
2024-09-20 02:02:21,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=791100.0, ans=0.0
2024-09-20 02:02:25,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=791100.0, ans=0.0
2024-09-20 02:02:46,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=791180.0, ans=0.125
2024-09-20 02:02:47,456 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.647e+01 9.072e+01 9.601e+01 1.731e+02, threshold=1.814e+02, percent-clipped=0.0
2024-09-20 02:03:16,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=791260.0, ans=0.125
2024-09-20 02:03:22,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=791260.0, ans=0.125
2024-09-20 02:03:33,117 INFO [train.py:1198] (1/2) Epoch 44, batch 3250, loss[loss=0.2354, ctc_loss=0.1047, cr_loss=0.3406, attn_decoder_loss=0.2423, over 29697.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1101, cr_loss=0.3491, attn_decoder_loss=0.2382, over 5799170.43 frames. ], batch size: 84, lr: 2.49e-03, grad_scale: 8.0
2024-09-20 02:03:42,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=791300.0, ans=0.0
2024-09-20 02:03:50,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=791340.0, ans=0.125
2024-09-20 02:04:50,970 INFO [train.py:1198] (1/2) Epoch 44, batch 3300, loss[loss=0.2425, ctc_loss=0.113, cr_loss=0.3351, attn_decoder_loss=0.2495, over 28495.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1094, cr_loss=0.3473, attn_decoder_loss=0.237, over 5797422.32 frames. ], batch size: 112, lr: 2.49e-03, grad_scale: 8.0
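The grad_scale field toggles between 8.0 and 16.0 across these entries. That is the dynamic loss scale of mixed-precision training: the scaler doubles the scale after a long run of overflow-free steps and halves it when a step produces inf/nan gradients. A sketch using PyTorch's stock scaler (model, criterion, and the batch keys are placeholders):

```python
# Sketch of the dynamic fp16 loss scaling behind the logged "grad_scale".
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=8.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def train_step(model, optimizer, batch, criterion):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # silently skips the step if grads overflowed
    scaler.update()          # grows (8.0 -> 16.0) or backs off (16.0 -> 8.0)
    return loss.detach(), scaler.get_scale()
```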
2024-09-20 02:04:54,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=791500.0, ans=0.125
2024-09-20 02:05:19,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=791580.0, ans=0.1
2024-09-20 02:05:21,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=791580.0, ans=0.125
2024-09-20 02:05:22,611 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.134e+01 8.597e+01 9.177e+01 9.695e+01 2.585e+02, threshold=1.835e+02, percent-clipped=1.0
2024-09-20 02:05:24,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=791580.0, ans=0.125
2024-09-20 02:05:28,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=791580.0, ans=0.0
2024-09-20 02:05:33,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=791580.0, ans=0.1
2024-09-20 02:05:43,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=791620.0, ans=0.2
2024-09-20 02:05:48,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=791620.0, ans=0.0
2024-09-20 02:05:59,840 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.57 vs. limit=15.0
2024-09-20 02:06:03,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=791660.0, ans=0.0
2024-09-20 02:06:07,998 INFO [train.py:1198] (1/2) Epoch 44, batch 3350, loss[loss=0.2473, ctc_loss=0.1236, cr_loss=0.3758, attn_decoder_loss=0.2527, over 28845.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1098, cr_loss=0.3484, attn_decoder_loss=0.2377, over 5775575.45 frames. ], batch size: 104, lr: 2.49e-03, grad_scale: 8.0
2024-09-20 02:06:49,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=791780.0, ans=0.02
2024-09-20 02:07:18,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=791860.0, ans=0.125
2024-09-20 02:07:22,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=791900.0, ans=0.1
2024-09-20 02:07:23,727 INFO [train.py:1198] (1/2) Epoch 44, batch 3400, loss[loss=0.2121, ctc_loss=0.09895, cr_loss=0.3425, attn_decoder_loss=0.2171, over 29302.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1104, cr_loss=0.3494, attn_decoder_loss=0.238, over 5768693.98 frames. ], batch size: 67, lr: 2.49e-03, grad_scale: 8.0
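In the optim.py:487 warnings, the five numbers are the min / 25% / 50% / 75% / max of recent gradient norms, and in every entry the threshold equals Clipping_scale times the median (e.g. 2.0 x 9.177e+01 = 1.835e+02 just above); percent-clipped is how often the norm exceeded it. A standalone approximation of that behavior (not the optimizer's internal implementation):

```python
# Median-based gradient clipping consistent with the warnings above:
# threshold = clipping_scale * median of a window of recent grad norms.
from collections import deque
import torch

class MedianClipper:
    def __init__(self, clipping_scale=2.0, window=128):
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)

    def clip_(self, params):
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.linalg.vector_norm(torch.stack([g.norm() for g in grads]))
        self.norms.append(norm.item())
        hist = torch.tensor(list(self.norms))
        quartiles = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * quartiles[2].item()   # 2.0 x median
        clipped = norm.item() > threshold              # counted toward percent-clipped
        if clipped:
            for g in grads:
                g.mul_(threshold / norm.item())
        return quartiles, threshold, clipped
```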
2024-09-20 02:07:24,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=791900.0, ans=10.0
2024-09-20 02:07:55,384 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 8.682e+01 9.111e+01 9.724e+01 2.135e+02, threshold=1.822e+02, percent-clipped=1.0
2024-09-20 02:08:15,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=792020.0, ans=0.125
2024-09-20 02:08:41,413 INFO [train.py:1198] (1/2) Epoch 44, batch 3450, loss[loss=0.2337, ctc_loss=0.1003, cr_loss=0.3282, attn_decoder_loss=0.2412, over 28314.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1103, cr_loss=0.3493, attn_decoder_loss=0.2382, over 5775054.91 frames. ], batch size: 111, lr: 2.49e-03, grad_scale: 8.0
2024-09-20 02:09:05,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=792140.0, ans=0.07
2024-09-20 02:09:44,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=792260.0, ans=0.125
2024-09-20 02:09:58,571 INFO [train.py:1198] (1/2) Epoch 44, batch 3500, loss[loss=0.2065, ctc_loss=0.1003, cr_loss=0.3348, attn_decoder_loss=0.2108, over 29319.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1101, cr_loss=0.3493, attn_decoder_loss=0.2378, over 5776672.36 frames. ], batch size: 71, lr: 2.49e-03, grad_scale: 8.0
2024-09-20 02:10:07,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0
2024-09-20 02:10:30,312 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.361e+01 8.582e+01 8.980e+01 9.639e+01 1.678e+02, threshold=1.796e+02, percent-clipped=0.0
2024-09-20 02:10:57,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=792460.0, ans=0.0
2024-09-20 02:11:00,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=792460.0, ans=0.2
2024-09-20 02:11:03,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=792460.0, ans=0.0
2024-09-20 02:11:13,199 INFO [train.py:1198] (1/2) Epoch 44, batch 3550, loss[loss=0.2391, ctc_loss=0.1039, cr_loss=0.3516, attn_decoder_loss=0.2463, over 29690.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1099, cr_loss=0.3486, attn_decoder_loss=0.2377, over 5783112.36 frames. ], batch size: 89, lr: 2.49e-03, grad_scale: 8.0
2024-09-20 02:11:13,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=792500.0, ans=0.125
2024-09-20 02:11:37,623 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.57 vs. limit=15.0
2024-09-20 02:11:59,695 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.50 vs. limit=22.5
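On the CR term itself: a plausible sketch, consistent with the near-zero validation cr_loss, is a consistency penalty between frame-level CTC posteriors computed on two differently time-masked copies of the same utterance; when augmentation is off at validation the two views coincide and the penalty vanishes to numerical noise. The exact divergence used by this recipe is not shown in the log; a symmetric KL is one reasonable choice:

```python
# Hedged sketch of a consistency-regularization (CR) loss between two
# augmented views; the symmetric-KL choice here is illustrative.
import torch
import torch.nn.functional as F

def cr_loss(log_probs_a: torch.Tensor, log_probs_b: torch.Tensor) -> torch.Tensor:
    """Symmetric KL between two (T, B, V) CTC log-posterior tensors."""
    kl_ab = F.kl_div(log_probs_a, log_probs_b, log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(log_probs_b, log_probs_a, log_target=True, reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)
```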
2024-09-20 02:12:10,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=792660.0, ans=0.025
2024-09-20 02:12:26,349 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.52 vs. limit=15.0
2024-09-20 02:12:26,911 INFO [train.py:1198] (1/2) Epoch 44, batch 3600, loss[loss=0.2233, ctc_loss=0.1045, cr_loss=0.3255, attn_decoder_loss=0.2292, over 29523.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1097, cr_loss=0.3482, attn_decoder_loss=0.2377, over 5792254.27 frames. ], batch size: 77, lr: 2.49e-03, grad_scale: 16.0
2024-09-20 02:12:46,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0
2024-09-20 02:12:58,441 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.636e+01 8.550e+01 9.094e+01 9.613e+01 3.759e+02, threshold=1.819e+02, percent-clipped=1.0
2024-09-20 02:13:06,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=792780.0, ans=0.0
2024-09-20 02:13:11,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=792820.0, ans=0.125
2024-09-20 02:13:12,048 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.70 vs. limit=10.0
2024-09-20 02:13:17,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=792820.0, ans=0.2
2024-09-20 02:13:21,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=792820.0, ans=0.1
2024-09-20 02:13:29,543 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.53 vs. limit=15.0
2024-09-20 02:13:41,890 INFO [train.py:1198] (1/2) Epoch 44, batch 3650, loss[loss=0.2446, ctc_loss=0.1164, cr_loss=0.3485, attn_decoder_loss=0.2511, over 29518.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1093, cr_loss=0.3474, attn_decoder_loss=0.2372, over 5793758.19 frames. ], batch size: 90, lr: 2.49e-03, grad_scale: 16.0
2024-09-20 02:13:48,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=792900.0, ans=0.2
2024-09-20 02:14:00,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=792940.0, ans=0.125
2024-09-20 02:14:10,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=792940.0, ans=0.125
2024-09-20 02:14:22,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=792980.0, ans=0.125
2024-09-20 02:14:46,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=793060.0, ans=0.2
2024-09-20 02:14:55,904 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.50 vs. limit=15.0
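Each Whitening entry compares a per-module anisotropy metric against a scheduled limit. One way to define such a metric (hedged; this is a toy reading, not scaling.py itself) is the ratio of the second moment of the activation covariance's eigenvalues to the squared first moment, which equals 1.0 for perfectly white features and grows as the covariance becomes lopsided; the module only intervenes when the metric exceeds its limit:

```python
# Toy whitening metric: mean(lambda^2) / mean(lambda)^2 of the covariance
# eigenvalues, computed via trace identities (no eigendecomposition needed).
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels) activations for one whitening group."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]                 # (C, C), symmetric
    c = cov.shape[0]
    mean_eig = torch.diagonal(cov).mean()        # trace/C == mean eigenvalue
    mean_eig_sq = (cov * cov).sum() / c          # trace(cov@cov)/C == mean eigenvalue^2
    return mean_eig_sq / (mean_eig ** 2 + 1e-20)  # 1.0 when perfectly white
```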
2024-09-20 02:14:58,108 INFO [train.py:1198] (1/2) Epoch 44, batch 3700, loss[loss=0.2402, ctc_loss=0.1059, cr_loss=0.3357, attn_decoder_loss=0.2476, over 29725.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1095, cr_loss=0.3484, attn_decoder_loss=0.2375, over 5803890.66 frames. ], batch size: 84, lr: 2.49e-03, grad_scale: 8.0
2024-09-20 02:15:01,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=793100.0, ans=0.0
2024-09-20 02:15:13,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=793140.0, ans=0.2
2024-09-20 02:15:14,183 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.67 vs. limit=10.0
2024-09-20 02:15:16,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=793140.0, ans=0.1
2024-09-20 02:15:16,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=793140.0, ans=0.125
2024-09-20 02:15:32,678 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 8.629e+01 9.056e+01 9.534e+01 1.565e+02, threshold=1.811e+02, percent-clipped=0.0
2024-09-20 02:15:38,203 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0
2024-09-20 02:15:54,191 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.96 vs. limit=10.0
2024-09-20 02:16:05,857 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=15.0
2024-09-20 02:16:14,379 INFO [train.py:1198] (1/2) Epoch 44, batch 3750, loss[loss=0.2092, ctc_loss=0.09821, cr_loss=0.3158, attn_decoder_loss=0.2145, over 29332.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1097, cr_loss=0.3491, attn_decoder_loss=0.2374, over 5807683.14 frames. ], batch size: 67, lr: 2.49e-03, grad_scale: 8.0
2024-09-20 02:16:24,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=793300.0, ans=0.1
2024-09-20 02:16:32,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=793340.0, ans=0.1
2024-09-20 02:16:38,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=793340.0, ans=0.125
2024-09-20 02:16:45,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=793380.0, ans=0.09899494936611666
2024-09-20 02:16:51,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=793380.0, ans=0.2
2024-09-20 02:16:59,556 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.79 vs. limit=15.0
2024-09-20 02:17:13,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=793460.0, ans=0.1
2024-09-20 02:17:14,496 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.78 vs. limit=22.5
2024-09-20 02:17:21,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.99 vs. limit=10.0
2024-09-20 02:17:27,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=793500.0, ans=0.2
2024-09-20 02:17:28,372 INFO [train.py:1198] (1/2) Epoch 44, batch 3800, loss[loss=0.2408, ctc_loss=0.1126, cr_loss=0.3657, attn_decoder_loss=0.247, over 29612.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1096, cr_loss=0.3485, attn_decoder_loss=0.2371, over 5797883.05 frames. ], batch size: 86, lr: 2.49e-03, grad_scale: 8.0
2024-09-20 02:17:46,689 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.68 vs. limit=15.0
2024-09-20 02:17:59,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=793580.0, ans=0.125
2024-09-20 02:18:00,998 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.351e+01 9.103e+01 9.836e+01 3.154e+02, threshold=1.821e+02, percent-clipped=2.0
2024-09-20 02:18:40,153 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-20 02:18:42,686 INFO [train.py:1198] (1/2) Epoch 44, batch 3850, loss[loss=0.2452, ctc_loss=0.1241, cr_loss=0.3862, attn_decoder_loss=0.2501, over 29200.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1093, cr_loss=0.3487, attn_decoder_loss=0.2372, over 5811349.55 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 8.0
2024-09-20 02:18:49,701 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.88 vs. limit=12.0
2024-09-20 02:19:06,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=793740.0, ans=0.125
2024-09-20 02:19:08,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=793740.0, ans=0.125
2024-09-20 02:19:10,145 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.96 vs. limit=15.0
2024-09-20 02:19:11,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=793780.0, ans=0.02
2024-09-20 02:19:21,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=793780.0, ans=0.0
2024-09-20 02:19:52,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=793860.0, ans=0.1
2024-09-20 02:19:57,902 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.92 vs. limit=22.5
2024-09-20 02:19:58,446 INFO [train.py:1198] (1/2) Epoch 44, batch 3900, loss[loss=0.2366, ctc_loss=0.1076, cr_loss=0.3394, attn_decoder_loss=0.2434, over 29647.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1097, cr_loss=0.3493, attn_decoder_loss=0.2375, over 5815762.56 frames. ], batch size: 86, lr: 2.49e-03, grad_scale: 8.0
2024-09-20 02:20:06,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=793900.0, ans=0.0
2024-09-20 02:20:12,401 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.39 vs. limit=22.5
2024-09-20 02:20:19,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=793940.0, ans=0.2
2024-09-20 02:20:31,105 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.718e+01 8.637e+01 9.253e+01 9.637e+01 1.224e+02, threshold=1.851e+02, percent-clipped=0.0
2024-09-20 02:20:36,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=793980.0, ans=0.125
2024-09-20 02:20:42,461 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.29 vs. limit=15.0
2024-09-20 02:21:08,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=794060.0, ans=0.0
2024-09-20 02:21:12,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=794100.0, ans=0.1
2024-09-20 02:21:14,111 INFO [train.py:1198] (1/2) Epoch 44, batch 3950, loss[loss=0.2531, ctc_loss=0.1233, cr_loss=0.3747, attn_decoder_loss=0.2591, over 29480.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1094, cr_loss=0.3495, attn_decoder_loss=0.2374, over 5835136.55 frames. ], batch size: 97, lr: 2.49e-03, grad_scale: 8.0
2024-09-20 02:21:23,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=794100.0, ans=0.125
2024-09-20 02:21:31,338 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=12.0
2024-09-20 02:21:32,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=794140.0, ans=0.125
2024-09-20 02:21:49,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=794180.0, ans=0.125
2024-09-20 02:21:51,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=794180.0, ans=0.125
2024-09-20 02:22:11,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=794260.0, ans=0.1
2024-09-20 02:22:12,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=794260.0, ans=0.125
2024-09-20 02:22:27,457 INFO [train.py:1198] (1/2) Epoch 44, batch 4000, loss[loss=0.2218, ctc_loss=0.09676, cr_loss=0.3126, attn_decoder_loss=0.2287, over 29498.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1097, cr_loss=0.3495, attn_decoder_loss=0.2376, over 5812776.30 frames. ], batch size: 74, lr: 2.49e-03, grad_scale: 16.0
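The tot_loss values are frame-weighted aggregates: each batch contributes its losses weighted by its frame count, and the printed figure is the weighted mean over the frames accumulated so far in the reporting window. A minimal sketch of that bookkeeping (illustrative; the recipe's own tracker may additionally apply decay):

```python
# Frame-weighted running aggregate behind "tot_loss ... over N frames".
class RunningLoss:
    def __init__(self):
        self.frames = 0.0
        self.sums = {}

    def update(self, losses: dict, num_frames: float):
        self.frames += num_frames
        for key, value in losses.items():
            self.sums[key] = self.sums.get(key, 0.0) + value * num_frames

    def averages(self) -> dict:
        return {key: total / self.frames for key, total in self.sums.items()}

tot = RunningLoss()
tot.update({"loss": 0.2366, "ctc_loss": 0.1076}, 29647.0)  # numbers from the log
print(tot.averages())
```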
2024-09-20 02:22:36,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=794300.0, ans=0.1
2024-09-20 02:22:43,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=794340.0, ans=0.5
2024-09-20 02:22:48,811 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.57 vs. limit=22.5
2024-09-20 02:22:51,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=794340.0, ans=0.125
2024-09-20 02:22:58,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=794380.0, ans=0.0
2024-09-20 02:23:01,247 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.802e+01 8.690e+01 9.242e+01 9.635e+01 1.653e+02, threshold=1.848e+02, percent-clipped=0.0
2024-09-20 02:23:16,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=794420.0, ans=0.125
2024-09-20 02:23:16,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=794420.0, ans=0.1
2024-09-20 02:23:28,121 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=15.0
2024-09-20 02:23:41,485 INFO [train.py:1198] (1/2) Epoch 44, batch 4050, loss[loss=0.2507, ctc_loss=0.1383, cr_loss=0.3814, attn_decoder_loss=0.2547, over 19797.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1098, cr_loss=0.3492, attn_decoder_loss=0.2375, over 5796288.93 frames. ], batch size: 210, lr: 2.49e-03, grad_scale: 8.0
2024-09-20 02:23:42,406 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.57 vs. limit=22.5
2024-09-20 02:23:43,970 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.57 vs. limit=22.5
2024-09-20 02:24:01,558 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0
2024-09-20 02:24:07,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=794540.0, ans=0.125
2024-09-20 02:24:09,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=794580.0, ans=0.125
2024-09-20 02:24:21,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=794580.0, ans=0.0
2024-09-20 02:24:24,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=794620.0, ans=0.0
2024-09-20 02:24:24,487 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.92 vs. limit=12.0
2024-09-20 02:24:28,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=794620.0, ans=0.1
2024-09-20 02:24:39,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=794660.0, ans=0.125
2024-09-20 02:24:50,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=794660.0, ans=0.2
2024-09-20 02:24:56,006 INFO [train.py:1198] (1/2) Epoch 44, batch 4100, loss[loss=0.2456, ctc_loss=0.1188, cr_loss=0.3768, attn_decoder_loss=0.2513, over 29509.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1101, cr_loss=0.35, attn_decoder_loss=0.2377, over 5792027.87 frames. ], batch size: 90, lr: 2.49e-03, grad_scale: 8.0
2024-09-20 02:24:56,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=794700.0, ans=0.0
2024-09-20 02:24:57,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=794700.0, ans=0.125
2024-09-20 02:25:10,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=794740.0, ans=0.0
2024-09-20 02:25:23,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=794740.0, ans=0.125
2024-09-20 02:25:30,720 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.703e+01 9.227e+01 9.918e+01 1.839e+02, threshold=1.845e+02, percent-clipped=0.0
2024-09-20 02:25:32,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=794780.0, ans=0.0
2024-09-20 02:25:54,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=794860.0, ans=0.2
2024-09-20 02:25:54,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.66 vs. limit=15.0
2024-09-20 02:25:58,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=794860.0, ans=0.07
2024-09-20 02:26:07,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=794860.0, ans=0.0
2024-09-20 02:26:10,188 INFO [train.py:1198] (1/2) Epoch 44, batch 4150, loss[loss=0.2242, ctc_loss=0.1093, cr_loss=0.3378, attn_decoder_loss=0.2295, over 29480.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.11, cr_loss=0.3501, attn_decoder_loss=0.2375, over 5797517.99 frames. ], batch size: 77, lr: 2.49e-03, grad_scale: 8.0
2024-09-20 02:26:17,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=794900.0, ans=0.125
2024-09-20 02:26:27,203 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0
2024-09-20 02:26:29,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=794940.0, ans=0.1
2024-09-20 02:26:33,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=794940.0, ans=0.1
2024-09-20 02:26:40,435 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.68 vs. limit=15.0
2024-09-20 02:26:48,955 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.97 vs. limit=22.5
2024-09-20 02:26:55,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=795020.0, ans=0.125
2024-09-20 02:26:57,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=795020.0, ans=0.1
2024-09-20 02:27:23,537 INFO [train.py:1198] (1/2) Epoch 44, batch 4200, loss[loss=0.2539, ctc_loss=0.1361, cr_loss=0.4031, attn_decoder_loss=0.258, over 29525.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1101, cr_loss=0.3501, attn_decoder_loss=0.2376, over 5799826.37 frames. ], batch size: 90, lr: 2.49e-03, grad_scale: 8.0
2024-09-20 02:27:31,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=795100.0, ans=0.2
2024-09-20 02:27:55,084 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0
2024-09-20 02:27:57,346 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.818e+01 8.655e+01 9.286e+01 9.774e+01 5.497e+02, threshold=1.857e+02, percent-clipped=1.0
2024-09-20 02:28:08,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=795220.0, ans=0.125
2024-09-20 02:28:34,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=795260.0, ans=0.125
2024-09-20 02:28:34,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=795260.0, ans=0.125
2024-09-20 02:28:38,194 INFO [train.py:1198] (1/2) Epoch 44, batch 4250, loss[loss=0.2217, ctc_loss=0.101, cr_loss=0.3323, attn_decoder_loss=0.2277, over 29513.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1101, cr_loss=0.3503, attn_decoder_loss=0.2379, over 5805715.72 frames. ], batch size: 74, lr: 2.49e-03, grad_scale: 8.0
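The balancer parameters printed above (prob, min_positive, max_abs, min_abs) name constraints on per-channel activation statistics: the check only runs with probability prob, and channels whose fraction of positive values or mean magnitude drifts outside the bounds are nudged back. A deliberately simplified sketch of what those bounds mean (the real module acts by modifying gradients in the backward pass, which this toy penalty does not replicate):

```python
# Toy penalty expressing the balancer bounds; illustrative only.
import random
import torch

def balancer_penalty(x, min_positive=0.05, max_positive=0.95,
                     min_abs=0.02, max_abs=10.0, prob=0.125):
    """x: (num_frames, num_channels). Returns a scalar penalty tensor."""
    if random.random() > prob:          # the check is applied stochastically
        return x.new_zeros(())
    pos_frac = (x > 0).float().mean(dim=0)   # per-channel sign balance
    mean_abs = x.abs().mean(dim=0)           # per-channel magnitude
    return ((min_positive - pos_frac).clamp(min=0).sum()
            + (pos_frac - max_positive).clamp(min=0).sum()
            + (min_abs - mean_abs).clamp(min=0).sum()
            + (mean_abs - max_abs).clamp(min=0).sum())
```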
2024-09-20 02:28:48,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=795300.0, ans=0.125
2024-09-20 02:29:01,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=795340.0, ans=0.1
2024-09-20 02:29:12,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=795380.0, ans=0.0
2024-09-20 02:29:14,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=795380.0, ans=0.2
2024-09-20 02:29:24,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=795420.0, ans=0.0
2024-09-20 02:29:25,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=795420.0, ans=0.025
2024-09-20 02:29:33,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=795420.0, ans=0.125
2024-09-20 02:29:45,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=795460.0, ans=0.0
2024-09-20 02:29:51,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=795500.0, ans=0.125
2024-09-20 02:29:52,308 INFO [train.py:1198] (1/2) Epoch 44, batch 4300, loss[loss=0.2304, ctc_loss=0.09845, cr_loss=0.3334, attn_decoder_loss=0.2376, over 29541.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1097, cr_loss=0.3491, attn_decoder_loss=0.238, over 5794882.27 frames. ], batch size: 87, lr: 2.49e-03, grad_scale: 8.0
2024-09-20 02:30:15,352 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.89 vs. limit=12.0
2024-09-20 02:30:16,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=795540.0, ans=0.125
2024-09-20 02:30:26,383 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.608e+01 8.755e+01 9.251e+01 9.683e+01 2.005e+02, threshold=1.850e+02, percent-clipped=1.0
2024-09-20 02:30:27,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.38 vs. limit=12.0
2024-09-20 02:30:50,387 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-20 02:30:50,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.40 vs. limit=15.0
2024-09-20 02:31:06,395 INFO [train.py:1198] (1/2) Epoch 44, batch 4350, loss[loss=0.2414, ctc_loss=0.1159, cr_loss=0.3584, attn_decoder_loss=0.2474, over 29493.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1121, cr_loss=0.3548, attn_decoder_loss=0.2411, over 5796912.02 frames. ], batch size: 97, lr: 2.49e-03, grad_scale: 8.0
2024-09-20 02:31:45,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=795780.0, ans=0.2
2024-09-20 02:32:10,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=795860.0, ans=0.0
2024-09-20 02:32:20,758 INFO [train.py:1198] (1/2) Epoch 44, batch 4400, loss[loss=0.2468, ctc_loss=0.1238, cr_loss=0.3827, attn_decoder_loss=0.252, over 27386.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1131, cr_loss=0.3569, attn_decoder_loss=0.243, over 5765671.91 frames. ], batch size: 124, lr: 2.49e-03, grad_scale: 16.0
2024-09-20 02:32:23,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=795900.0, ans=0.125
2024-09-20 02:32:31,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=795900.0, ans=0.125
2024-09-20 02:32:40,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=795940.0, ans=0.125
2024-09-20 02:32:54,482 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.213e+01 9.042e+01 9.394e+01 9.819e+01 2.193e+02, threshold=1.879e+02, percent-clipped=1.0
2024-09-20 02:32:56,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=795980.0, ans=0.2
2024-09-20 02:33:09,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=796020.0, ans=0.125
2024-09-20 02:33:12,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=796020.0, ans=0.1
2024-09-20 02:33:15,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=796020.0, ans=0.125
2024-09-20 02:33:22,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=796060.0, ans=0.0
2024-09-20 02:33:34,340 INFO [train.py:1198] (1/2) Epoch 44, batch 4450, loss[loss=0.2555, ctc_loss=0.1412, cr_loss=0.396, attn_decoder_loss=0.2594, over 19589.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1167, cr_loss=0.3621, attn_decoder_loss=0.2452, over 5570449.37 frames. ], batch size: 209, lr: 2.49e-03, grad_scale: 8.0
2024-09-20 02:34:04,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=796180.0, ans=0.1
2024-09-20 02:34:07,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=796180.0, ans=0.125
2024-09-20 02:34:35,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=796260.0, ans=0.125
2024-09-20 02:34:44,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=796260.0, ans=0.125
2024-09-20 02:34:49,807 INFO [train.py:1198] (1/2) Epoch 44, batch 4500, loss[loss=0.2507, ctc_loss=0.1326, cr_loss=0.385, attn_decoder_loss=0.2553, over 20655.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.12, cr_loss=0.3652, attn_decoder_loss=0.247, over 5228368.23 frames. ], batch size: 210, lr: 2.49e-03, grad_scale: 8.0
2024-09-20 02:34:50,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=796300.0, ans=0.1
2024-09-20 02:34:51,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=796300.0, ans=0.125
2024-09-20 02:34:56,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=796300.0, ans=0.2
2024-09-20 02:34:56,894 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.64 vs. limit=15.0
2024-09-20 02:34:57,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=796300.0, ans=0.0
2024-09-20 02:35:19,679 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.18 vs. limit=15.0
2024-09-20 02:35:26,136 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.915e+01 1.070e+02 1.147e+02 1.258e+02 2.122e+02, threshold=2.294e+02, percent-clipped=1.0
2024-09-20 02:36:17,463 INFO [train.py:1198] (1/2) Epoch 45, batch 0, loss[loss=0.219, ctc_loss=0.09591, cr_loss=0.3232, attn_decoder_loss=0.2255, over 29611.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.09591, cr_loss=0.3232, attn_decoder_loss=0.2255, over 29611.00 frames. ], batch size: 73, lr: 2.46e-03, grad_scale: 16.0
2024-09-20 02:36:17,463 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-20 02:36:35,781 INFO [train.py:1230] (1/2) Epoch 45, validation: loss=0.2126, ctc_loss=0.03577, cr_loss=6.589e-15, attn_decoder_loss=0.2323, over 944034.00 frames.
2024-09-20 02:36:35,781 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB
2024-09-20 02:36:42,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=796400.0, ans=0.125
2024-09-20 02:36:42,799 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.97 vs. limit=15.0
2024-09-20 02:37:02,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=796440.0, ans=0.0
2024-09-20 02:37:05,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=796440.0, ans=0.125
2024-09-20 02:37:11,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=796480.0, ans=0.2
2024-09-20 02:37:14,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=796480.0, ans=0.0
2024-09-20 02:37:53,208 INFO [train.py:1198] (1/2) Epoch 45, batch 50, loss[loss=0.2089, ctc_loss=0.09461, cr_loss=0.3195, attn_decoder_loss=0.2145, over 29455.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1098, cr_loss=0.349, attn_decoder_loss=0.2381, over 1269172.78 frames. ], batch size: 70, lr: 2.46e-03, grad_scale: 8.0
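At the epoch boundary the loop runs a validation pass and reports CUDA's peak-memory high-water mark. A minimal sketch of that pattern, assuming a placeholder compute_loss helper that returns a loss and a frame count:

```python
# Validation under no_grad plus the "Maximum memory allocated" report.
import torch

@torch.no_grad()
def run_validation(model, valid_loader, device):
    model.eval()
    total, frames = 0.0, 0.0
    for batch in valid_loader:
        loss, num_frames = compute_loss(model, batch)  # placeholder helper
        total += loss.item() * num_frames
        frames += num_frames
    model.train()
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    return total / frames, peak_mb  # e.g. (0.2126, 52672) for the entry above
```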
2024-09-20 02:37:58,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=796600.0, ans=0.0
2024-09-20 02:38:08,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=796640.0, ans=0.0
2024-09-20 02:38:11,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=796640.0, ans=0.125
2024-09-20 02:38:25,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=796680.0, ans=0.125
2024-09-20 02:38:33,860 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.83 vs. limit=15.0
2024-09-20 02:38:39,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=796720.0, ans=0.0
2024-09-20 02:38:40,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=796720.0, ans=0.125
2024-09-20 02:38:42,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=796720.0, ans=0.125
2024-09-20 02:38:45,985 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.47 vs. limit=15.0
2024-09-20 02:39:03,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=796760.0, ans=0.125
2024-09-20 02:39:09,103 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.307e+01 8.517e+01 8.971e+01 9.670e+01 3.092e+02, threshold=1.794e+02, percent-clipped=1.0
2024-09-20 02:39:09,124 INFO [train.py:1198] (1/2) Epoch 45, batch 100, loss[loss=0.2223, ctc_loss=0.09855, cr_loss=0.3215, attn_decoder_loss=0.2289, over 29531.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1113, cr_loss=0.3523, attn_decoder_loss=0.2403, over 2254328.17 frames. ], batch size: 76, lr: 2.46e-03, grad_scale: 8.0
2024-09-20 02:39:31,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=796840.0, ans=0.1
2024-09-20 02:39:47,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=796880.0, ans=0.025
2024-09-20 02:40:01,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0
2024-09-20 02:40:15,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=796960.0, ans=0.1
2024-09-20 02:40:15,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=796960.0, ans=0.125
2024-09-20 02:40:24,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=797000.0, ans=0.0
2024-09-20 02:40:25,365 INFO [train.py:1198] (1/2) Epoch 45, batch 150, loss[loss=0.2144, ctc_loss=0.09842, cr_loss=0.3261, attn_decoder_loss=0.2201, over 29398.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1091, cr_loss=0.3481, attn_decoder_loss=0.238, over 3049078.97 frames. ], batch size: 70, lr: 2.46e-03, grad_scale: 8.0
2024-09-20 02:40:34,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=797000.0, ans=0.0
2024-09-20 02:40:39,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=797040.0, ans=0.125
2024-09-20 02:40:47,958 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.97 vs. limit=15.0
2024-09-20 02:40:59,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=797080.0, ans=0.0
2024-09-20 02:41:10,427 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.82 vs. limit=6.0
2024-09-20 02:41:11,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=797120.0, ans=10.0
2024-09-20 02:41:36,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=797160.0, ans=0.1
2024-09-20 02:41:42,430 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.724e+01 8.414e+01 8.795e+01 9.375e+01 1.270e+02, threshold=1.759e+02, percent-clipped=0.0
2024-09-20 02:41:42,451 INFO [train.py:1198] (1/2) Epoch 45, batch 200, loss[loss=0.2466, ctc_loss=0.1143, cr_loss=0.3592, attn_decoder_loss=0.2534, over 27304.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1086, cr_loss=0.3467, attn_decoder_loss=0.237, over 3659275.26 frames. ], batch size: 124, lr: 2.46e-03, grad_scale: 8.0
2024-09-20 02:41:45,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=797200.0, ans=0.125
2024-09-20 02:41:52,230 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.54 vs. limit=15.0
2024-09-20 02:41:54,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=797200.0, ans=0.0
2024-09-20 02:41:56,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=797240.0, ans=0.125
2024-09-20 02:42:20,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=797280.0, ans=0.125
2024-09-20 02:42:26,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=797320.0, ans=0.125
2024-09-20 02:42:45,410 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.30 vs. limit=12.0
2024-09-20 02:42:46,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=797360.0, ans=0.2
2024-09-20 02:42:49,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=797360.0, ans=0.125
2024-09-20 02:42:52,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=797360.0, ans=0.1
2024-09-20 02:42:58,110 INFO [train.py:1198] (1/2) Epoch 45, batch 250, loss[loss=0.2475, ctc_loss=0.1143, cr_loss=0.3599, attn_decoder_loss=0.2543, over 29245.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1084, cr_loss=0.346, attn_decoder_loss=0.2371, over 4141352.84 frames. ], batch size: 100, lr: 2.46e-03, grad_scale: 8.0
2024-09-20 02:43:01,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=797400.0, ans=0.125
2024-09-20 02:43:18,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=797440.0, ans=0.125
2024-09-20 02:43:53,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=797520.0, ans=0.125
2024-09-20 02:44:16,168 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.164e+01 8.464e+01 8.917e+01 9.593e+01 1.535e+02, threshold=1.783e+02, percent-clipped=0.0
2024-09-20 02:44:16,190 INFO [train.py:1198] (1/2) Epoch 45, batch 300, loss[loss=0.249, ctc_loss=0.1149, cr_loss=0.351, attn_decoder_loss=0.2561, over 29494.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1082, cr_loss=0.3457, attn_decoder_loss=0.2368, over 4509788.25 frames. ], batch size: 92, lr: 2.46e-03, grad_scale: 8.0
2024-09-20 02:44:48,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=797680.0, ans=0.0
2024-09-20 02:45:14,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=797720.0, ans=0.2
2024-09-20 02:45:33,891 INFO [train.py:1198] (1/2) Epoch 45, batch 350, loss[loss=0.211, ctc_loss=0.09443, cr_loss=0.3245, attn_decoder_loss=0.2167, over 29337.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1087, cr_loss=0.3473, attn_decoder_loss=0.2371, over 4795555.80 frames. ], batch size: 71, lr: 2.46e-03, grad_scale: 8.0
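The learning rate ticks down from 2.49e-03 late in epoch 44 to 2.46e-03 in epoch 45 because the schedule decays smoothly in both the batch index and the epoch. An Eden-style toy formula showing that qualitative behavior (the exponents and constants here are illustrative assumptions, not necessarily this run's exact scheduler):

```python
# Illustrative Eden-style LR schedule: smooth decay in batch and epoch.
def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# The epoch bump alone produces the small step-down seen at the boundary:
print(eden_lr(0.045, 797000.0, 44) / eden_lr(0.045, 797000.0, 45))  # ~1.01
```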
2024-09-20 02:45:43,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=797800.0, ans=0.125
2024-09-20 02:45:44,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=797800.0, ans=0.0
2024-09-20 02:45:52,129 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 02:46:01,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=797840.0, ans=0.0
2024-09-20 02:46:01,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=797840.0, ans=0.2
2024-09-20 02:46:02,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=797880.0, ans=0.125
2024-09-20 02:46:04,843 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.15 vs. limit=12.0
2024-09-20 02:46:22,988 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.60 vs. limit=10.0
2024-09-20 02:46:23,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=797920.0, ans=0.125
2024-09-20 02:46:35,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=797960.0, ans=0.1
2024-09-20 02:46:40,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=797960.0, ans=0.125
2024-09-20 02:46:44,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=797960.0, ans=0.05
2024-09-20 02:46:48,812 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 8.541e+01 8.980e+01 9.725e+01 1.224e+02, threshold=1.796e+02, percent-clipped=0.0
2024-09-20 02:46:48,838 INFO [train.py:1198] (1/2) Epoch 45, batch 400, loss[loss=0.2365, ctc_loss=0.109, cr_loss=0.348, attn_decoder_loss=0.243, over 29685.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1085, cr_loss=0.3468, attn_decoder_loss=0.2368, over 5025545.60 frames. ], batch size: 82, lr: 2.46e-03, grad_scale: 16.0
2024-09-20 02:46:50,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=798000.0, ans=0.2
2024-09-20 02:46:58,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=798000.0, ans=0.2
2024-09-20 02:47:07,698 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.54 vs. limit=15.0
2024-09-20 02:47:20,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=798080.0, ans=0.0
2024-09-20 02:47:33,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=798080.0, ans=0.0
2024-09-20 02:47:47,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=798120.0, ans=0.125
2024-09-20 02:47:54,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=798160.0, ans=0.0
2024-09-20 02:48:06,792 INFO [train.py:1198] (1/2) Epoch 45, batch 450, loss[loss=0.238, ctc_loss=0.1099, cr_loss=0.3485, attn_decoder_loss=0.2445, over 29692.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1086, cr_loss=0.3469, attn_decoder_loss=0.2369, over 5186927.60 frames. ], batch size: 83, lr: 2.46e-03, grad_scale: 16.0
2024-09-20 02:48:28,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=798240.0, ans=0.0
2024-09-20 02:48:34,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=798240.0, ans=0.125
2024-09-20 02:48:48,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=798280.0, ans=0.0
2024-09-20 02:48:50,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=798280.0, ans=0.2
2024-09-20 02:48:53,986 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.58 vs. limit=22.5
2024-09-20 02:48:56,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=798320.0, ans=0.125
2024-09-20 02:48:58,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=798320.0, ans=15.0
2024-09-20 02:49:24,763 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.897e+01 8.412e+01 8.859e+01 9.470e+01 4.425e+02, threshold=1.772e+02, percent-clipped=1.0
2024-09-20 02:49:24,790 INFO [train.py:1198] (1/2) Epoch 45, batch 500, loss[loss=0.247, ctc_loss=0.1241, cr_loss=0.3907, attn_decoder_loss=0.252, over 29403.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1087, cr_loss=0.3475, attn_decoder_loss=0.2363, over 5329245.42 frames. ], batch size: 94, lr: 2.46e-03, grad_scale: 16.0
2024-09-20 02:49:43,664 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.17 vs. limit=15.0
2024-09-20 02:49:47,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=798440.0, ans=0.125
2024-09-20 02:49:58,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=798480.0, ans=0.125
2024-09-20 02:50:01,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=798480.0, ans=0.1
2024-09-20 02:50:40,204 INFO [train.py:1198] (1/2) Epoch 45, batch 550, loss[loss=0.2436, ctc_loss=0.1159, cr_loss=0.3581, attn_decoder_loss=0.2498, over 28772.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1093, cr_loss=0.3482, attn_decoder_loss=0.2369, over 5422194.96 frames. ], batch size: 104, lr: 2.46e-03, grad_scale: 8.0
], tot_loss[loss=0.2311, ctc_loss=0.1093, cr_loss=0.3482, attn_decoder_loss=0.2369, over 5422194.96 frames. ], batch size: 104, lr: 2.46e-03, grad_scale: 8.0 2024-09-20 02:51:03,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=798640.0, ans=0.0 2024-09-20 02:51:07,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=798640.0, ans=0.1 2024-09-20 02:51:29,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=798720.0, ans=0.0 2024-09-20 02:51:53,787 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 02:51:57,988 INFO [train.py:1198] (1/2) Epoch 45, batch 600, loss[loss=0.2502, ctc_loss=0.1233, cr_loss=0.3758, attn_decoder_loss=0.256, over 29284.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1093, cr_loss=0.3479, attn_decoder_loss=0.2372, over 5509726.45 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 02:51:59,419 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 8.595e+01 9.062e+01 9.748e+01 3.862e+02, threshold=1.812e+02, percent-clipped=2.0 2024-09-20 02:52:19,603 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.38 vs. limit=15.0 2024-09-20 02:53:02,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=798960.0, ans=0.125 2024-09-20 02:53:07,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=798960.0, ans=0.2 2024-09-20 02:53:09,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=798960.0, ans=0.125 2024-09-20 02:53:14,709 INFO [train.py:1198] (1/2) Epoch 45, batch 650, loss[loss=0.2292, ctc_loss=0.1054, cr_loss=0.3336, attn_decoder_loss=0.2355, over 29773.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1087, cr_loss=0.3466, attn_decoder_loss=0.2364, over 5586456.74 frames. ], batch size: 81, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 02:53:21,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=799000.0, ans=0.1 2024-09-20 02:53:21,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=799000.0, ans=0.125 2024-09-20 02:53:45,892 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.25 vs. limit=15.0 2024-09-20 02:53:54,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=799080.0, ans=0.04949747468305833 2024-09-20 02:53:59,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=799120.0, ans=0.0 2024-09-20 02:54:07,190 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.17 vs. limit=22.5 2024-09-20 02:54:30,730 INFO [train.py:1198] (1/2) Epoch 45, batch 700, loss[loss=0.2237, ctc_loss=0.1103, cr_loss=0.3704, attn_decoder_loss=0.2281, over 29528.00 frames. 
], tot_loss[loss=0.2315, ctc_loss=0.1093, cr_loss=0.3483, attn_decoder_loss=0.2373, over 5636799.99 frames. ], batch size: 76, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 02:54:32,187 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.994e+01 8.670e+01 9.106e+01 9.852e+01 1.537e+02, threshold=1.821e+02, percent-clipped=0.0 2024-09-20 02:54:39,220 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.62 vs. limit=15.0 2024-09-20 02:54:41,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=799200.0, ans=0.2 2024-09-20 02:55:13,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=799280.0, ans=0.125 2024-09-20 02:55:27,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=799320.0, ans=0.0 2024-09-20 02:55:29,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=799320.0, ans=0.0 2024-09-20 02:55:36,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=799360.0, ans=0.0 2024-09-20 02:55:48,584 INFO [train.py:1198] (1/2) Epoch 45, batch 750, loss[loss=0.2385, ctc_loss=0.1136, cr_loss=0.3687, attn_decoder_loss=0.2442, over 29715.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1092, cr_loss=0.348, attn_decoder_loss=0.2369, over 5675097.99 frames. ], batch size: 82, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 02:56:16,068 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.47 vs. limit=15.0 2024-09-20 02:56:16,287 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.29 vs. limit=12.0 2024-09-20 02:56:19,582 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.60 vs. limit=22.5 2024-09-20 02:56:31,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=799480.0, ans=0.2 2024-09-20 02:56:35,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=799520.0, ans=0.1 2024-09-20 02:56:54,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=799560.0, ans=0.05 2024-09-20 02:56:55,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.00 vs. limit=15.0 2024-09-20 02:57:06,227 INFO [train.py:1198] (1/2) Epoch 45, batch 800, loss[loss=0.216, ctc_loss=0.08939, cr_loss=0.3012, attn_decoder_loss=0.2233, over 29593.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1099, cr_loss=0.3497, attn_decoder_loss=0.2374, over 5706017.00 frames. 
], batch size: 73, lr: 2.45e-03, grad_scale: 16.0 2024-09-20 02:57:07,691 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.784e+01 8.615e+01 9.052e+01 9.760e+01 1.570e+02, threshold=1.810e+02, percent-clipped=0.0 2024-09-20 02:57:57,002 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.05 vs. limit=22.5 2024-09-20 02:58:00,020 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.50 vs. limit=15.0 2024-09-20 02:58:15,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=799760.0, ans=0.125 2024-09-20 02:58:18,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=799760.0, ans=0.0 2024-09-20 02:58:21,372 INFO [train.py:1198] (1/2) Epoch 45, batch 850, loss[loss=0.2508, ctc_loss=0.1214, cr_loss=0.3733, attn_decoder_loss=0.2568, over 29700.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1096, cr_loss=0.3487, attn_decoder_loss=0.2372, over 5734686.71 frames. ], batch size: 89, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 02:58:21,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=799800.0, ans=0.2 2024-09-20 02:58:36,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=799840.0, ans=0.0 2024-09-20 02:58:39,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=799840.0, ans=0.1 2024-09-20 02:58:40,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=799840.0, ans=0.2 2024-09-20 02:58:56,365 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-09-20 02:59:09,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.77 vs. limit=5.0 2024-09-20 02:59:46,329 INFO [train.py:1198] (1/2) Epoch 45, batch 900, loss[loss=0.2225, ctc_loss=0.107, cr_loss=0.3396, attn_decoder_loss=0.2278, over 29606.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1098, cr_loss=0.3489, attn_decoder_loss=0.2375, over 5739802.06 frames. ], batch size: 73, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 02:59:47,210 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.56 vs. 
limit=22.5 2024-09-20 02:59:49,262 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.442e+01 9.122e+01 9.676e+01 4.269e+02, threshold=1.824e+02, percent-clipped=2.0 2024-09-20 02:59:55,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=800000.0, ans=0.09899494936611666 2024-09-20 03:00:04,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=800040.0, ans=0.125 2024-09-20 03:00:09,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=800040.0, ans=10.0 2024-09-20 03:00:17,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=800080.0, ans=0.125 2024-09-20 03:01:03,277 INFO [train.py:1198] (1/2) Epoch 45, batch 950, loss[loss=0.2093, ctc_loss=0.09037, cr_loss=0.304, attn_decoder_loss=0.2158, over 29493.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1096, cr_loss=0.3485, attn_decoder_loss=0.2374, over 5742582.79 frames. ], batch size: 74, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:01:38,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=800280.0, ans=0.0 2024-09-20 03:01:49,556 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.60 vs. limit=10.0 2024-09-20 03:02:14,377 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.65 vs. limit=12.0 2024-09-20 03:02:18,168 INFO [train.py:1198] (1/2) Epoch 45, batch 1000, loss[loss=0.2219, ctc_loss=0.1022, cr_loss=0.3218, attn_decoder_loss=0.228, over 29520.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1101, cr_loss=0.3497, attn_decoder_loss=0.2378, over 5737749.09 frames. ], batch size: 77, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:02:21,232 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.544e+01 8.681e+01 9.118e+01 9.953e+01 2.174e+02, threshold=1.824e+02, percent-clipped=1.0 2024-09-20 03:02:38,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=800440.0, ans=0.0 2024-09-20 03:02:42,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=800440.0, ans=0.1 2024-09-20 03:03:07,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=800520.0, ans=0.1 2024-09-20 03:03:08,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=800520.0, ans=0.0 2024-09-20 03:03:21,292 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2024-09-20 03:03:22,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=800560.0, ans=0.125 2024-09-20 03:03:35,499 INFO [train.py:1198] (1/2) Epoch 45, batch 1050, loss[loss=0.2468, ctc_loss=0.1203, cr_loss=0.3693, attn_decoder_loss=0.2526, over 29685.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1099, cr_loss=0.3491, attn_decoder_loss=0.2372, over 5746219.65 frames. 
], batch size: 85, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:03:44,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=800600.0, ans=0.125 2024-09-20 03:03:51,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=800640.0, ans=0.2 2024-09-20 03:03:58,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=800640.0, ans=0.125 2024-09-20 03:04:06,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=800680.0, ans=0.125 2024-09-20 03:04:06,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=800680.0, ans=0.0 2024-09-20 03:04:07,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=800680.0, ans=0.2 2024-09-20 03:04:13,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=800680.0, ans=0.1 2024-09-20 03:04:53,633 INFO [train.py:1198] (1/2) Epoch 45, batch 1100, loss[loss=0.2243, ctc_loss=0.1024, cr_loss=0.3311, attn_decoder_loss=0.2305, over 29464.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1098, cr_loss=0.3488, attn_decoder_loss=0.2371, over 5757902.63 frames. ], batch size: 78, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:04:56,590 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.437e+01 8.469e+01 8.955e+01 9.647e+01 1.370e+02, threshold=1.791e+02, percent-clipped=0.0 2024-09-20 03:05:06,537 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.36 vs. limit=12.0 2024-09-20 03:05:17,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=800840.0, ans=0.2 2024-09-20 03:05:40,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=800920.0, ans=0.1 2024-09-20 03:05:40,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=800920.0, ans=0.125 2024-09-20 03:06:09,166 INFO [train.py:1198] (1/2) Epoch 45, batch 1150, loss[loss=0.2268, ctc_loss=0.1112, cr_loss=0.3422, attn_decoder_loss=0.2321, over 29442.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1097, cr_loss=0.3486, attn_decoder_loss=0.237, over 5757522.60 frames. 
], batch size: 78, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:06:23,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=801040.0, ans=0.125 2024-09-20 03:06:38,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=801080.0, ans=0.125 2024-09-20 03:06:41,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=801080.0, ans=0.125 2024-09-20 03:06:50,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=801080.0, ans=0.125 2024-09-20 03:06:57,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=801120.0, ans=0.0 2024-09-20 03:07:10,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=801160.0, ans=0.125 2024-09-20 03:07:17,146 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.87 vs. limit=12.0 2024-09-20 03:07:18,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=801160.0, ans=0.015 2024-09-20 03:07:26,998 INFO [train.py:1198] (1/2) Epoch 45, batch 1200, loss[loss=0.2498, ctc_loss=0.1204, cr_loss=0.3758, attn_decoder_loss=0.2558, over 29687.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1102, cr_loss=0.3494, attn_decoder_loss=0.2378, over 5749581.96 frames. ], batch size: 85, lr: 2.45e-03, grad_scale: 16.0 2024-09-20 03:07:29,985 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 8.448e+01 9.125e+01 9.558e+01 3.990e+02, threshold=1.825e+02, percent-clipped=1.0 2024-09-20 03:07:55,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=801280.0, ans=0.0 2024-09-20 03:07:57,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=801280.0, ans=0.125 2024-09-20 03:07:57,728 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.97 vs. limit=22.5 2024-09-20 03:08:32,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=801360.0, ans=0.125 2024-09-20 03:08:44,560 INFO [train.py:1198] (1/2) Epoch 45, batch 1250, loss[loss=0.2443, ctc_loss=0.1227, cr_loss=0.3824, attn_decoder_loss=0.2493, over 29503.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1107, cr_loss=0.3511, attn_decoder_loss=0.2385, over 5777240.23 frames. 
], batch size: 92, lr: 2.45e-03, grad_scale: 16.0 2024-09-20 03:08:57,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=801400.0, ans=0.2 2024-09-20 03:09:04,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=801440.0, ans=15.0 2024-09-20 03:09:36,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=801520.0, ans=0.125 2024-09-20 03:10:00,503 INFO [train.py:1198] (1/2) Epoch 45, batch 1300, loss[loss=0.2396, ctc_loss=0.1117, cr_loss=0.3659, attn_decoder_loss=0.2457, over 28365.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1103, cr_loss=0.3503, attn_decoder_loss=0.2379, over 5780777.38 frames. ], batch size: 111, lr: 2.45e-03, grad_scale: 16.0 2024-09-20 03:10:03,557 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.720e+01 9.060e+01 9.963e+01 1.314e+02, threshold=1.812e+02, percent-clipped=0.0 2024-09-20 03:10:05,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=801600.0, ans=0.125 2024-09-20 03:10:23,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=801640.0, ans=10.0 2024-09-20 03:10:40,176 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:10:50,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=801720.0, ans=0.1 2024-09-20 03:11:18,208 INFO [train.py:1198] (1/2) Epoch 45, batch 1350, loss[loss=0.2347, ctc_loss=0.1107, cr_loss=0.3637, attn_decoder_loss=0.2404, over 29750.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1098, cr_loss=0.3495, attn_decoder_loss=0.2379, over 5795982.08 frames. ], batch size: 81, lr: 2.45e-03, grad_scale: 16.0 2024-09-20 03:11:25,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=801800.0, ans=0.0 2024-09-20 03:11:30,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=801800.0, ans=0.1 2024-09-20 03:11:50,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=801880.0, ans=0.0 2024-09-20 03:12:19,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=801960.0, ans=0.1 2024-09-20 03:12:35,462 INFO [train.py:1198] (1/2) Epoch 45, batch 1400, loss[loss=0.2018, ctc_loss=0.08961, cr_loss=0.2968, attn_decoder_loss=0.2077, over 29603.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1097, cr_loss=0.3493, attn_decoder_loss=0.2377, over 5807730.63 frames. 
], batch size: 69, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:12:39,955 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.254e+01 8.353e+01 8.806e+01 9.318e+01 1.165e+02, threshold=1.761e+02, percent-clipped=0.0 2024-09-20 03:12:46,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=802000.0, ans=0.0 2024-09-20 03:12:47,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=802000.0, ans=0.125 2024-09-20 03:12:47,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=802000.0, ans=0.125 2024-09-20 03:12:59,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=802040.0, ans=0.0 2024-09-20 03:12:59,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=802040.0, ans=0.125 2024-09-20 03:13:05,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.18 vs. limit=10.0 2024-09-20 03:13:24,395 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.17 vs. limit=12.0 2024-09-20 03:13:28,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=802120.0, ans=0.0 2024-09-20 03:13:46,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=802160.0, ans=0.125 2024-09-20 03:13:50,619 INFO [train.py:1198] (1/2) Epoch 45, batch 1450, loss[loss=0.2472, ctc_loss=0.1226, cr_loss=0.3748, attn_decoder_loss=0.2527, over 29465.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1097, cr_loss=0.3494, attn_decoder_loss=0.2379, over 5804096.50 frames. ], batch size: 94, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:13:56,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=802200.0, ans=0.0 2024-09-20 03:13:58,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=802200.0, ans=0.2 2024-09-20 03:13:58,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=802200.0, ans=0.0 2024-09-20 03:15:02,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=802360.0, ans=0.2 2024-09-20 03:15:05,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=802360.0, ans=0.125 2024-09-20 03:15:08,090 INFO [train.py:1198] (1/2) Epoch 45, batch 1500, loss[loss=0.2338, ctc_loss=0.107, cr_loss=0.3548, attn_decoder_loss=0.24, over 29631.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1096, cr_loss=0.3494, attn_decoder_loss=0.2381, over 5805061.55 frames. 
], batch size: 86, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:15:12,538 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.856e+01 8.707e+01 9.148e+01 9.626e+01 3.931e+02, threshold=1.830e+02, percent-clipped=1.0 2024-09-20 03:15:55,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=802520.0, ans=0.125 2024-09-20 03:15:55,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=802520.0, ans=0.125 2024-09-20 03:16:16,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=802560.0, ans=0.0 2024-09-20 03:16:19,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=802560.0, ans=0.125 2024-09-20 03:16:23,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=802560.0, ans=0.0 2024-09-20 03:16:25,837 INFO [train.py:1198] (1/2) Epoch 45, batch 1550, loss[loss=0.2494, ctc_loss=0.1282, cr_loss=0.3914, attn_decoder_loss=0.2541, over 29489.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1098, cr_loss=0.3501, attn_decoder_loss=0.2383, over 5781093.30 frames. ], batch size: 90, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:16:30,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=802600.0, ans=0.125 2024-09-20 03:16:44,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=802640.0, ans=0.2 2024-09-20 03:16:48,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=802640.0, ans=0.125 2024-09-20 03:16:54,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=802680.0, ans=0.1 2024-09-20 03:17:04,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=802680.0, ans=0.1 2024-09-20 03:17:24,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=802760.0, ans=0.125 2024-09-20 03:17:30,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=802760.0, ans=0.125 2024-09-20 03:17:40,715 INFO [train.py:1198] (1/2) Epoch 45, batch 1600, loss[loss=0.2437, ctc_loss=0.1068, cr_loss=0.3416, attn_decoder_loss=0.2513, over 29677.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1099, cr_loss=0.35, attn_decoder_loss=0.2382, over 5765195.57 frames. 
], batch size: 85, lr: 2.45e-03, grad_scale: 16.0 2024-09-20 03:17:42,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=802800.0, ans=0.05 2024-09-20 03:17:45,054 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 8.517e+01 9.021e+01 9.788e+01 6.298e+02, threshold=1.804e+02, percent-clipped=2.0 2024-09-20 03:18:06,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=802840.0, ans=0.125 2024-09-20 03:18:16,707 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.72 vs. limit=15.0 2024-09-20 03:18:29,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=802920.0, ans=0.125 2024-09-20 03:18:58,031 INFO [train.py:1198] (1/2) Epoch 45, batch 1650, loss[loss=0.2388, ctc_loss=0.1124, cr_loss=0.3528, attn_decoder_loss=0.245, over 29707.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1097, cr_loss=0.3494, attn_decoder_loss=0.238, over 5759659.92 frames. ], batch size: 89, lr: 2.45e-03, grad_scale: 16.0 2024-09-20 03:18:59,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=803000.0, ans=10.0 2024-09-20 03:19:22,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=803040.0, ans=0.0 2024-09-20 03:19:46,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=803120.0, ans=0.125 2024-09-20 03:19:53,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=803120.0, ans=0.0 2024-09-20 03:20:04,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=803160.0, ans=0.0 2024-09-20 03:20:07,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=803160.0, ans=0.125 2024-09-20 03:20:15,552 INFO [train.py:1198] (1/2) Epoch 45, batch 1700, loss[loss=0.2073, ctc_loss=0.08758, cr_loss=0.2981, attn_decoder_loss=0.214, over 29593.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1096, cr_loss=0.3492, attn_decoder_loss=0.2378, over 5781341.52 frames. ], batch size: 69, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:20:19,337 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. 
limit=6.0 2024-09-20 03:20:21,493 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.503e+01 8.570e+01 9.061e+01 9.508e+01 1.721e+02, threshold=1.812e+02, percent-clipped=0.0 2024-09-20 03:20:21,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=803200.0, ans=0.09899494936611666 2024-09-20 03:20:24,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=803200.0, ans=0.0 2024-09-20 03:20:56,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=803280.0, ans=0.125 2024-09-20 03:20:56,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=803280.0, ans=0.0 2024-09-20 03:21:23,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=803360.0, ans=0.125 2024-09-20 03:21:30,947 INFO [train.py:1198] (1/2) Epoch 45, batch 1750, loss[loss=0.2138, ctc_loss=0.1013, cr_loss=0.325, attn_decoder_loss=0.219, over 29336.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1094, cr_loss=0.3487, attn_decoder_loss=0.2375, over 5788872.29 frames. ], batch size: 67, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:21:47,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=803440.0, ans=0.2 2024-09-20 03:21:53,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=803440.0, ans=0.125 2024-09-20 03:22:17,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=803520.0, ans=0.125 2024-09-20 03:22:40,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=803560.0, ans=0.2 2024-09-20 03:22:47,894 INFO [train.py:1198] (1/2) Epoch 45, batch 1800, loss[loss=0.2294, ctc_loss=0.09519, cr_loss=0.3208, attn_decoder_loss=0.2372, over 29677.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1096, cr_loss=0.3493, attn_decoder_loss=0.2377, over 5791657.74 frames. ], batch size: 83, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:22:53,956 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.618e+01 8.492e+01 8.891e+01 9.479e+01 1.445e+02, threshold=1.778e+02, percent-clipped=0.0 2024-09-20 03:23:15,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=803640.0, ans=0.125 2024-09-20 03:23:25,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=803680.0, ans=0.1 2024-09-20 03:23:26,666 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.18 vs. 
limit=10.0 2024-09-20 03:23:27,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=803680.0, ans=0.0 2024-09-20 03:23:29,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=803680.0, ans=0.125 2024-09-20 03:23:53,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=803760.0, ans=0.1 2024-09-20 03:24:03,352 INFO [train.py:1198] (1/2) Epoch 45, batch 1850, loss[loss=0.2321, ctc_loss=0.1047, cr_loss=0.3354, attn_decoder_loss=0.2388, over 29629.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1092, cr_loss=0.3487, attn_decoder_loss=0.2372, over 5797925.44 frames. ], batch size: 86, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:24:05,793 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.36 vs. limit=10.0 2024-09-20 03:24:06,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=803800.0, ans=0.125 2024-09-20 03:24:08,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=803800.0, ans=0.0 2024-09-20 03:24:10,464 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.49 vs. limit=15.0 2024-09-20 03:24:48,568 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.02 vs. limit=15.0 2024-09-20 03:24:51,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=803920.0, ans=0.2 2024-09-20 03:24:55,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=803920.0, ans=0.0 2024-09-20 03:25:17,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=803960.0, ans=0.125 2024-09-20 03:25:20,994 INFO [train.py:1198] (1/2) Epoch 45, batch 1900, loss[loss=0.2357, ctc_loss=0.1107, cr_loss=0.3635, attn_decoder_loss=0.2415, over 29707.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1094, cr_loss=0.3497, attn_decoder_loss=0.2378, over 5804538.58 frames. ], batch size: 89, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:25:27,053 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 8.514e+01 9.088e+01 9.657e+01 1.546e+02, threshold=1.818e+02, percent-clipped=0.0 2024-09-20 03:25:50,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=804080.0, ans=0.125 2024-09-20 03:25:59,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=804080.0, ans=0.0 2024-09-20 03:25:59,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=804080.0, ans=0.125 2024-09-20 03:26:17,436 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.20 vs. 
limit=10.0 2024-09-20 03:26:28,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=804160.0, ans=0.0 2024-09-20 03:26:38,939 INFO [train.py:1198] (1/2) Epoch 45, batch 1950, loss[loss=0.2351, ctc_loss=0.1191, cr_loss=0.3676, attn_decoder_loss=0.2398, over 29441.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1098, cr_loss=0.3507, attn_decoder_loss=0.2386, over 5818624.97 frames. ], batch size: 78, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:26:45,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=804200.0, ans=0.125 2024-09-20 03:26:57,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=804240.0, ans=0.2 2024-09-20 03:27:25,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=804320.0, ans=0.0 2024-09-20 03:27:28,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=804320.0, ans=0.125 2024-09-20 03:27:54,316 INFO [train.py:1198] (1/2) Epoch 45, batch 2000, loss[loss=0.2068, ctc_loss=0.08535, cr_loss=0.3096, attn_decoder_loss=0.2135, over 29346.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1104, cr_loss=0.3516, attn_decoder_loss=0.2392, over 5797248.72 frames. ], batch size: 67, lr: 2.45e-03, grad_scale: 16.0 2024-09-20 03:28:00,423 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.885e+01 8.761e+01 9.181e+01 9.636e+01 2.089e+02, threshold=1.836e+02, percent-clipped=2.0 2024-09-20 03:28:00,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=804400.0, ans=0.0 2024-09-20 03:28:08,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=804400.0, ans=0.025 2024-09-20 03:28:46,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.04 vs. limit=12.0 2024-09-20 03:28:57,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=804560.0, ans=0.2 2024-09-20 03:29:00,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=804560.0, ans=0.09899494936611666 2024-09-20 03:29:11,950 INFO [train.py:1198] (1/2) Epoch 45, batch 2050, loss[loss=0.2098, ctc_loss=0.09532, cr_loss=0.3067, attn_decoder_loss=0.2157, over 29436.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1099, cr_loss=0.3504, attn_decoder_loss=0.2381, over 5789158.59 frames. 
], batch size: 70, lr: 2.45e-03, grad_scale: 16.0 2024-09-20 03:29:25,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=804640.0, ans=0.2 2024-09-20 03:29:46,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=804680.0, ans=0.125 2024-09-20 03:29:57,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=804720.0, ans=0.125 2024-09-20 03:30:11,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=804720.0, ans=0.125 2024-09-20 03:30:29,645 INFO [train.py:1198] (1/2) Epoch 45, batch 2100, loss[loss=0.2325, ctc_loss=0.1123, cr_loss=0.3389, attn_decoder_loss=0.2383, over 29756.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1095, cr_loss=0.3492, attn_decoder_loss=0.2375, over 5800711.82 frames. ], batch size: 81, lr: 2.45e-03, grad_scale: 16.0 2024-09-20 03:30:32,043 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.94 vs. limit=15.0 2024-09-20 03:30:35,557 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.482e+01 9.039e+01 9.529e+01 1.230e+02, threshold=1.808e+02, percent-clipped=0.0 2024-09-20 03:30:35,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=804800.0, ans=0.125 2024-09-20 03:31:01,714 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. limit=6.0 2024-09-20 03:31:17,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=804920.0, ans=0.1 2024-09-20 03:31:35,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=804960.0, ans=0.125 2024-09-20 03:31:43,998 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.34 vs. limit=15.0 2024-09-20 03:31:44,355 INFO [train.py:1198] (1/2) Epoch 45, batch 2150, loss[loss=0.237, ctc_loss=0.1173, cr_loss=0.3811, attn_decoder_loss=0.2419, over 29430.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1089, cr_loss=0.3478, attn_decoder_loss=0.2369, over 5815436.60 frames. 
], batch size: 78, lr: 2.45e-03, grad_scale: 16.0 2024-09-20 03:32:15,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=805080.0, ans=0.125 2024-09-20 03:32:18,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=805080.0, ans=0.0 2024-09-20 03:32:21,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=805080.0, ans=0.0 2024-09-20 03:32:23,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=805080.0, ans=0.0 2024-09-20 03:32:28,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=805080.0, ans=0.05 2024-09-20 03:32:56,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=805160.0, ans=0.125 2024-09-20 03:32:58,089 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.91 vs. limit=12.0 2024-09-20 03:33:01,958 INFO [train.py:1198] (1/2) Epoch 45, batch 2200, loss[loss=0.2429, ctc_loss=0.1181, cr_loss=0.3593, attn_decoder_loss=0.2487, over 29633.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.109, cr_loss=0.3477, attn_decoder_loss=0.2371, over 5812639.04 frames. ], batch size: 86, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:33:05,672 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-09-20 03:33:09,444 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.623e+01 8.976e+01 9.604e+01 3.634e+02, threshold=1.795e+02, percent-clipped=1.0 2024-09-20 03:33:11,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=805200.0, ans=0.0 2024-09-20 03:33:32,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=805280.0, ans=0.0 2024-09-20 03:33:36,015 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.75 vs. limit=22.5 2024-09-20 03:34:19,512 INFO [train.py:1198] (1/2) Epoch 45, batch 2250, loss[loss=0.2415, ctc_loss=0.1092, cr_loss=0.3646, attn_decoder_loss=0.2481, over 29703.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1087, cr_loss=0.3472, attn_decoder_loss=0.237, over 5811169.23 frames. ], batch size: 82, lr: 2.44e-03, grad_scale: 8.0 2024-09-20 03:35:07,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=805520.0, ans=0.0 2024-09-20 03:35:25,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=805560.0, ans=0.125 2024-09-20 03:35:34,906 INFO [train.py:1198] (1/2) Epoch 45, batch 2300, loss[loss=0.2043, ctc_loss=0.08986, cr_loss=0.3033, attn_decoder_loss=0.2103, over 29297.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1082, cr_loss=0.3465, attn_decoder_loss=0.2361, over 5799991.29 frames. 
], batch size: 71, lr: 2.44e-03, grad_scale: 8.0 2024-09-20 03:35:42,373 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 8.561e+01 9.011e+01 9.517e+01 1.725e+02, threshold=1.802e+02, percent-clipped=0.0 2024-09-20 03:35:47,151 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:36:00,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=805640.0, ans=0.125 2024-09-20 03:36:15,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=805680.0, ans=0.125 2024-09-20 03:36:27,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=805720.0, ans=0.0 2024-09-20 03:36:41,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=805760.0, ans=0.125 2024-09-20 03:36:52,629 INFO [train.py:1198] (1/2) Epoch 45, batch 2350, loss[loss=0.2398, ctc_loss=0.1141, cr_loss=0.3501, attn_decoder_loss=0.246, over 29707.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1086, cr_loss=0.3472, attn_decoder_loss=0.2364, over 5805031.49 frames. ], batch size: 83, lr: 2.44e-03, grad_scale: 8.0 2024-09-20 03:37:00,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=805800.0, ans=0.125 2024-09-20 03:37:36,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=805920.0, ans=0.025 2024-09-20 03:37:45,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=805920.0, ans=0.2 2024-09-20 03:37:48,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=805920.0, ans=0.1 2024-09-20 03:37:52,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=805920.0, ans=0.125 2024-09-20 03:38:01,860 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.72 vs. limit=15.0 2024-09-20 03:38:10,108 INFO [train.py:1198] (1/2) Epoch 45, batch 2400, loss[loss=0.2228, ctc_loss=0.1071, cr_loss=0.3634, attn_decoder_loss=0.2276, over 29548.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1089, cr_loss=0.3477, attn_decoder_loss=0.2368, over 5808677.84 frames. 
], batch size: 76, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 03:38:17,562 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 8.673e+01 8.961e+01 9.495e+01 1.491e+02, threshold=1.792e+02, percent-clipped=0.0 2024-09-20 03:38:25,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=806040.0, ans=22.5 2024-09-20 03:38:27,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=806040.0, ans=0.025 2024-09-20 03:38:34,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=806040.0, ans=0.125 2024-09-20 03:38:46,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=806080.0, ans=0.1 2024-09-20 03:39:03,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=806120.0, ans=0.1 2024-09-20 03:39:04,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=806120.0, ans=0.025 2024-09-20 03:39:09,824 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0 2024-09-20 03:39:15,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=806160.0, ans=0.125 2024-09-20 03:39:24,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=806200.0, ans=0.125 2024-09-20 03:39:25,797 INFO [train.py:1198] (1/2) Epoch 45, batch 2450, loss[loss=0.2394, ctc_loss=0.1183, cr_loss=0.3827, attn_decoder_loss=0.2443, over 29711.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1094, cr_loss=0.3487, attn_decoder_loss=0.2375, over 5785277.37 frames. ], batch size: 82, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 03:39:43,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=806240.0, ans=0.125 2024-09-20 03:39:45,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=806240.0, ans=0.04949747468305833 2024-09-20 03:39:51,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=806240.0, ans=0.2 2024-09-20 03:40:10,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=806280.0, ans=0.125 2024-09-20 03:40:37,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=806360.0, ans=0.125 2024-09-20 03:40:43,786 INFO [train.py:1198] (1/2) Epoch 45, batch 2500, loss[loss=0.2447, ctc_loss=0.1145, cr_loss=0.353, attn_decoder_loss=0.2513, over 29641.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1096, cr_loss=0.3488, attn_decoder_loss=0.2376, over 5795177.44 frames. 
], batch size: 86, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 03:40:51,307 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.491e+01 8.641e+01 9.220e+01 9.804e+01 1.997e+02, threshold=1.844e+02, percent-clipped=2.0 2024-09-20 03:40:54,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=806400.0, ans=0.125 2024-09-20 03:41:01,499 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.70 vs. limit=15.0 2024-09-20 03:41:08,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=806440.0, ans=0.0 2024-09-20 03:41:12,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=806480.0, ans=0.1 2024-09-20 03:41:19,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=806480.0, ans=0.125 2024-09-20 03:41:28,499 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=4.66 vs. limit=15.0 2024-09-20 03:41:58,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=806560.0, ans=0.125 2024-09-20 03:42:01,677 INFO [train.py:1198] (1/2) Epoch 45, batch 2550, loss[loss=0.2062, ctc_loss=0.09508, cr_loss=0.3254, attn_decoder_loss=0.2113, over 29403.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1094, cr_loss=0.3488, attn_decoder_loss=0.2377, over 5799679.42 frames. ], batch size: 67, lr: 2.44e-03, grad_scale: 8.0 2024-09-20 03:42:15,934 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.99 vs. limit=22.5 2024-09-20 03:42:22,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=806640.0, ans=0.125 2024-09-20 03:42:44,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=806680.0, ans=0.125 2024-09-20 03:42:44,839 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.21 vs. limit=15.0 2024-09-20 03:43:17,314 INFO [train.py:1198] (1/2) Epoch 45, batch 2600, loss[loss=0.2235, ctc_loss=0.09919, cr_loss=0.3318, attn_decoder_loss=0.23, over 29450.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1094, cr_loss=0.3485, attn_decoder_loss=0.2379, over 5795539.34 frames. ], batch size: 78, lr: 2.44e-03, grad_scale: 8.0 2024-09-20 03:43:23,975 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=4.82 vs. 
limit=15.0 2024-09-20 03:43:25,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=806800.0, ans=0.05 2024-09-20 03:43:26,234 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.746e+01 8.807e+01 9.340e+01 9.891e+01 1.748e+02, threshold=1.868e+02, percent-clipped=0.0 2024-09-20 03:43:50,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=806880.0, ans=0.125 2024-09-20 03:43:54,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=806880.0, ans=0.1 2024-09-20 03:44:00,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=806880.0, ans=0.0 2024-09-20 03:44:03,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=806920.0, ans=0.125 2024-09-20 03:44:03,742 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=15.0 2024-09-20 03:44:09,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=806920.0, ans=0.0 2024-09-20 03:44:16,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=806920.0, ans=0.125 2024-09-20 03:44:27,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=806960.0, ans=0.125 2024-09-20 03:44:30,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=806960.0, ans=0.125 2024-09-20 03:44:34,362 INFO [train.py:1198] (1/2) Epoch 45, batch 2650, loss[loss=0.2477, ctc_loss=0.1226, cr_loss=0.377, attn_decoder_loss=0.2532, over 29288.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1095, cr_loss=0.3488, attn_decoder_loss=0.2381, over 5800867.75 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 8.0 2024-09-20 03:44:45,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=807000.0, ans=0.0 2024-09-20 03:44:51,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=807040.0, ans=0.1 2024-09-20 03:45:03,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=807080.0, ans=0.07 2024-09-20 03:45:03,598 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.54 vs. limit=15.0 2024-09-20 03:45:06,753 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.76 vs. 
limit=15.0 2024-09-20 03:45:27,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=807120.0, ans=0.125 2024-09-20 03:45:32,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=807120.0, ans=0.07 2024-09-20 03:45:42,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=807160.0, ans=0.0 2024-09-20 03:45:43,477 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.43 vs. limit=22.5 2024-09-20 03:45:50,865 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.64 vs. limit=15.0 2024-09-20 03:45:51,977 INFO [train.py:1198] (1/2) Epoch 45, batch 2700, loss[loss=0.2438, ctc_loss=0.1141, cr_loss=0.3688, attn_decoder_loss=0.25, over 29536.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.11, cr_loss=0.3502, attn_decoder_loss=0.2384, over 5796404.86 frames. ], batch size: 87, lr: 2.44e-03, grad_scale: 8.0 2024-09-20 03:46:01,050 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 8.586e+01 9.065e+01 9.630e+01 2.449e+02, threshold=1.813e+02, percent-clipped=1.0 2024-09-20 03:46:10,868 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.97 vs. limit=15.0 2024-09-20 03:47:03,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=807360.0, ans=0.125 2024-09-20 03:47:07,353 INFO [train.py:1198] (1/2) Epoch 45, batch 2750, loss[loss=0.2268, ctc_loss=0.1086, cr_loss=0.3438, attn_decoder_loss=0.2322, over 29529.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1093, cr_loss=0.3484, attn_decoder_loss=0.2373, over 5794808.14 frames. ], batch size: 75, lr: 2.44e-03, grad_scale: 8.0 2024-09-20 03:47:09,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=807400.0, ans=0.125 2024-09-20 03:47:19,814 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:47:21,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=807440.0, ans=0.1 2024-09-20 03:47:44,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=807480.0, ans=0.125 2024-09-20 03:47:57,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=807520.0, ans=0.125 2024-09-20 03:48:05,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=807520.0, ans=0.0 2024-09-20 03:48:11,061 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.35 vs. 
limit=15.0 2024-09-20 03:48:22,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=807560.0, ans=0.125 2024-09-20 03:48:24,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=807600.0, ans=0.125 2024-09-20 03:48:25,188 INFO [train.py:1198] (1/2) Epoch 45, batch 2800, loss[loss=0.2557, ctc_loss=0.144, cr_loss=0.3752, attn_decoder_loss=0.2597, over 20324.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1098, cr_loss=0.3491, attn_decoder_loss=0.2375, over 5776595.44 frames. ], batch size: 210, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 03:48:28,531 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:48:28,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=807600.0, ans=0.125 2024-09-20 03:48:34,099 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.700e+01 8.581e+01 9.021e+01 9.905e+01 2.529e+02, threshold=1.804e+02, percent-clipped=2.0 2024-09-20 03:48:48,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=807640.0, ans=0.125 2024-09-20 03:48:59,097 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.61 vs. limit=15.0 2024-09-20 03:49:15,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=807720.0, ans=0.1 2024-09-20 03:49:24,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=807720.0, ans=0.125 2024-09-20 03:49:27,212 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.01 vs. limit=22.5 2024-09-20 03:49:36,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=807760.0, ans=0.125 2024-09-20 03:49:39,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=807760.0, ans=0.0 2024-09-20 03:49:42,381 INFO [train.py:1198] (1/2) Epoch 45, batch 2850, loss[loss=0.2222, ctc_loss=0.107, cr_loss=0.343, attn_decoder_loss=0.2273, over 29501.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1103, cr_loss=0.3506, attn_decoder_loss=0.2379, over 5759826.70 frames. ], batch size: 77, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 03:50:03,859 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:50:29,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=807920.0, ans=0.125 2024-09-20 03:50:46,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=807960.0, ans=0.0 2024-09-20 03:50:58,366 INFO [train.py:1198] (1/2) Epoch 45, batch 2900, loss[loss=0.2316, ctc_loss=0.1098, cr_loss=0.3463, attn_decoder_loss=0.2374, over 29431.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1109, cr_loss=0.352, attn_decoder_loss=0.2391, over 5786217.84 frames. 
], batch size: 79, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 03:51:02,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=808000.0, ans=0.125 2024-09-20 03:51:07,245 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.821e+01 8.668e+01 9.103e+01 9.766e+01 1.431e+02, threshold=1.821e+02, percent-clipped=0.0 2024-09-20 03:51:13,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=808040.0, ans=0.1 2024-09-20 03:51:27,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=808080.0, ans=0.125 2024-09-20 03:51:34,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=808080.0, ans=0.125 2024-09-20 03:52:15,546 INFO [train.py:1198] (1/2) Epoch 45, batch 2950, loss[loss=0.2313, ctc_loss=0.1192, cr_loss=0.3739, attn_decoder_loss=0.2355, over 29513.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1101, cr_loss=0.35, attn_decoder_loss=0.2378, over 5781169.85 frames. ], batch size: 75, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 03:52:26,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=808200.0, ans=0.0 2024-09-20 03:52:48,136 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.59 vs. limit=15.0 2024-09-20 03:52:53,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=808280.0, ans=0.2 2024-09-20 03:53:18,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=808360.0, ans=0.125 2024-09-20 03:53:21,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=808360.0, ans=0.1 2024-09-20 03:53:33,024 INFO [train.py:1198] (1/2) Epoch 45, batch 3000, loss[loss=0.2386, ctc_loss=0.1119, cr_loss=0.3542, attn_decoder_loss=0.2448, over 29768.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1101, cr_loss=0.3503, attn_decoder_loss=0.2377, over 5782223.61 frames. ], batch size: 81, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 03:53:33,024 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 03:53:51,273 INFO [train.py:1230] (1/2) Epoch 45, validation: loss=0.213, ctc_loss=0.0366, cr_loss=6.956e-15, attn_decoder_loss=0.2326, over 944034.00 frames. 2024-09-20 03:53:51,273 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-20 03:54:00,589 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.498e+01 9.089e+01 9.590e+01 3.857e+02, threshold=1.818e+02, percent-clipped=2.0 2024-09-20 03:54:10,510 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.20 vs. 
limit=15.0 2024-09-20 03:54:37,053 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:54:40,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=808520.0, ans=0.5 2024-09-20 03:54:59,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=808560.0, ans=0.0 2024-09-20 03:55:01,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.77 vs. limit=12.0 2024-09-20 03:55:06,910 INFO [train.py:1198] (1/2) Epoch 45, batch 3050, loss[loss=0.2075, ctc_loss=0.08796, cr_loss=0.2978, attn_decoder_loss=0.2142, over 29531.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1106, cr_loss=0.3514, attn_decoder_loss=0.2385, over 5777600.86 frames. ], batch size: 76, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 03:55:33,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=808640.0, ans=0.125 2024-09-20 03:55:42,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=808680.0, ans=0.05 2024-09-20 03:55:50,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.42 vs. limit=12.0 2024-09-20 03:56:24,620 INFO [train.py:1198] (1/2) Epoch 45, batch 3100, loss[loss=0.2508, ctc_loss=0.1277, cr_loss=0.3769, attn_decoder_loss=0.2561, over 29246.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1104, cr_loss=0.3505, attn_decoder_loss=0.2383, over 5777931.01 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 8.0 2024-09-20 03:56:26,561 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:56:32,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=808800.0, ans=0.07 2024-09-20 03:56:35,118 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.756e+01 8.685e+01 9.291e+01 9.894e+01 1.991e+02, threshold=1.858e+02, percent-clipped=1.0 2024-09-20 03:56:43,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=808840.0, ans=0.0 2024-09-20 03:56:44,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=808840.0, ans=0.125 2024-09-20 03:56:51,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=808840.0, ans=0.125 2024-09-20 03:57:10,609 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:57:16,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=808920.0, ans=0.025 2024-09-20 03:57:21,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=808920.0, ans=0.125 2024-09-20 03:57:42,013 INFO [train.py:1198] (1/2) Epoch 45, batch 3150, loss[loss=0.2515, ctc_loss=0.1263, cr_loss=0.3854, attn_decoder_loss=0.2568, over 28860.00 frames. 
], tot_loss[loss=0.2324, ctc_loss=0.1102, cr_loss=0.35, attn_decoder_loss=0.2382, over 5784215.17 frames. ], batch size: 104, lr: 2.44e-03, grad_scale: 8.0 2024-09-20 03:57:46,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=809000.0, ans=0.125 2024-09-20 03:57:57,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=809040.0, ans=0.125 2024-09-20 03:58:10,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=809080.0, ans=0.125 2024-09-20 03:58:12,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=809080.0, ans=0.0 2024-09-20 03:58:21,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=809080.0, ans=0.0 2024-09-20 03:58:39,653 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=15.0 2024-09-20 03:58:52,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=809160.0, ans=0.125 2024-09-20 03:58:56,889 INFO [train.py:1198] (1/2) Epoch 45, batch 3200, loss[loss=0.2219, ctc_loss=0.09486, cr_loss=0.3035, attn_decoder_loss=0.2293, over 29409.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1096, cr_loss=0.3494, attn_decoder_loss=0.2375, over 5793954.25 frames. ], batch size: 79, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 03:59:07,501 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 8.632e+01 9.218e+01 9.587e+01 1.920e+02, threshold=1.844e+02, percent-clipped=1.0 2024-09-20 03:59:10,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=809240.0, ans=0.1 2024-09-20 03:59:28,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=809280.0, ans=0.125 2024-09-20 03:59:32,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=809280.0, ans=0.0 2024-09-20 03:59:42,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2024-09-20 03:59:53,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=809320.0, ans=0.125 2024-09-20 03:59:53,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=809320.0, ans=0.05 2024-09-20 04:00:14,530 INFO [train.py:1198] (1/2) Epoch 45, batch 3250, loss[loss=0.2445, ctc_loss=0.1182, cr_loss=0.3752, attn_decoder_loss=0.2502, over 29729.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1098, cr_loss=0.3499, attn_decoder_loss=0.238, over 5800359.47 frames. ], batch size: 84, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:00:16,961 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.50 vs. 
limit=22.5 2024-09-20 04:00:28,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=809440.0, ans=0.2 2024-09-20 04:00:32,172 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.99 vs. limit=15.0 2024-09-20 04:01:16,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=809560.0, ans=0.125 2024-09-20 04:01:22,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=809560.0, ans=0.0 2024-09-20 04:01:23,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=809560.0, ans=0.0 2024-09-20 04:01:32,411 INFO [train.py:1198] (1/2) Epoch 45, batch 3300, loss[loss=0.232, ctc_loss=0.1011, cr_loss=0.3186, attn_decoder_loss=0.2394, over 28205.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1089, cr_loss=0.3478, attn_decoder_loss=0.2367, over 5797213.36 frames. ], batch size: 111, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:01:33,533 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.25 vs. limit=15.0 2024-09-20 04:01:37,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=809600.0, ans=0.125 2024-09-20 04:01:42,958 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.585e+01 9.187e+01 9.677e+01 1.727e+02, threshold=1.837e+02, percent-clipped=0.0 2024-09-20 04:01:55,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=809640.0, ans=0.125 2024-09-20 04:02:14,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=809680.0, ans=0.1 2024-09-20 04:02:17,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=809720.0, ans=0.125 2024-09-20 04:02:26,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=809720.0, ans=0.0 2024-09-20 04:02:32,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=809760.0, ans=0.0 2024-09-20 04:02:40,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=809760.0, ans=0.125 2024-09-20 04:02:41,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=809760.0, ans=0.0 2024-09-20 04:02:47,606 INFO [train.py:1198] (1/2) Epoch 45, batch 3350, loss[loss=0.2449, ctc_loss=0.1133, cr_loss=0.3578, attn_decoder_loss=0.2516, over 28856.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1096, cr_loss=0.3491, attn_decoder_loss=0.2378, over 5773888.64 frames. 
], batch size: 104, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:03:03,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=809840.0, ans=0.2 2024-09-20 04:03:23,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=809880.0, ans=0.0 2024-09-20 04:03:31,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=809880.0, ans=0.125 2024-09-20 04:03:50,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=809960.0, ans=0.025 2024-09-20 04:04:05,668 INFO [train.py:1198] (1/2) Epoch 45, batch 3400, loss[loss=0.2003, ctc_loss=0.08873, cr_loss=0.2975, attn_decoder_loss=0.2061, over 29320.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.11, cr_loss=0.3497, attn_decoder_loss=0.2378, over 5765688.48 frames. ], batch size: 67, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:04:09,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=810000.0, ans=0.05 2024-09-20 04:04:10,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=810000.0, ans=0.2 2024-09-20 04:04:18,541 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 8.782e+01 9.254e+01 9.954e+01 2.335e+02, threshold=1.851e+02, percent-clipped=1.0 2024-09-20 04:05:18,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=810160.0, ans=0.125 2024-09-20 04:05:23,098 INFO [train.py:1198] (1/2) Epoch 45, batch 3450, loss[loss=0.2448, ctc_loss=0.1138, cr_loss=0.3572, attn_decoder_loss=0.2514, over 28316.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1102, cr_loss=0.3501, attn_decoder_loss=0.2383, over 5773568.27 frames. ], batch size: 111, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:05:35,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=810200.0, ans=0.0 2024-09-20 04:05:40,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=810240.0, ans=0.0 2024-09-20 04:05:50,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=810240.0, ans=0.2 2024-09-20 04:06:05,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=810280.0, ans=0.125 2024-09-20 04:06:16,766 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.00 vs. limit=12.0 2024-09-20 04:06:29,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=810360.0, ans=0.125 2024-09-20 04:06:33,119 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.45 vs. limit=10.0 2024-09-20 04:06:38,592 INFO [train.py:1198] (1/2) Epoch 45, batch 3500, loss[loss=0.2116, ctc_loss=0.09244, cr_loss=0.3081, attn_decoder_loss=0.2179, over 29358.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1101, cr_loss=0.3499, attn_decoder_loss=0.2378, over 5776147.48 frames. 
], batch size: 71, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:06:41,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=810400.0, ans=0.0 2024-09-20 04:06:48,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=810400.0, ans=0.125 2024-09-20 04:06:49,190 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 8.777e+01 9.274e+01 9.867e+01 1.400e+02, threshold=1.855e+02, percent-clipped=0.0 2024-09-20 04:07:24,938 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.73 vs. limit=6.0 2024-09-20 04:07:37,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=810520.0, ans=0.125 2024-09-20 04:07:46,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=810560.0, ans=0.2 2024-09-20 04:07:52,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=810560.0, ans=0.2 2024-09-20 04:07:53,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=810600.0, ans=0.1 2024-09-20 04:07:55,108 INFO [train.py:1198] (1/2) Epoch 45, batch 3550, loss[loss=0.2415, ctc_loss=0.1084, cr_loss=0.3534, attn_decoder_loss=0.2484, over 29698.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1097, cr_loss=0.3495, attn_decoder_loss=0.2376, over 5781977.20 frames. ], batch size: 89, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:08:21,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=810640.0, ans=0.125 2024-09-20 04:08:35,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=810680.0, ans=0.125 2024-09-20 04:08:39,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.40 vs. limit=22.5 2024-09-20 04:08:48,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=810720.0, ans=0.0 2024-09-20 04:08:49,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=810720.0, ans=0.0 2024-09-20 04:09:05,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=810760.0, ans=0.0 2024-09-20 04:09:10,801 INFO [train.py:1198] (1/2) Epoch 45, batch 3600, loss[loss=0.2144, ctc_loss=0.09572, cr_loss=0.3135, attn_decoder_loss=0.2206, over 29514.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1102, cr_loss=0.3506, attn_decoder_loss=0.2381, over 5790856.50 frames. 
], batch size: 77, lr: 2.44e-03, grad_scale: 32.0 2024-09-20 04:09:11,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=810800.0, ans=0.125 2024-09-20 04:09:22,708 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 8.599e+01 9.272e+01 9.719e+01 1.680e+02, threshold=1.854e+02, percent-clipped=0.0 2024-09-20 04:09:35,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.50 vs. limit=15.0 2024-09-20 04:09:38,294 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.19 vs. limit=12.0 2024-09-20 04:09:41,340 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.84 vs. limit=12.0 2024-09-20 04:09:42,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=810880.0, ans=0.0 2024-09-20 04:09:55,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=810920.0, ans=0.2 2024-09-20 04:10:02,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=810920.0, ans=0.0 2024-09-20 04:10:18,529 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.62 vs. limit=12.0 2024-09-20 04:10:24,880 INFO [train.py:1198] (1/2) Epoch 45, batch 3650, loss[loss=0.2504, ctc_loss=0.1226, cr_loss=0.3719, attn_decoder_loss=0.2563, over 29506.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1095, cr_loss=0.3484, attn_decoder_loss=0.2375, over 5793558.77 frames. ], batch size: 90, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:10:32,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=811000.0, ans=0.125 2024-09-20 04:10:34,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=811000.0, ans=0.2 2024-09-20 04:10:35,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=811000.0, ans=0.125 2024-09-20 04:10:50,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=811040.0, ans=0.125 2024-09-20 04:11:05,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=811080.0, ans=0.0 2024-09-20 04:11:19,364 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.82 vs. limit=22.5 2024-09-20 04:11:19,471 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.70 vs. 
limit=12.0 2024-09-20 04:11:27,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=811160.0, ans=0.125 2024-09-20 04:11:38,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=811200.0, ans=0.125 2024-09-20 04:11:39,703 INFO [train.py:1198] (1/2) Epoch 45, batch 3700, loss[loss=0.2309, ctc_loss=0.1026, cr_loss=0.3342, attn_decoder_loss=0.2377, over 29726.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1093, cr_loss=0.3478, attn_decoder_loss=0.2374, over 5804690.12 frames. ], batch size: 84, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:11:51,726 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.485e+01 8.848e+01 9.366e+01 9.775e+01 1.224e+02, threshold=1.873e+02, percent-clipped=0.0 2024-09-20 04:11:55,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=811240.0, ans=0.125 2024-09-20 04:12:30,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=811320.0, ans=0.125 2024-09-20 04:12:37,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=811360.0, ans=0.125 2024-09-20 04:12:51,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=811360.0, ans=0.0 2024-09-20 04:12:54,010 INFO [train.py:1198] (1/2) Epoch 45, batch 3750, loss[loss=0.2038, ctc_loss=0.09161, cr_loss=0.3111, attn_decoder_loss=0.2093, over 29327.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1093, cr_loss=0.3483, attn_decoder_loss=0.2373, over 5808578.46 frames. ], batch size: 67, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:13:00,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=811400.0, ans=0.0 2024-09-20 04:13:03,523 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0 2024-09-20 04:13:38,103 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.84 vs. limit=22.5 2024-09-20 04:13:52,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=811520.0, ans=0.1 2024-09-20 04:13:52,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=811520.0, ans=0.0 2024-09-20 04:13:53,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=811560.0, ans=0.125 2024-09-20 04:14:09,793 INFO [train.py:1198] (1/2) Epoch 45, batch 3800, loss[loss=0.2415, ctc_loss=0.1158, cr_loss=0.362, attn_decoder_loss=0.2475, over 29613.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1089, cr_loss=0.3473, attn_decoder_loss=0.2366, over 5798594.34 frames. 
], batch size: 86, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:14:21,616 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.415e+01 8.556e+01 8.957e+01 9.574e+01 2.203e+02, threshold=1.791e+02, percent-clipped=1.0 2024-09-20 04:14:24,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=811640.0, ans=0.0 2024-09-20 04:14:47,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=811680.0, ans=0.0 2024-09-20 04:15:21,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=811760.0, ans=0.0 2024-09-20 04:15:24,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=811800.0, ans=0.2 2024-09-20 04:15:25,791 INFO [train.py:1198] (1/2) Epoch 45, batch 3850, loss[loss=0.2447, ctc_loss=0.1134, cr_loss=0.3669, attn_decoder_loss=0.2512, over 29279.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1091, cr_loss=0.3475, attn_decoder_loss=0.2366, over 5813659.03 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:15:49,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=811840.0, ans=0.0 2024-09-20 04:15:51,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=811840.0, ans=0.1 2024-09-20 04:15:56,087 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.84 vs. limit=10.0 2024-09-20 04:15:59,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=811880.0, ans=0.2 2024-09-20 04:16:05,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=811880.0, ans=0.1 2024-09-20 04:16:21,402 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.67 vs. limit=12.0 2024-09-20 04:16:27,651 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=15.0 2024-09-20 04:16:34,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=811960.0, ans=0.0 2024-09-20 04:16:40,293 INFO [train.py:1198] (1/2) Epoch 45, batch 3900, loss[loss=0.2389, ctc_loss=0.1144, cr_loss=0.3561, attn_decoder_loss=0.2448, over 29624.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1096, cr_loss=0.3488, attn_decoder_loss=0.2372, over 5817303.41 frames. ], batch size: 86, lr: 2.43e-03, grad_scale: 16.0 2024-09-20 04:16:52,118 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.943e+01 8.765e+01 9.119e+01 9.578e+01 1.365e+02, threshold=1.824e+02, percent-clipped=0.0 2024-09-20 04:16:55,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=812040.0, ans=0.0 2024-09-20 04:17:01,617 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.17 vs. 
limit=15.0 2024-09-20 04:17:10,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=812080.0, ans=0.125 2024-09-20 04:17:16,059 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 04:17:51,427 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 04:17:54,016 INFO [train.py:1198] (1/2) Epoch 45, batch 3950, loss[loss=0.2513, ctc_loss=0.1261, cr_loss=0.3842, attn_decoder_loss=0.2567, over 29522.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1094, cr_loss=0.348, attn_decoder_loss=0.2372, over 5836640.05 frames. ], batch size: 97, lr: 2.43e-03, grad_scale: 16.0 2024-09-20 04:17:57,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=812200.0, ans=0.1 2024-09-20 04:18:28,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=812280.0, ans=0.125 2024-09-20 04:18:39,240 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.15 vs. limit=15.0 2024-09-20 04:18:46,395 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.46 vs. limit=22.5 2024-09-20 04:19:08,744 INFO [train.py:1198] (1/2) Epoch 45, batch 4000, loss[loss=0.2116, ctc_loss=0.08784, cr_loss=0.3023, attn_decoder_loss=0.2186, over 29549.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1091, cr_loss=0.3473, attn_decoder_loss=0.237, over 5814294.04 frames. ], batch size: 74, lr: 2.43e-03, grad_scale: 16.0 2024-09-20 04:19:11,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=812400.0, ans=0.025 2024-09-20 04:19:21,815 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.441e+01 8.441e+01 9.012e+01 9.623e+01 3.417e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-20 04:19:58,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=812520.0, ans=0.0 2024-09-20 04:20:01,037 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.91 vs. limit=15.0 2024-09-20 04:20:08,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.60 vs. limit=15.0 2024-09-20 04:20:12,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=812560.0, ans=0.125 2024-09-20 04:20:23,550 INFO [train.py:1198] (1/2) Epoch 45, batch 4050, loss[loss=0.2432, ctc_loss=0.1228, cr_loss=0.3335, attn_decoder_loss=0.2492, over 19657.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1085, cr_loss=0.3458, attn_decoder_loss=0.2366, over 5797519.82 frames. ], batch size: 209, lr: 2.43e-03, grad_scale: 16.0 2024-09-20 04:20:44,451 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.47 vs. 
limit=22.5 2024-09-20 04:20:51,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=812680.0, ans=0.125 2024-09-20 04:21:22,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=812760.0, ans=0.5 2024-09-20 04:21:23,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=812760.0, ans=0.0 2024-09-20 04:21:24,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=812760.0, ans=0.125 2024-09-20 04:21:29,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=812760.0, ans=0.125 2024-09-20 04:21:33,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=812760.0, ans=0.1 2024-09-20 04:21:36,998 INFO [train.py:1198] (1/2) Epoch 45, batch 4100, loss[loss=0.243, ctc_loss=0.122, cr_loss=0.3827, attn_decoder_loss=0.2479, over 29530.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1088, cr_loss=0.3464, attn_decoder_loss=0.2368, over 5792426.39 frames. ], batch size: 90, lr: 2.43e-03, grad_scale: 16.0 2024-09-20 04:21:38,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=812800.0, ans=10.0 2024-09-20 04:21:50,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=812840.0, ans=0.05 2024-09-20 04:21:51,599 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 8.673e+01 9.305e+01 9.853e+01 2.008e+02, threshold=1.861e+02, percent-clipped=1.0 2024-09-20 04:22:49,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=813000.0, ans=0.125 2024-09-20 04:22:50,427 INFO [train.py:1198] (1/2) Epoch 45, batch 4150, loss[loss=0.2229, ctc_loss=0.1092, cr_loss=0.3547, attn_decoder_loss=0.2277, over 29506.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1089, cr_loss=0.3467, attn_decoder_loss=0.2368, over 5797518.59 frames. ], batch size: 77, lr: 2.43e-03, grad_scale: 8.0 2024-09-20 04:23:03,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=813040.0, ans=0.2 2024-09-20 04:23:04,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=813040.0, ans=0.125 2024-09-20 04:23:04,457 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.62 vs. 
limit=15.0 2024-09-20 04:23:15,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=813040.0, ans=0.035 2024-09-20 04:23:16,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=813040.0, ans=0.0 2024-09-20 04:23:19,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=813080.0, ans=0.125 2024-09-20 04:23:43,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=813120.0, ans=0.125 2024-09-20 04:23:46,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=813120.0, ans=0.125 2024-09-20 04:23:55,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=813160.0, ans=0.0 2024-09-20 04:23:56,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=813160.0, ans=0.125 2024-09-20 04:24:06,129 INFO [train.py:1198] (1/2) Epoch 45, batch 4200, loss[loss=0.2519, ctc_loss=0.1327, cr_loss=0.3972, attn_decoder_loss=0.2564, over 29481.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1088, cr_loss=0.3469, attn_decoder_loss=0.2369, over 5799179.47 frames. ], batch size: 90, lr: 2.43e-03, grad_scale: 8.0 2024-09-20 04:24:20,924 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.773e+01 8.639e+01 8.983e+01 9.636e+01 1.465e+02, threshold=1.797e+02, percent-clipped=0.0 2024-09-20 04:24:29,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=813240.0, ans=0.125 2024-09-20 04:24:38,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=813280.0, ans=0.125 2024-09-20 04:24:40,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=813280.0, ans=0.0 2024-09-20 04:24:45,054 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.03 vs. limit=15.0 2024-09-20 04:25:19,315 INFO [train.py:1198] (1/2) Epoch 45, batch 4250, loss[loss=0.2106, ctc_loss=0.08934, cr_loss=0.2951, attn_decoder_loss=0.2175, over 29489.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1087, cr_loss=0.3466, attn_decoder_loss=0.2372, over 5803836.92 frames. ], batch size: 74, lr: 2.43e-03, grad_scale: 8.0 2024-09-20 04:25:26,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=813400.0, ans=0.0 2024-09-20 04:25:38,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=813440.0, ans=0.0 2024-09-20 04:25:43,088 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.89 vs. 
limit=22.5 2024-09-20 04:25:44,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=813440.0, ans=0.125 2024-09-20 04:26:09,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=813520.0, ans=0.0 2024-09-20 04:26:13,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=813520.0, ans=0.0 2024-09-20 04:26:15,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=813520.0, ans=22.5 2024-09-20 04:26:16,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=813560.0, ans=0.125 2024-09-20 04:26:17,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=813560.0, ans=0.125 2024-09-20 04:26:23,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=813560.0, ans=0.1 2024-09-20 04:26:29,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=813560.0, ans=0.125 2024-09-20 04:26:32,848 INFO [train.py:1198] (1/2) Epoch 45, batch 4300, loss[loss=0.2373, ctc_loss=0.1143, cr_loss=0.365, attn_decoder_loss=0.2428, over 29517.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1087, cr_loss=0.3465, attn_decoder_loss=0.2375, over 5793035.93 frames. ], batch size: 87, lr: 2.43e-03, grad_scale: 8.0 2024-09-20 04:26:43,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=813600.0, ans=0.125 2024-09-20 04:26:47,734 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 8.880e+01 9.464e+01 1.001e+02 2.468e+02, threshold=1.893e+02, percent-clipped=1.0 2024-09-20 04:26:59,888 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.21 vs. limit=22.5 2024-09-20 04:27:32,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=813760.0, ans=0.125 2024-09-20 04:27:33,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=813760.0, ans=0.125 2024-09-20 04:27:36,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=813760.0, ans=0.125 2024-09-20 04:27:38,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=813760.0, ans=15.0 2024-09-20 04:27:43,638 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2024-09-20 04:27:48,499 INFO [train.py:1198] (1/2) Epoch 45, batch 4350, loss[loss=0.2502, ctc_loss=0.1201, cr_loss=0.3854, attn_decoder_loss=0.2561, over 29464.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1115, cr_loss=0.3531, attn_decoder_loss=0.2408, over 5795885.55 frames. 
], batch size: 97, lr: 2.43e-03, grad_scale: 8.0 2024-09-20 04:27:50,716 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.39 vs. limit=22.5 2024-09-20 04:27:59,758 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.53 vs. limit=15.0 2024-09-20 04:28:14,237 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=4.78 vs. limit=15.0 2024-09-20 04:28:15,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=813840.0, ans=0.125 2024-09-20 04:28:16,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=813880.0, ans=0.1 2024-09-20 04:29:01,439 INFO [train.py:1198] (1/2) Epoch 45, batch 4400, loss[loss=0.2364, ctc_loss=0.1197, cr_loss=0.3589, attn_decoder_loss=0.2414, over 27583.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1127, cr_loss=0.3562, attn_decoder_loss=0.2427, over 5765878.65 frames. ], batch size: 125, lr: 2.43e-03, grad_scale: 16.0 2024-09-20 04:29:04,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=814000.0, ans=0.0 2024-09-20 04:29:10,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=814000.0, ans=0.0 2024-09-20 04:29:15,734 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.358e+01 9.080e+01 9.421e+01 9.945e+01 1.972e+02, threshold=1.884e+02, percent-clipped=1.0 2024-09-20 04:29:15,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=814040.0, ans=0.0 2024-09-20 04:29:18,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=814040.0, ans=0.0 2024-09-20 04:29:26,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=814040.0, ans=0.1 2024-09-20 04:29:29,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=814080.0, ans=0.1 2024-09-20 04:29:51,170 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.67 vs. limit=15.0 2024-09-20 04:29:52,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=814120.0, ans=0.125 2024-09-20 04:30:02,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=814160.0, ans=0.125 2024-09-20 04:30:05,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=814160.0, ans=0.1 2024-09-20 04:30:09,285 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=12.17 vs. limit=15.0 2024-09-20 04:30:15,977 INFO [train.py:1198] (1/2) Epoch 45, batch 4450, loss[loss=0.2487, ctc_loss=0.1236, cr_loss=0.3703, attn_decoder_loss=0.2544, over 20339.00 frames. 
], tot_loss[loss=0.2388, ctc_loss=0.1158, cr_loss=0.361, attn_decoder_loss=0.2444, over 5570164.97 frames. ], batch size: 209, lr: 2.43e-03, grad_scale: 16.0 2024-09-20 04:30:47,208 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.75 vs. limit=5.0 2024-09-20 04:30:52,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=814280.0, ans=0.1 2024-09-20 04:31:01,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=814320.0, ans=0.1 2024-09-20 04:31:06,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=814320.0, ans=0.025 2024-09-20 04:31:06,509 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.15 vs. limit=22.5 2024-09-20 04:31:14,150 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 04:31:16,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=814360.0, ans=0.125 2024-09-20 04:31:31,826 INFO [train.py:1198] (1/2) Epoch 45, batch 4500, loss[loss=0.2507, ctc_loss=0.1323, cr_loss=0.3683, attn_decoder_loss=0.2556, over 20716.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1187, cr_loss=0.3626, attn_decoder_loss=0.2462, over 5231720.39 frames. ], batch size: 209, lr: 2.43e-03, grad_scale: 8.0 2024-09-20 04:31:32,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=814400.0, ans=0.025 2024-09-20 04:31:48,018 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.758e+01 1.032e+02 1.137e+02 1.254e+02 4.078e+02, threshold=2.275e+02, percent-clipped=1.0 2024-09-20 04:31:49,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=814440.0, ans=0.2 2024-09-20 04:32:39,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.76 vs. limit=15.0 2024-09-20 04:32:47,519 INFO [train.py:1198] (1/2) Epoch 46, batch 0, loss[loss=0.2161, ctc_loss=0.1005, cr_loss=0.3288, attn_decoder_loss=0.2216, over 29601.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1005, cr_loss=0.3288, attn_decoder_loss=0.2216, over 29601.00 frames. ], batch size: 73, lr: 2.40e-03, grad_scale: 16.0 2024-09-20 04:32:47,520 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 04:33:04,847 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([5.1636, 4.9640, 4.7024, 4.4625], device='cuda:1') 2024-09-20 04:33:07,327 INFO [train.py:1230] (1/2) Epoch 46, validation: loss=0.2132, ctc_loss=0.03625, cr_loss=6.411e-15, attn_decoder_loss=0.2328, over 944034.00 frames. 
2024-09-20 04:33:07,328 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-20 04:33:09,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=814500.0, ans=0.0 2024-09-20 04:33:11,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=814500.0, ans=0.125 2024-09-20 04:33:16,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=814500.0, ans=0.125 2024-09-20 04:33:18,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=814500.0, ans=0.2 2024-09-20 04:33:23,397 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.12 vs. limit=10.0 2024-09-20 04:33:27,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=814540.0, ans=0.2 2024-09-20 04:33:27,713 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.98 vs. limit=15.0 2024-09-20 04:34:11,859 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.68 vs. limit=22.5 2024-09-20 04:34:24,683 INFO [train.py:1198] (1/2) Epoch 46, batch 50, loss[loss=0.2014, ctc_loss=0.08442, cr_loss=0.2861, attn_decoder_loss=0.2081, over 29419.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1127, cr_loss=0.3563, attn_decoder_loss=0.2395, over 1269160.28 frames. ], batch size: 70, lr: 2.40e-03, grad_scale: 16.0 2024-09-20 04:34:54,395 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.16 vs. limit=15.0 2024-09-20 04:35:19,989 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.869e+01 8.768e+01 9.324e+01 1.041e+02 2.439e+02, threshold=1.865e+02, percent-clipped=1.0 2024-09-20 04:35:23,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=814820.0, ans=0.0 2024-09-20 04:35:29,961 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.11 vs. limit=10.0 2024-09-20 04:35:41,089 INFO [train.py:1198] (1/2) Epoch 46, batch 100, loss[loss=0.2163, ctc_loss=0.09819, cr_loss=0.3176, attn_decoder_loss=0.2223, over 29529.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1114, cr_loss=0.3532, attn_decoder_loss=0.2396, over 2254008.49 frames. ], batch size: 76, lr: 2.40e-03, grad_scale: 16.0 2024-09-20 04:35:51,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=814900.0, ans=0.125 2024-09-20 04:35:53,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=814900.0, ans=0.125 2024-09-20 04:35:56,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.28 vs. 
2024-09-20 04:36:23,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=814980.0, ans=0.125
2024-09-20 04:36:25,445 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.10 vs. limit=15.0
2024-09-20 04:36:26,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=815020.0, ans=0.1
2024-09-20 04:36:35,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=815020.0, ans=0.125
2024-09-20 04:36:46,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=815060.0, ans=0.2
2024-09-20 04:36:55,409 INFO [train.py:1198] (1/2) Epoch 46, batch 150, loss[loss=0.2053, ctc_loss=0.08858, cr_loss=0.302, attn_decoder_loss=0.2115, over 29429.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1089, cr_loss=0.3479, attn_decoder_loss=0.2375, over 3048342.35 frames. ], batch size: 70, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:37:12,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=815140.0, ans=0.0
2024-09-20 04:37:25,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=815180.0, ans=0.1
2024-09-20 04:37:27,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=815180.0, ans=0.0
2024-09-20 04:37:39,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=815220.0, ans=0.125
2024-09-20 04:37:51,965 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 8.420e+01 9.019e+01 9.584e+01 1.300e+02, threshold=1.804e+02, percent-clipped=0.0
2024-09-20 04:38:04,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=815260.0, ans=0.1
2024-09-20 04:38:12,897 INFO [train.py:1198] (1/2) Epoch 46, batch 200, loss[loss=0.241, ctc_loss=0.1176, cr_loss=0.3505, attn_decoder_loss=0.247, over 27289.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1088, cr_loss=0.3479, attn_decoder_loss=0.2366, over 3659917.42 frames. ], batch size: 124, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:38:13,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=815300.0, ans=0.125
2024-09-20 04:38:28,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=815340.0, ans=0.125
2024-09-20 04:38:28,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=815340.0, ans=0.0
2024-09-20 04:38:56,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=815420.0, ans=0.125
2024-09-20 04:39:23,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=815460.0, ans=0.0
2024-09-20 04:39:23,720 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.05 vs. limit=10.0
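The ScheduledFloat lines print the current value (ans=...) of a hyperparameter that varies with batch_count: dropout rates, skip rates, balancer bounds, even whitening limits. A minimal stand-in is a piecewise-linear schedule over (batch_count, value) breakpoints, clamped at both ends; the class below approximates that behaviour and is not a copy of the real scaling.py implementation:

    class PiecewiseLinearSchedule:
        def __init__(self, *points: tuple):
            # points: (batch_count, value) pairs, e.g. (0, 0.3), (20000, 0.1)
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(815020.0))  # 0.1 -- far past the last breakpoint, matching the ans=0.1 entries above

This late in training (batch_count past 800k) almost every schedule has reached its final value, which is why the same ans figures repeat.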
2024-09-20 04:39:27,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=815460.0, ans=0.0
2024-09-20 04:39:30,378 INFO [train.py:1198] (1/2) Epoch 46, batch 250, loss[loss=0.2379, ctc_loss=0.1102, cr_loss=0.335, attn_decoder_loss=0.2447, over 29226.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1085, cr_loss=0.3469, attn_decoder_loss=0.2365, over 4141535.99 frames. ], batch size: 100, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:39:32,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=815500.0, ans=0.1
2024-09-20 04:40:03,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=815580.0, ans=0.1
2024-09-20 04:40:18,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=815620.0, ans=0.125
2024-09-20 04:40:24,348 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.615e+01 9.020e+01 9.569e+01 1.385e+02, threshold=1.804e+02, percent-clipped=0.0
2024-09-20 04:40:38,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten.whitening_limit, batch_count=815660.0, ans=15.0
2024-09-20 04:40:45,341 INFO [train.py:1198] (1/2) Epoch 46, batch 300, loss[loss=0.2465, ctc_loss=0.1241, cr_loss=0.3912, attn_decoder_loss=0.2514, over 29547.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1086, cr_loss=0.3462, attn_decoder_loss=0.2364, over 4509769.39 frames. ], batch size: 92, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:40:47,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=815700.0, ans=0.125
2024-09-20 04:41:37,149 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.42 vs. limit=15.0
2024-09-20 04:41:51,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=815860.0, ans=0.125
2024-09-20 04:42:02,779 INFO [train.py:1198] (1/2) Epoch 46, batch 350, loss[loss=0.2173, ctc_loss=0.09855, cr_loss=0.3242, attn_decoder_loss=0.2233, over 29305.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1089, cr_loss=0.3472, attn_decoder_loss=0.2369, over 4795731.40 frames. ], batch size: 71, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:42:08,214 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.64 vs. limit=15.0
2024-09-20 04:42:24,395 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.60 vs. limit=15.0
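Each optim.py warning lists five grad-norm statistics (min, 25%, median, 75%, max) plus a threshold, and in every instance in this excerpt the threshold equals Clipping_scale times the median (e.g. 2.0 * 9.020e+01 = 1.804e+02 just above). The sketch below implements that policy; the history window and update details are assumptions, not the optimizer's exact code:

    import torch
    from collections import deque

    class QuartileClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)  # recent global gradient norms

        def clip_(self, parameters) -> None:
            grads = [p.grad for p in parameters if p.grad is not None]
            norm = torch.linalg.vector_norm(
                torch.stack([torch.linalg.vector_norm(g) for g in grads]))
            self.norms.append(norm.item())
            hist = sorted(self.norms)
            median = hist[len(hist) // 2]
            threshold = self.clipping_scale * median  # matches the logged thresholds
            if norm > threshold:
                for g in grads:
                    g.mul_(threshold / norm)

percent-clipped then reports how often the current norm actually exceeded this adaptive threshold.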
2024-09-20 04:42:31,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=815980.0, ans=0.0
2024-09-20 04:42:52,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=815980.0, ans=0.0
2024-09-20 04:42:52,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=815980.0, ans=0.125
2024-09-20 04:42:52,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=815980.0, ans=0.025
2024-09-20 04:42:55,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=816020.0, ans=0.125
2024-09-20 04:43:04,445 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.751e+01 9.027e+01 9.740e+01 2.091e+02, threshold=1.805e+02, percent-clipped=1.0
2024-09-20 04:43:07,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=816020.0, ans=0.1
2024-09-20 04:43:10,877 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-20 04:43:19,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=816060.0, ans=0.0
2024-09-20 04:43:27,886 INFO [train.py:1198] (1/2) Epoch 46, batch 400, loss[loss=0.2385, ctc_loss=0.1184, cr_loss=0.3815, attn_decoder_loss=0.2434, over 29688.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1087, cr_loss=0.3474, attn_decoder_loss=0.2369, over 5023996.44 frames. ], batch size: 82, lr: 2.40e-03, grad_scale: 32.0
2024-09-20 04:43:34,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=816100.0, ans=0.0
2024-09-20 04:43:43,835 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.35 vs. limit=15.0
2024-09-20 04:43:48,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=816140.0, ans=0.125
2024-09-20 04:43:53,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=816140.0, ans=0.125
2024-09-20 04:44:26,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=816220.0, ans=0.125
2024-09-20 04:44:43,883 INFO [train.py:1198] (1/2) Epoch 46, batch 450, loss[loss=0.2396, ctc_loss=0.1138, cr_loss=0.3683, attn_decoder_loss=0.2454, over 29690.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1092, cr_loss=0.3481, attn_decoder_loss=0.2373, over 5186139.49 frames. ], batch size: 83, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:44:53,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=816300.0, ans=0.125
2024-09-20 04:44:58,397 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.98 vs. limit=10.0
2024-09-20 04:45:04,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=816340.0, ans=0.125
2024-09-20 04:45:09,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=816340.0, ans=0.125
2024-09-20 04:45:14,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=816380.0, ans=0.0
2024-09-20 04:45:17,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=816380.0, ans=0.07
2024-09-20 04:45:22,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=816380.0, ans=0.07
2024-09-20 04:45:40,121 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.987e+01 8.630e+01 9.037e+01 9.631e+01 6.120e+02, threshold=1.807e+02, percent-clipped=1.0
2024-09-20 04:45:42,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=816420.0, ans=0.0
2024-09-20 04:46:02,237 INFO [train.py:1198] (1/2) Epoch 46, batch 500, loss[loss=0.244, ctc_loss=0.1253, cr_loss=0.3631, attn_decoder_loss=0.2491, over 29457.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.109, cr_loss=0.3483, attn_decoder_loss=0.2369, over 5329559.63 frames. ], batch size: 94, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:46:02,968 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.50 vs. limit=15.0
2024-09-20 04:46:23,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=816540.0, ans=0.125
2024-09-20 04:46:27,110 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.40 vs. limit=15.0
2024-09-20 04:46:29,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=816540.0, ans=0.0
2024-09-20 04:46:47,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=816620.0, ans=0.1
2024-09-20 04:47:01,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=816660.0, ans=0.0
2024-09-20 04:47:09,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=816660.0, ans=0.125
2024-09-20 04:47:20,170 INFO [train.py:1198] (1/2) Epoch 46, batch 550, loss[loss=0.2283, ctc_loss=0.107, cr_loss=0.3436, attn_decoder_loss=0.2341, over 28886.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1087, cr_loss=0.3475, attn_decoder_loss=0.2369, over 5422105.27 frames. ], batch size: 104, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:47:30,032 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.33 vs. limit=22.5
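The Whitening lines compare a per-module metric against a limit. One plausible reconstruction of the metric is the ratio mean(eig^2)/mean(eig)^2 over the eigenvalues of the (grouped) channel covariance, which is 1.0 for a perfectly white covariance and grows as energy concentrates in few directions; the code below is an approximation of the measurement, not the exact scaling.py implementation:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # x: (num_frames, num_channels); channels are split into num_groups groups.
        n, c = x.shape
        assert c % num_groups == 0
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)  # (g, n, d)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / n                                # (g, d, d)
        d = cov.shape[-1]
        mean_sq_eig = (cov * cov).sum(dim=(1, 2)) / d                  # trace(cov @ cov) / d
        sq_mean_eig = (cov.diagonal(dim1=1, dim2=2).sum(dim=1) / d) ** 2
        return (mean_sq_eig / sq_mean_eig).mean()

When the printed metric exceeds its limit, the module presumably applies a corrective gradient penalty that pushes activations back toward whiteness, which is why most logged metrics sit below their limits.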
2024-09-20 04:47:30,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=816700.0, ans=0.035
2024-09-20 04:47:38,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=816740.0, ans=0.125
2024-09-20 04:47:41,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=816740.0, ans=0.125
2024-09-20 04:47:45,007 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=12.0
2024-09-20 04:47:45,283 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.42 vs. limit=10.0
2024-09-20 04:47:47,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=816740.0, ans=0.125
2024-09-20 04:47:47,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=816740.0, ans=0.125
2024-09-20 04:47:52,807 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.78 vs. limit=15.0
2024-09-20 04:48:16,367 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.351e+01 8.519e+01 9.115e+01 9.608e+01 2.263e+02, threshold=1.823e+02, percent-clipped=2.0
2024-09-20 04:48:24,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=816860.0, ans=0.125
2024-09-20 04:48:36,091 INFO [train.py:1198] (1/2) Epoch 46, batch 600, loss[loss=0.245, ctc_loss=0.1191, cr_loss=0.3773, attn_decoder_loss=0.2506, over 29279.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.109, cr_loss=0.3479, attn_decoder_loss=0.2371, over 5508219.81 frames. ], batch size: 100, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:48:49,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=816940.0, ans=0.1
2024-09-20 04:48:51,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=816940.0, ans=0.125
2024-09-20 04:49:22,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=817020.0, ans=0.125
2024-09-20 04:49:29,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=817020.0, ans=0.125
2024-09-20 04:49:53,428 INFO [train.py:1198] (1/2) Epoch 46, batch 650, loss[loss=0.2314, ctc_loss=0.1095, cr_loss=0.349, attn_decoder_loss=0.2372, over 29775.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1083, cr_loss=0.347, attn_decoder_loss=0.2368, over 5585476.91 frames. ], batch size: 81, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:50:09,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=817140.0, ans=10.0
2024-09-20 04:50:33,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=817180.0, ans=0.1
2024-09-20 04:50:39,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=817220.0, ans=0.025
2024-09-20 04:50:49,215 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.561e+01 8.319e+01 8.831e+01 9.492e+01 1.301e+02, threshold=1.766e+02, percent-clipped=0.0
2024-09-20 04:50:51,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=817220.0, ans=0.125
2024-09-20 04:51:08,830 INFO [train.py:1198] (1/2) Epoch 46, batch 700, loss[loss=0.2203, ctc_loss=0.1024, cr_loss=0.3454, attn_decoder_loss=0.2258, over 29536.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1087, cr_loss=0.3478, attn_decoder_loss=0.237, over 5636140.56 frames. ], batch size: 76, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:51:09,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=817300.0, ans=0.125
2024-09-20 04:51:12,747 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.37 vs. limit=10.0
2024-09-20 04:51:22,541 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.86 vs. limit=12.0
2024-09-20 04:51:23,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=817300.0, ans=0.125
2024-09-20 04:51:41,957 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-20 04:51:43,890 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.43 vs. limit=15.0
2024-09-20 04:52:00,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=817420.0, ans=0.0
2024-09-20 04:52:06,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=817420.0, ans=0.1
2024-09-20 04:52:27,350 INFO [train.py:1198] (1/2) Epoch 46, batch 750, loss[loss=0.2331, ctc_loss=0.1053, cr_loss=0.3272, attn_decoder_loss=0.24, over 29730.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.109, cr_loss=0.3485, attn_decoder_loss=0.237, over 5675535.47 frames. ], batch size: 82, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:52:27,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=817500.0, ans=0.2
2024-09-20 04:52:29,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=817500.0, ans=0.07
2024-09-20 04:52:30,089 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.81 vs. limit=15.0
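The tot_loss[..., over N frames] figures accumulate their frame counts across batches and reset at the epoch boundary (compare batch 0 of epoch 46 against batch 4500 of epoch 45 earlier in this excerpt), which is consistent with a frame-weighted running average. A minimal tracker of that kind, far simpler than icefall's actual MetricsTracker:

    class RunningLoss:
        def __init__(self):
            self.frames = 0.0
            self.weighted = {}

        def update(self, losses: dict, num_frames: float) -> None:
            # Accumulate each component weighted by the frames in this batch.
            self.frames += num_frames
            for name, value in losses.items():
                self.weighted[name] = self.weighted.get(name, 0.0) + value * num_frames

        def averages(self) -> dict:
            return {name: total / self.frames for name, total in self.weighted.items()}

    tracker = RunningLoss()
    tracker.update({"loss": 0.2203, "ctc_loss": 0.1024}, 29536.0)
    print(tracker.averages(), tracker.frames)

Weighting by frames rather than by batch keeps long and short utterances from being averaged on equal footing.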
2024-09-20 04:52:30,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=817500.0, ans=0.0
2024-09-20 04:52:39,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=817500.0, ans=0.0
2024-09-20 04:52:41,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=817540.0, ans=0.05
2024-09-20 04:53:02,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=817580.0, ans=0.125
2024-09-20 04:53:07,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=817580.0, ans=0.0
2024-09-20 04:53:08,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=817580.0, ans=0.125
2024-09-20 04:53:08,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=817580.0, ans=0.0
2024-09-20 04:53:23,358 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 8.548e+01 9.089e+01 9.698e+01 1.282e+02, threshold=1.818e+02, percent-clipped=0.0
2024-09-20 04:53:44,978 INFO [train.py:1198] (1/2) Epoch 46, batch 800, loss[loss=0.2116, ctc_loss=0.097, cr_loss=0.3215, attn_decoder_loss=0.2172, over 29628.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1091, cr_loss=0.3487, attn_decoder_loss=0.2369, over 5707331.97 frames. ], batch size: 73, lr: 2.40e-03, grad_scale: 32.0
2024-09-20 04:54:14,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=817780.0, ans=0.125
2024-09-20 04:54:28,147 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.23 vs. limit=22.5
2024-09-20 04:54:56,629 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.68 vs. limit=22.5
2024-09-20 04:55:00,148 INFO [train.py:1198] (1/2) Epoch 46, batch 850, loss[loss=0.2461, ctc_loss=0.1155, cr_loss=0.3619, attn_decoder_loss=0.2526, over 29722.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1086, cr_loss=0.3473, attn_decoder_loss=0.2364, over 5735909.59 frames. ], batch size: 89, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:55:10,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=817900.0, ans=0.0
2024-09-20 04:55:27,473 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.85 vs. limit=22.5
2024-09-20 04:55:54,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=818020.0, ans=0.125
2024-09-20 04:55:59,848 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.396e+01 8.524e+01 9.066e+01 9.505e+01 2.667e+02, threshold=1.813e+02, percent-clipped=1.0
2024-09-20 04:56:09,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=818060.0, ans=0.125
2024-09-20 04:56:10,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=818060.0, ans=0.0
2024-09-20 04:56:18,041 INFO [train.py:1198] (1/2) Epoch 46, batch 900, loss[loss=0.2057, ctc_loss=0.08618, cr_loss=0.2967, attn_decoder_loss=0.2124, over 29623.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1085, cr_loss=0.3471, attn_decoder_loss=0.2366, over 5740907.02 frames. ], batch size: 73, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:56:30,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=818100.0, ans=0.0
2024-09-20 04:56:43,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=818140.0, ans=0.1
2024-09-20 04:56:45,418 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.29 vs. limit=10.0
2024-09-20 04:57:03,953 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.23 vs. limit=15.0
2024-09-20 04:57:12,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=818220.0, ans=0.125
2024-09-20 04:57:15,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=818220.0, ans=0.0
2024-09-20 04:57:16,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=818260.0, ans=0.125
2024-09-20 04:57:17,509 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0
2024-09-20 04:57:24,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=818260.0, ans=0.0
2024-09-20 04:57:34,898 INFO [train.py:1198] (1/2) Epoch 46, batch 950, loss[loss=0.2178, ctc_loss=0.0954, cr_loss=0.3184, attn_decoder_loss=0.2244, over 29485.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1083, cr_loss=0.3465, attn_decoder_loss=0.2366, over 5741792.99 frames. ], batch size: 74, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:57:50,915 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.65 vs. limit=15.0
2024-09-20 04:57:56,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=818340.0, ans=0.125
2024-09-20 04:58:01,507 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.10 vs. limit=15.0
2024-09-20 04:58:06,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=818380.0, ans=0.125
2024-09-20 04:58:32,434 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.503e+01 8.692e+01 9.271e+01 9.926e+01 1.686e+02, threshold=1.854e+02, percent-clipped=0.0
2024-09-20 04:58:50,241 INFO [train.py:1198] (1/2) Epoch 46, batch 1000, loss[loss=0.2193, ctc_loss=0.102, cr_loss=0.3419, attn_decoder_loss=0.2247, over 29496.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1091, cr_loss=0.3483, attn_decoder_loss=0.2373, over 5736506.81 frames. ], batch size: 77, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:58:55,866 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.50 vs. limit=10.0
2024-09-20 04:59:06,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.92 vs. limit=22.5
2024-09-20 04:59:15,620 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-20 04:59:21,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.57 vs. limit=15.0
2024-09-20 05:00:06,009 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.53 vs. limit=15.0
2024-09-20 05:00:07,746 INFO [train.py:1198] (1/2) Epoch 46, batch 1050, loss[loss=0.2365, ctc_loss=0.1124, cr_loss=0.335, attn_decoder_loss=0.2429, over 29703.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1085, cr_loss=0.3466, attn_decoder_loss=0.2364, over 5744377.20 frames. ], batch size: 85, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 05:00:13,386 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.08 vs. limit=8.0
2024-09-20 05:00:13,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=818700.0, ans=0.07
2024-09-20 05:00:22,082 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0
2024-09-20 05:00:32,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=818740.0, ans=0.0
2024-09-20 05:00:32,938 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.22 vs. limit=15.0
2024-09-20 05:00:52,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=818820.0, ans=0.1
2024-09-20 05:01:05,426 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.722e+01 9.094e+01 9.715e+01 1.593e+02, threshold=1.819e+02, percent-clipped=0.0
2024-09-20 05:01:24,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=818900.0, ans=0.125
2024-09-20 05:01:25,793 INFO [train.py:1198] (1/2) Epoch 46, batch 1100, loss[loss=0.2304, ctc_loss=0.11, cr_loss=0.3627, attn_decoder_loss=0.2358, over 29458.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1082, cr_loss=0.346, attn_decoder_loss=0.2363, over 5756394.92 frames. ], batch size: 78, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 05:01:30,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=818900.0, ans=0.07
2024-09-20 05:01:30,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=818900.0, ans=0.125
2024-09-20 05:01:41,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=818940.0, ans=0.2
2024-09-20 05:01:45,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=818940.0, ans=0.025
2024-09-20 05:01:59,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=818980.0, ans=0.125
2024-09-20 05:02:05,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=818980.0, ans=0.0
2024-09-20 05:02:41,365 INFO [train.py:1198] (1/2) Epoch 46, batch 1150, loss[loss=0.2314, ctc_loss=0.1124, cr_loss=0.3582, attn_decoder_loss=0.2366, over 29453.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1087, cr_loss=0.3469, attn_decoder_loss=0.2366, over 5754651.21 frames. ], batch size: 78, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 05:02:48,578 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0
2024-09-20 05:03:07,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=819140.0, ans=0.125
2024-09-20 05:03:11,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=819140.0, ans=0.125
2024-09-20 05:03:14,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=819180.0, ans=0.2
2024-09-20 05:03:41,722 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.318e+01 8.675e+01 9.165e+01 9.732e+01 5.471e+02, threshold=1.833e+02, percent-clipped=2.0
2024-09-20 05:03:42,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.13 vs. limit=15.0
2024-09-20 05:03:58,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=819300.0, ans=0.125
2024-09-20 05:03:59,530 INFO [train.py:1198] (1/2) Epoch 46, batch 1200, loss[loss=0.2382, ctc_loss=0.1069, cr_loss=0.3394, attn_decoder_loss=0.2453, over 29682.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1091, cr_loss=0.3479, attn_decoder_loss=0.2372, over 5746404.03 frames. ], batch size: 85, lr: 2.40e-03, grad_scale: 32.0
2024-09-20 05:04:00,152 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=15.0
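The grad_scale figure in the batch lines moves between 8.0, 16.0 and 32.0 (it just stepped to 32.0 at batch 1200), the classic signature of dynamic loss scaling under float16 AMP: the scale is halved when a step overflows and doubled again after a run of clean steps. A self-contained sketch with PyTorch's stock GradScaler, where the starting scale and toy model are illustrative:

    import torch

    model = torch.nn.Linear(10, 1).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

    for _ in range(3):
        x = torch.randn(8, 10, device="cuda")
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(x).pow(2).mean()
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skipped internally if inf/nan gradients were found
        scaler.update()         # halves the scale on overflow, doubles it after clean steps
    print(scaler.get_scale())   # the quantity the log prints as grad_scale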
2024-09-20 05:04:13,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=819340.0, ans=0.125
2024-09-20 05:04:16,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.max_positive, batch_count=819340.0, ans=0.95
2024-09-20 05:04:19,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=819340.0, ans=0.0
2024-09-20 05:04:20,097 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.76 vs. limit=6.0
2024-09-20 05:04:33,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=819380.0, ans=0.125
2024-09-20 05:04:43,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=819420.0, ans=0.0
2024-09-20 05:04:47,629 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.61 vs. limit=15.0
2024-09-20 05:04:50,681 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=15.0
2024-09-20 05:05:17,179 INFO [train.py:1198] (1/2) Epoch 46, batch 1250, loss[loss=0.2514, ctc_loss=0.124, cr_loss=0.377, attn_decoder_loss=0.2571, over 29507.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1097, cr_loss=0.3497, attn_decoder_loss=0.2379, over 5774475.54 frames. ], batch size: 92, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 05:05:23,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=819500.0, ans=0.0
2024-09-20 05:05:56,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=819580.0, ans=0.125
2024-09-20 05:06:14,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=819620.0, ans=0.1
2024-09-20 05:06:16,051 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.294e+01 8.479e+01 9.052e+01 9.530e+01 1.493e+02, threshold=1.810e+02, percent-clipped=0.0
2024-09-20 05:06:22,996 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.65 vs. limit=15.0
2024-09-20 05:06:25,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=819660.0, ans=0.125
2024-09-20 05:06:32,584 INFO [train.py:1198] (1/2) Epoch 46, batch 1300, loss[loss=0.2267, ctc_loss=0.09715, cr_loss=0.3189, attn_decoder_loss=0.234, over 28196.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1088, cr_loss=0.3474, attn_decoder_loss=0.2369, over 5778826.55 frames. ], batch size: 111, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 05:06:35,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=819700.0, ans=0.025
2024-09-20 05:06:53,478 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.38 vs. limit=10.0
2024-09-20 05:06:55,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=819740.0, ans=0.95
2024-09-20 05:07:01,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=819780.0, ans=0.0
2024-09-20 05:07:15,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=819780.0, ans=0.0
2024-09-20 05:07:24,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=819820.0, ans=0.0
2024-09-20 05:07:43,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=819860.0, ans=0.2
2024-09-20 05:07:50,557 INFO [train.py:1198] (1/2) Epoch 46, batch 1350, loss[loss=0.2375, ctc_loss=0.1108, cr_loss=0.3656, attn_decoder_loss=0.2435, over 29778.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1085, cr_loss=0.3469, attn_decoder_loss=0.2368, over 5795626.95 frames. ], batch size: 81, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 05:07:51,441 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.57 vs. limit=15.0
2024-09-20 05:08:15,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=819940.0, ans=15.0
2024-09-20 05:08:28,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=819980.0, ans=0.125
2024-09-20 05:08:31,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=819980.0, ans=0.125
2024-09-20 05:08:48,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=820020.0, ans=0.0
2024-09-20 05:08:49,308 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.548e+01 8.548e+01 9.009e+01 9.440e+01 1.283e+02, threshold=1.802e+02, percent-clipped=0.0
2024-09-20 05:09:08,072 INFO [train.py:1198] (1/2) Epoch 46, batch 1400, loss[loss=0.2074, ctc_loss=0.09375, cr_loss=0.3075, attn_decoder_loss=0.2132, over 29544.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1084, cr_loss=0.3468, attn_decoder_loss=0.2367, over 5806754.39 frames. ], batch size: 69, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 05:09:15,153 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.87 vs. limit=15.0
2024-09-20 05:09:28,826 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.18 vs. limit=15.0
2024-09-20 05:09:33,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=820140.0, ans=0.2
2024-09-20 05:09:33,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=820140.0, ans=0.1
2024-09-20 05:09:35,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=820140.0, ans=0.125
2024-09-20 05:10:06,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=820260.0, ans=0.125
2024-09-20 05:10:18,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=820260.0, ans=0.125
2024-09-20 05:10:21,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=820300.0, ans=0.025
2024-09-20 05:10:23,186 INFO [train.py:1198] (1/2) Epoch 46, batch 1450, loss[loss=0.2451, ctc_loss=0.115, cr_loss=0.3553, attn_decoder_loss=0.2516, over 29449.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1084, cr_loss=0.3465, attn_decoder_loss=0.2371, over 5802897.95 frames. ], batch size: 94, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 05:10:50,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=820340.0, ans=0.0
2024-09-20 05:11:20,463 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=15.0
2024-09-20 05:11:23,834 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.745e+01 8.716e+01 9.154e+01 9.658e+01 1.732e+02, threshold=1.831e+02, percent-clipped=0.0
2024-09-20 05:11:25,802 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-20 05:11:28,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=820460.0, ans=0.125
2024-09-20 05:11:40,540 INFO [train.py:1198] (1/2) Epoch 46, batch 1500, loss[loss=0.2325, ctc_loss=0.1079, cr_loss=0.3338, attn_decoder_loss=0.2389, over 29620.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1086, cr_loss=0.3467, attn_decoder_loss=0.2374, over 5803825.29 frames. ], batch size: 86, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 05:11:42,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=820500.0, ans=0.0
2024-09-20 05:12:05,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=820540.0, ans=0.025
2024-09-20 05:12:18,166 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.72 vs. limit=15.0
2024-09-20 05:12:37,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=820620.0, ans=0.125
2024-09-20 05:12:40,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=820660.0, ans=0.125
2024-09-20 05:12:43,596 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.15 vs. limit=15.0
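The cr_loss component is presumably the consistency-regularization term of CR-CTC: the same utterance is passed through the encoder twice with different time-masking, and each branch's CTC posteriors are pulled toward the other's. One plausible formulation is a symmetric KL with detached targets; the function below is a sketch of that general technique, not the recipe's exact code. It would also explain why the validation pass earlier printed cr_loss around 6e-15: with augmentation disabled, the two views coincide.

    import torch
    import torch.nn.functional as F

    def consistency_loss(log_probs_a: torch.Tensor,
                         log_probs_b: torch.Tensor) -> torch.Tensor:
        # log_probs_*: (T, N, V) log-softmax outputs of the CTC head for two
        # differently masked views of the same batch.
        kl_ab = F.kl_div(log_probs_a, log_probs_b.detach(),
                         log_target=True, reduction="batchmean")
        kl_ba = F.kl_div(log_probs_b, log_probs_a.detach(),
                         log_target=True, reduction="batchmean")
        return 0.5 * (kl_ab + kl_ba)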
2024-09-20 05:12:44,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=820660.0, ans=0.125
2024-09-20 05:12:46,862 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.81 vs. limit=15.0
2024-09-20 05:12:52,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=820660.0, ans=0.0
2024-09-20 05:12:58,451 INFO [train.py:1198] (1/2) Epoch 46, batch 1550, loss[loss=0.2416, ctc_loss=0.1183, cr_loss=0.36, attn_decoder_loss=0.2473, over 29509.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1089, cr_loss=0.3475, attn_decoder_loss=0.2373, over 5780363.02 frames. ], batch size: 90, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 05:13:08,114 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.76 vs. limit=15.0
2024-09-20 05:13:24,851 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.84 vs. limit=22.5
2024-09-20 05:13:26,634 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.50 vs. limit=10.0
2024-09-20 05:13:31,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=820780.0, ans=0.125
2024-09-20 05:13:57,689 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.852e+01 8.666e+01 9.165e+01 9.955e+01 1.733e+02, threshold=1.833e+02, percent-clipped=0.0
2024-09-20 05:14:00,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=820860.0, ans=0.125
2024-09-20 05:14:14,157 INFO [train.py:1198] (1/2) Epoch 46, batch 1600, loss[loss=0.2385, ctc_loss=0.1112, cr_loss=0.3567, attn_decoder_loss=0.2447, over 29683.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1092, cr_loss=0.3481, attn_decoder_loss=0.2373, over 5764263.93 frames. ], batch size: 85, lr: 2.39e-03, grad_scale: 32.0
2024-09-20 05:14:18,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=820900.0, ans=0.2
2024-09-20 05:15:06,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=821020.0, ans=0.0
2024-09-20 05:15:31,300 INFO [train.py:1198] (1/2) Epoch 46, batch 1650, loss[loss=0.24, ctc_loss=0.106, cr_loss=0.3396, attn_decoder_loss=0.2473, over 29690.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1088, cr_loss=0.3471, attn_decoder_loss=0.2371, over 5758899.23 frames. ], batch size: 89, lr: 2.39e-03, grad_scale: 16.0
2024-09-20 05:15:43,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=821100.0, ans=0.1
2024-09-20 05:16:09,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=821180.0, ans=0.0
2024-09-20 05:16:31,207 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.361e+01 8.592e+01 9.131e+01 9.784e+01 1.419e+02, threshold=1.826e+02, percent-clipped=0.0
2024-09-20 05:16:48,215 INFO [train.py:1198] (1/2) Epoch 46, batch 1700, loss[loss=0.2094, ctc_loss=0.09766, cr_loss=0.3283, attn_decoder_loss=0.2145, over 29584.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1085, cr_loss=0.3467, attn_decoder_loss=0.2369, over 5780513.89 frames. ], batch size: 69, lr: 2.39e-03, grad_scale: 16.0
2024-09-20 05:16:53,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=821300.0, ans=0.04949747468305833
2024-09-20 05:16:59,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=821300.0, ans=0.125
2024-09-20 05:16:59,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=821300.0, ans=0.07
2024-09-20 05:17:06,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=821340.0, ans=0.125
2024-09-20 05:17:09,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=821340.0, ans=0.125
2024-09-20 05:17:15,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=821340.0, ans=0.125
2024-09-20 05:17:26,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=821380.0, ans=0.05
2024-09-20 05:17:29,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=821380.0, ans=0.025
2024-09-20 05:17:32,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=821420.0, ans=0.125
2024-09-20 05:17:37,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=821420.0, ans=0.125
2024-09-20 05:17:44,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=821420.0, ans=0.0
2024-09-20 05:17:44,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=821420.0, ans=0.125
2024-09-20 05:17:57,917 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.08 vs. limit=12.0
2024-09-20 05:17:59,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=821460.0, ans=0.1
2024-09-20 05:18:03,232 INFO [train.py:1198] (1/2) Epoch 46, batch 1750, loss[loss=0.2113, ctc_loss=0.09574, cr_loss=0.3238, attn_decoder_loss=0.2169, over 29330.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1085, cr_loss=0.3473, attn_decoder_loss=0.2368, over 5789510.64 frames. ], batch size: 67, lr: 2.39e-03, grad_scale: 16.0
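The balancer entries above schedule bounds such as min_positive, max_positive and prob for the model's Balancer modules. A Balancer-style constraint can be sketched as a penalty on channels whose fraction of positive activations leaves [min_positive, max_positive], using a differentiable proxy for that fraction; this is an assumption-level reconstruction of the idea, not the scaling.py implementation:

    import torch

    def balancer_penalty(x: torch.Tensor,
                         min_positive: float = 0.05,
                         max_positive: float = 0.95,
                         temperature: float = 0.1) -> torch.Tensor:
        # x: (num_frames, num_channels). Differentiable proxy for the
        # fraction of positive activations per channel:
        frac_pos = torch.sigmoid(x / temperature).mean(dim=0)
        # Penalize channels outside the allowed band.
        low = (min_positive - frac_pos).clamp(min=0.0)
        high = (frac_pos - max_positive).clamp(min=0.0)
        return (low + high).sum()

    x = torch.randn(100, 8, requires_grad=True)
    balancer_penalty(x).backward()  # gradients nudge channels back into the band

Such constraints keep individual channels from dying (never positive) or saturating (always positive), which matters for the Zipformer's heavily gated nonlinearities.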
2024-09-20 05:18:05,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=821500.0, ans=0.125
2024-09-20 05:18:43,778 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.58 vs. limit=15.0
2024-09-20 05:18:50,778 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 05:19:06,023 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.095e+01 8.682e+01 9.175e+01 9.617e+01 1.208e+02, threshold=1.835e+02, percent-clipped=0.0
2024-09-20 05:19:12,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=821660.0, ans=0.0
2024-09-20 05:19:20,746 INFO [train.py:1198] (1/2) Epoch 46, batch 1800, loss[loss=0.2538, ctc_loss=0.1264, cr_loss=0.3891, attn_decoder_loss=0.2593, over 29682.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1088, cr_loss=0.3479, attn_decoder_loss=0.2373, over 5792174.65 frames. ], batch size: 83, lr: 2.39e-03, grad_scale: 16.0
2024-09-20 05:19:23,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=821700.0, ans=0.125
2024-09-20 05:19:34,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=821740.0, ans=0.0
2024-09-20 05:19:47,583 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.96 vs. limit=15.0
2024-09-20 05:19:47,644 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=15.0
2024-09-20 05:19:59,602 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.34 vs. limit=22.5
2024-09-20 05:20:01,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=821780.0, ans=0.125
2024-09-20 05:20:22,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=821860.0, ans=0.025
2024-09-20 05:20:38,148 INFO [train.py:1198] (1/2) Epoch 46, batch 1850, loss[loss=0.244, ctc_loss=0.1118, cr_loss=0.3585, attn_decoder_loss=0.2507, over 29626.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1087, cr_loss=0.3477, attn_decoder_loss=0.2371, over 5798251.83 frames. ], batch size: 86, lr: 2.39e-03, grad_scale: 16.0
2024-09-20 05:20:38,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=821900.0, ans=0.125
2024-09-20 05:20:47,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=821900.0, ans=15.0
2024-09-20 05:20:51,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=821940.0, ans=0.0
2024-09-20 05:20:54,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=821940.0, ans=0.125
2024-09-20 05:20:56,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=821940.0, ans=0.125
2024-09-20 05:21:07,433 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.71 vs. limit=15.0
2024-09-20 05:21:38,308 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 8.515e+01 9.175e+01 9.634e+01 2.306e+02, threshold=1.835e+02, percent-clipped=1.0
2024-09-20 05:21:53,036 INFO [train.py:1198] (1/2) Epoch 46, batch 1900, loss[loss=0.2347, ctc_loss=0.1027, cr_loss=0.322, attn_decoder_loss=0.2422, over 29716.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1089, cr_loss=0.3478, attn_decoder_loss=0.2376, over 5805528.77 frames. ], batch size: 89, lr: 2.39e-03, grad_scale: 16.0
2024-09-20 05:21:59,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=822100.0, ans=0.0
2024-09-20 05:22:19,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=822140.0, ans=0.0
2024-09-20 05:22:27,458 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.78 vs. limit=6.0
2024-09-20 05:22:51,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=822220.0, ans=0.0
2024-09-20 05:22:57,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=822260.0, ans=0.0
2024-09-20 05:22:59,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=822260.0, ans=0.0
2024-09-20 05:23:05,564 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.27 vs. limit=22.5
2024-09-20 05:23:10,795 INFO [train.py:1198] (1/2) Epoch 46, batch 1950, loss[loss=0.2303, ctc_loss=0.1082, cr_loss=0.3431, attn_decoder_loss=0.2363, over 29489.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1093, cr_loss=0.3491, attn_decoder_loss=0.2383, over 5819703.07 frames. ], batch size: 78, lr: 2.39e-03, grad_scale: 16.0
2024-09-20 05:23:20,734 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.93 vs. limit=15.0
2024-09-20 05:23:29,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=822340.0, ans=0.0
2024-09-20 05:23:32,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=822340.0, ans=0.1
2024-09-20 05:23:33,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=822340.0, ans=0.1
2024-09-20 05:23:50,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=822380.0, ans=0.125
2024-09-20 05:23:53,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=822380.0, ans=0.125
2024-09-20 05:24:06,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=822420.0, ans=0.0
2024-09-20 05:24:09,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=822460.0, ans=0.125
2024-09-20 05:24:10,972 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.887e+01 8.689e+01 9.269e+01 9.722e+01 1.487e+02, threshold=1.854e+02, percent-clipped=0.0
2024-09-20 05:24:16,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=822460.0, ans=0.1
2024-09-20 05:24:18,637 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.21 vs. limit=6.0
2024-09-20 05:24:22,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=822460.0, ans=0.2
2024-09-20 05:24:24,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=822460.0, ans=0.0
2024-09-20 05:24:24,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=822460.0, ans=0.125
2024-09-20 05:24:28,129 INFO [train.py:1198] (1/2) Epoch 46, batch 2000, loss[loss=0.2158, ctc_loss=0.108, cr_loss=0.3495, attn_decoder_loss=0.22, over 29343.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1096, cr_loss=0.3499, attn_decoder_loss=0.2387, over 5796240.07 frames. ], batch size: 67, lr: 2.39e-03, grad_scale: 32.0
2024-09-20 05:24:32,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=822500.0, ans=0.2
2024-09-20 05:24:48,522 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.43 vs. limit=22.5
2024-09-20 05:25:03,746 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.18 vs. limit=12.0
2024-09-20 05:25:15,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=822620.0, ans=0.0
2024-09-20 05:25:18,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=822620.0, ans=0.0
2024-09-20 05:25:43,661 INFO [train.py:1198] (1/2) Epoch 46, batch 2050, loss[loss=0.2094, ctc_loss=0.09015, cr_loss=0.309, attn_decoder_loss=0.2158, over 29472.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.109, cr_loss=0.3481, attn_decoder_loss=0.238, over 5788563.68 frames. ], batch size: 70, lr: 2.39e-03, grad_scale: 16.0
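The many *_skip_rate schedules (attention_skip_rate, conv_skip_rate, ff2_skip_rate, ... almost all ans=0.0 this late in training) suggest stochastic sub-module skipping: early in training a sub-module's contribution is sometimes dropped, and the rate decays to zero as training matures. A sketch of applying such a rate, with the schedule value supplied from outside; whether the real code randomizes per batch or per sequence is an assumption here:

    import torch

    def maybe_skip(module_out: torch.Tensor, skip_rate: float,
                   training: bool) -> torch.Tensor:
        # With probability skip_rate during training, drop this sub-module's
        # contribution for the whole batch; the surrounding residual path
        # keeps the layer output well-defined.
        if training and torch.rand(()) < skip_rate:
            return torch.zeros_like(module_out)
        return module_out

Skipping regularizes early training much like stochastic depth, and annealing the rate to zero removes the train/inference mismatch by the final epochs.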
], tot_loss[loss=0.2321, ctc_loss=0.109, cr_loss=0.3481, attn_decoder_loss=0.238, over 5788563.68 frames. ], batch size: 70, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:25:47,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=822700.0, ans=0.125 2024-09-20 05:26:17,394 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:26:19,506 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.32 vs. limit=12.0 2024-09-20 05:26:24,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=822780.0, ans=0.0 2024-09-20 05:26:33,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=822820.0, ans=0.125 2024-09-20 05:26:35,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=822820.0, ans=0.1 2024-09-20 05:26:35,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=822820.0, ans=0.125 2024-09-20 05:26:41,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=822820.0, ans=0.0 2024-09-20 05:26:47,717 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.463e+01 8.548e+01 9.000e+01 9.590e+01 1.636e+02, threshold=1.800e+02, percent-clipped=0.0 2024-09-20 05:26:58,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=822860.0, ans=0.0 2024-09-20 05:27:01,495 INFO [train.py:1198] (1/2) Epoch 46, batch 2100, loss[loss=0.2283, ctc_loss=0.1031, cr_loss=0.3355, attn_decoder_loss=0.2348, over 29762.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1088, cr_loss=0.3478, attn_decoder_loss=0.2374, over 5798852.38 frames. ], batch size: 81, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:27:05,315 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.58 vs. 
limit=15.0 2024-09-20 05:27:12,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=822900.0, ans=0.05 2024-09-20 05:27:16,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=822940.0, ans=0.1 2024-09-20 05:27:18,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=822940.0, ans=0.025 2024-09-20 05:27:21,478 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:27:27,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=822940.0, ans=0.125 2024-09-20 05:27:28,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=822940.0, ans=0.125 2024-09-20 05:27:58,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=823020.0, ans=0.125 2024-09-20 05:27:59,091 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.90 vs. limit=15.0 2024-09-20 05:28:16,966 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:28:18,249 INFO [train.py:1198] (1/2) Epoch 46, batch 2150, loss[loss=0.2227, ctc_loss=0.1023, cr_loss=0.3409, attn_decoder_loss=0.2285, over 29428.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.108, cr_loss=0.346, attn_decoder_loss=0.2367, over 5814226.38 frames. ], batch size: 78, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:28:18,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=823100.0, ans=0.0 2024-09-20 05:28:58,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=823180.0, ans=0.125 2024-09-20 05:29:14,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=823220.0, ans=0.0 2024-09-20 05:29:17,164 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.27 vs. limit=10.0 2024-09-20 05:29:17,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=823260.0, ans=0.125 2024-09-20 05:29:20,448 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 8.498e+01 9.079e+01 9.733e+01 1.239e+02, threshold=1.816e+02, percent-clipped=0.0 2024-09-20 05:29:28,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=823260.0, ans=0.125 2024-09-20 05:29:34,172 INFO [train.py:1198] (1/2) Epoch 46, batch 2200, loss[loss=0.2414, ctc_loss=0.1059, cr_loss=0.3376, attn_decoder_loss=0.2489, over 29629.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1084, cr_loss=0.3469, attn_decoder_loss=0.237, over 5811325.12 frames. 
], batch size: 86, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:29:34,538 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:29:51,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=823340.0, ans=0.95 2024-09-20 05:29:58,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=823340.0, ans=0.125 2024-09-20 05:30:06,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=823380.0, ans=0.04949747468305833 2024-09-20 05:30:06,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=823380.0, ans=0.1 2024-09-20 05:30:52,034 INFO [train.py:1198] (1/2) Epoch 46, batch 2250, loss[loss=0.2217, ctc_loss=0.1029, cr_loss=0.3573, attn_decoder_loss=0.2269, over 29707.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1083, cr_loss=0.3463, attn_decoder_loss=0.2367, over 5809399.59 frames. ], batch size: 82, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:31:10,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=823540.0, ans=0.07 2024-09-20 05:31:15,666 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.82 vs. limit=10.0 2024-09-20 05:31:34,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=823580.0, ans=0.125 2024-09-20 05:31:40,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=823620.0, ans=0.125 2024-09-20 05:31:41,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=823620.0, ans=0.0 2024-09-20 05:31:46,704 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.48 vs. limit=15.0 2024-09-20 05:31:47,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=823620.0, ans=0.0 2024-09-20 05:31:50,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=823660.0, ans=0.125 2024-09-20 05:31:53,371 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.133e+01 8.446e+01 9.116e+01 9.634e+01 2.292e+02, threshold=1.823e+02, percent-clipped=1.0 2024-09-20 05:31:58,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=823660.0, ans=0.0 2024-09-20 05:32:08,797 INFO [train.py:1198] (1/2) Epoch 46, batch 2300, loss[loss=0.2042, ctc_loss=0.09119, cr_loss=0.3101, attn_decoder_loss=0.2099, over 29327.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1076, cr_loss=0.3449, attn_decoder_loss=0.2359, over 5795962.27 frames. 
], batch size: 71, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:32:23,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=823740.0, ans=0.125 2024-09-20 05:32:30,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=823740.0, ans=0.125 2024-09-20 05:32:33,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=823740.0, ans=0.125 2024-09-20 05:32:36,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=823740.0, ans=0.0 2024-09-20 05:32:36,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=823740.0, ans=0.125 2024-09-20 05:32:43,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=823780.0, ans=0.0 2024-09-20 05:32:44,319 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.69 vs. limit=15.0 2024-09-20 05:33:12,496 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:33:20,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=823860.0, ans=0.025 2024-09-20 05:33:24,357 INFO [train.py:1198] (1/2) Epoch 46, batch 2350, loss[loss=0.2342, ctc_loss=0.111, cr_loss=0.3536, attn_decoder_loss=0.2401, over 29686.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1079, cr_loss=0.3455, attn_decoder_loss=0.2363, over 5801735.00 frames. ], batch size: 83, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:33:24,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=823900.0, ans=0.125 2024-09-20 05:33:28,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=823900.0, ans=0.125 2024-09-20 05:33:28,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=823900.0, ans=0.125 2024-09-20 05:34:23,912 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.60 vs. limit=15.0 2024-09-20 05:34:26,078 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.494e+01 8.622e+01 9.111e+01 9.786e+01 2.523e+02, threshold=1.822e+02, percent-clipped=1.0 2024-09-20 05:34:39,838 INFO [train.py:1198] (1/2) Epoch 46, batch 2400, loss[loss=0.221, ctc_loss=0.1005, cr_loss=0.335, attn_decoder_loss=0.227, over 29541.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1085, cr_loss=0.347, attn_decoder_loss=0.2368, over 5805891.48 frames. 
], batch size: 76, lr: 2.39e-03, grad_scale: 32.0 2024-09-20 05:34:48,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=824100.0, ans=0.2 2024-09-20 05:35:08,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=824140.0, ans=0.0 2024-09-20 05:35:12,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=824180.0, ans=0.1 2024-09-20 05:35:15,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=824180.0, ans=0.125 2024-09-20 05:35:39,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=824220.0, ans=0.125 2024-09-20 05:35:52,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=824260.0, ans=0.1 2024-09-20 05:35:59,534 INFO [train.py:1198] (1/2) Epoch 46, batch 2450, loss[loss=0.2353, ctc_loss=0.1131, cr_loss=0.3606, attn_decoder_loss=0.2409, over 29692.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1094, cr_loss=0.3487, attn_decoder_loss=0.2377, over 5782376.16 frames. ], batch size: 82, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:36:04,874 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.22 vs. limit=15.0 2024-09-20 05:36:25,472 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:36:33,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=824380.0, ans=0.2 2024-09-20 05:36:43,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=824420.0, ans=0.125 2024-09-20 05:36:44,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=824420.0, ans=0.95 2024-09-20 05:36:46,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=824420.0, ans=0.07 2024-09-20 05:36:56,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=824420.0, ans=0.0 2024-09-20 05:36:58,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=824460.0, ans=0.125 2024-09-20 05:37:01,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=824460.0, ans=0.5 2024-09-20 05:37:02,670 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 8.776e+01 9.478e+01 1.012e+02 4.785e+02, threshold=1.896e+02, percent-clipped=1.0 2024-09-20 05:37:14,643 INFO [train.py:1198] (1/2) Epoch 46, batch 2500, loss[loss=0.2508, ctc_loss=0.1189, cr_loss=0.364, attn_decoder_loss=0.2573, over 29641.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1095, cr_loss=0.3485, attn_decoder_loss=0.2377, over 5793437.97 frames. 
], batch size: 86, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:37:16,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=824500.0, ans=0.125 2024-09-20 05:37:37,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=824540.0, ans=0.09899494936611666 2024-09-20 05:37:47,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.92 vs. limit=15.0 2024-09-20 05:38:19,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=824660.0, ans=0.125 2024-09-20 05:38:29,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=824700.0, ans=0.0 2024-09-20 05:38:30,353 INFO [train.py:1198] (1/2) Epoch 46, batch 2550, loss[loss=0.2053, ctc_loss=0.09107, cr_loss=0.3219, attn_decoder_loss=0.2108, over 29327.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1092, cr_loss=0.3475, attn_decoder_loss=0.2375, over 5797133.30 frames. ], batch size: 67, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:38:39,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=824700.0, ans=0.0 2024-09-20 05:38:52,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=824740.0, ans=0.2 2024-09-20 05:38:55,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=824740.0, ans=0.2 2024-09-20 05:38:58,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=824740.0, ans=0.125 2024-09-20 05:39:13,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=824780.0, ans=0.0 2024-09-20 05:39:28,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=824820.0, ans=0.0 2024-09-20 05:39:33,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=824860.0, ans=0.0 2024-09-20 05:39:36,273 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.630e+01 8.555e+01 9.140e+01 9.726e+01 1.841e+02, threshold=1.828e+02, percent-clipped=0.0 2024-09-20 05:39:36,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=824860.0, ans=0.1 2024-09-20 05:39:50,516 INFO [train.py:1198] (1/2) Epoch 46, batch 2600, loss[loss=0.2239, ctc_loss=0.1019, cr_loss=0.3394, attn_decoder_loss=0.2299, over 29452.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.109, cr_loss=0.3473, attn_decoder_loss=0.2378, over 5793286.61 frames. ], batch size: 78, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:40:02,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.48 vs. 
limit=15.0 2024-09-20 05:40:08,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=824940.0, ans=0.1 2024-09-20 05:40:11,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=824940.0, ans=0.125 2024-09-20 05:40:35,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=825020.0, ans=0.0 2024-09-20 05:40:53,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=825060.0, ans=0.125 2024-09-20 05:40:59,150 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.25 vs. limit=10.0 2024-09-20 05:41:05,514 INFO [train.py:1198] (1/2) Epoch 46, batch 2650, loss[loss=0.2494, ctc_loss=0.1219, cr_loss=0.3779, attn_decoder_loss=0.2552, over 29244.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1095, cr_loss=0.3487, attn_decoder_loss=0.2384, over 5800181.74 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 8.0 2024-09-20 05:41:36,470 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.22 vs. limit=22.5 2024-09-20 05:41:52,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=825220.0, ans=0.1 2024-09-20 05:41:55,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=825220.0, ans=0.2 2024-09-20 05:42:09,972 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.392e+01 8.633e+01 9.169e+01 9.571e+01 1.241e+02, threshold=1.834e+02, percent-clipped=0.0 2024-09-20 05:42:20,698 INFO [train.py:1198] (1/2) Epoch 46, batch 2700, loss[loss=0.2429, ctc_loss=0.1156, cr_loss=0.3525, attn_decoder_loss=0.2492, over 29542.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1098, cr_loss=0.3497, attn_decoder_loss=0.2387, over 5795630.71 frames. ], batch size: 87, lr: 2.39e-03, grad_scale: 8.0 2024-09-20 05:42:26,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=825300.0, ans=0.125 2024-09-20 05:42:39,722 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:43:12,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=825420.0, ans=0.025 2024-09-20 05:43:20,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=825420.0, ans=0.0 2024-09-20 05:43:40,483 INFO [train.py:1198] (1/2) Epoch 46, batch 2750, loss[loss=0.2273, ctc_loss=0.1028, cr_loss=0.3255, attn_decoder_loss=0.2339, over 29503.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1092, cr_loss=0.3484, attn_decoder_loss=0.2377, over 5793618.54 frames. 
], batch size: 75, lr: 2.39e-03, grad_scale: 8.0 2024-09-20 05:43:48,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=825500.0, ans=0.1 2024-09-20 05:43:51,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=825500.0, ans=0.0 2024-09-20 05:44:32,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=825620.0, ans=0.125 2024-09-20 05:44:46,006 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.730e+01 8.644e+01 9.121e+01 9.722e+01 2.212e+02, threshold=1.824e+02, percent-clipped=1.0 2024-09-20 05:44:56,657 INFO [train.py:1198] (1/2) Epoch 46, batch 2800, loss[loss=0.2465, ctc_loss=0.1242, cr_loss=0.3598, attn_decoder_loss=0.2521, over 20180.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1093, cr_loss=0.3486, attn_decoder_loss=0.2378, over 5774447.37 frames. ], batch size: 209, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:45:01,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=825700.0, ans=0.125 2024-09-20 05:45:02,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=825700.0, ans=0.125 2024-09-20 05:45:04,741 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.16 vs. limit=22.5 2024-09-20 05:45:05,131 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.63 vs. limit=15.0 2024-09-20 05:45:32,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=825780.0, ans=0.1 2024-09-20 05:46:11,549 INFO [train.py:1198] (1/2) Epoch 46, batch 2850, loss[loss=0.2206, ctc_loss=0.1036, cr_loss=0.3347, attn_decoder_loss=0.2261, over 29516.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1095, cr_loss=0.3483, attn_decoder_loss=0.238, over 5759734.64 frames. ], batch size: 77, lr: 2.39e-03, grad_scale: 8.0 2024-09-20 05:46:39,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=825940.0, ans=0.125 2024-09-20 05:46:42,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=825980.0, ans=0.125 2024-09-20 05:47:12,006 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:47:22,084 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.622e+01 8.726e+01 9.166e+01 9.745e+01 2.049e+02, threshold=1.833e+02, percent-clipped=1.0 2024-09-20 05:47:22,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=826060.0, ans=0.0 2024-09-20 05:47:31,170 INFO [train.py:1198] (1/2) Epoch 46, batch 2900, loss[loss=0.2313, ctc_loss=0.1183, cr_loss=0.3735, attn_decoder_loss=0.2355, over 29425.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.11, cr_loss=0.3496, attn_decoder_loss=0.2391, over 5785382.35 frames. 
], batch size: 79, lr: 2.39e-03, grad_scale: 8.0 2024-09-20 05:47:40,968 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.50 vs. limit=15.0 2024-09-20 05:47:58,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=826140.0, ans=0.125 2024-09-20 05:48:01,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=826180.0, ans=0.2 2024-09-20 05:48:09,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=826180.0, ans=0.1 2024-09-20 05:48:13,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=826180.0, ans=0.125 2024-09-20 05:48:19,719 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:48:22,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=826220.0, ans=0.07 2024-09-20 05:48:38,295 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.54 vs. limit=22.5 2024-09-20 05:48:46,542 INFO [train.py:1198] (1/2) Epoch 46, batch 2950, loss[loss=0.2279, ctc_loss=0.118, cr_loss=0.367, attn_decoder_loss=0.232, over 29531.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.109, cr_loss=0.3471, attn_decoder_loss=0.2377, over 5780283.74 frames. ], batch size: 75, lr: 2.39e-03, grad_scale: 8.0 2024-09-20 05:49:01,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=826340.0, ans=0.07 2024-09-20 05:49:07,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=826340.0, ans=0.125 2024-09-20 05:49:39,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=826420.0, ans=0.125 2024-09-20 05:49:53,238 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.543e+01 8.789e+01 9.210e+01 9.770e+01 1.527e+02, threshold=1.842e+02, percent-clipped=0.0 2024-09-20 05:50:02,335 INFO [train.py:1198] (1/2) Epoch 46, batch 3000, loss[loss=0.2371, ctc_loss=0.1099, cr_loss=0.3624, attn_decoder_loss=0.2432, over 29763.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1087, cr_loss=0.3467, attn_decoder_loss=0.2378, over 5781164.10 frames. ], batch size: 81, lr: 2.39e-03, grad_scale: 8.0 2024-09-20 05:50:02,336 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 05:50:21,364 INFO [train.py:1230] (1/2) Epoch 46, validation: loss=0.2122, ctc_loss=0.03683, cr_loss=6.872e-15, attn_decoder_loss=0.2317, over 944034.00 frames. 
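The validation entry just logged is worth a note. Its cr_loss is 6.872e-15, i.e. numerically zero, while training batches in this epoch log cr_loss around 0.35. That is what a consistency-regularization term computed between two differently augmented views of each utterance would do: with augmentation disabled during validation the two views coincide and only floating-point noise remains. The logged totals are also consistent, in every entry above, with a fixed weighted sum of the three component losses, approximately loss = 0.1*ctc_loss + 0.9*attn_decoder_loss + 0.02*cr_loss. A minimal sketch that checks this against the logged numbers (the weights are inferred from the log itself, not read out of any code):

```python
# Minimal sketch, not icefall's code: recombine the logged loss parts under
# the inferred weighting 0.1*ctc + 0.9*attn_decoder + 0.02*cr and check that
# it reproduces the logged totals to about 4 decimal places.
def combined_loss(ctc_loss, attn_decoder_loss, cr_loss,
                  ctc_scale=0.1, aed_scale=0.9, cr_scale=0.02):
    return (ctc_scale * ctc_loss
            + aed_scale * attn_decoder_loss
            + cr_scale * cr_loss)

# Epoch 46, batch 1900: loss=0.2347, ctc=0.1027, cr=0.322, attn=0.2422
assert abs(combined_loss(0.1027, 0.2422, 0.322) - 0.2347) < 5e-4
# Epoch 46, validation: loss=0.2122, ctc=0.03683, cr~0, attn=0.2317
assert abs(combined_loss(0.03683, 0.2317, 0.0) - 0.2122) < 5e-4
```

The ScheduledFloat entries that dominate the log (dropout probabilities, skip rates, balancer probabilities) record values that are functions of batch_count; by batch_count ~ 822k, as here, they have settled at their final constants (ans=0.0, 0.1, 0.125, 0.2 and so on). Below is a sketch of one plausible mechanism, a piecewise-linear schedule keyed on batch count; the breakpoints in the example are hypothetical, not values from this run:

```python
# Hypothetical piecewise-linear schedule keyed on batch_count; a sketch of
# the kind of object the "ScheduledFloat: name=..., batch_count=..., ans=..."
# entries describe, not icefall's implementation.
class ScheduledFloatSketch:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, e.g. (0, 0.3), (20000, 0.1)
        self.points = sorted(points)

    def value(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
        return pts[-1][1]  # constant after the last breakpoint

dropout_p = ScheduledFloatSketch((0, 0.3), (20000, 0.1))
assert dropout_p.value(822000) == 0.1  # long past the last breakpoint
```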
2024-09-20 05:50:21,365 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-20 05:50:29,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=826500.0, ans=0.125 2024-09-20 05:50:32,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=826500.0, ans=0.2 2024-09-20 05:50:59,007 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.67 vs. limit=22.5 2024-09-20 05:51:30,635 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.95 vs. limit=10.0 2024-09-20 05:51:37,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=826700.0, ans=0.2 2024-09-20 05:51:39,070 INFO [train.py:1198] (1/2) Epoch 46, batch 3050, loss[loss=0.2235, ctc_loss=0.108, cr_loss=0.3494, attn_decoder_loss=0.2286, over 29511.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1093, cr_loss=0.3481, attn_decoder_loss=0.2384, over 5775661.30 frames. ], batch size: 76, lr: 2.39e-03, grad_scale: 8.0 2024-09-20 05:52:17,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=826780.0, ans=0.035 2024-09-20 05:52:20,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=826780.0, ans=0.0 2024-09-20 05:52:38,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=826860.0, ans=0.1 2024-09-20 05:52:45,558 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.649e+01 8.604e+01 9.128e+01 9.681e+01 2.059e+02, threshold=1.826e+02, percent-clipped=1.0 2024-09-20 05:52:54,438 INFO [train.py:1198] (1/2) Epoch 46, batch 3100, loss[loss=0.2522, ctc_loss=0.1306, cr_loss=0.4012, attn_decoder_loss=0.2568, over 29279.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1093, cr_loss=0.3483, attn_decoder_loss=0.2381, over 5776892.62 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 8.0 2024-09-20 05:53:03,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=826900.0, ans=0.0 2024-09-20 05:53:05,863 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.19 vs. limit=10.0 2024-09-20 05:53:08,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=826940.0, ans=0.125 2024-09-20 05:53:19,215 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.07 vs. 
limit=12.0 2024-09-20 05:53:24,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=826980.0, ans=0.2 2024-09-20 05:53:31,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=826980.0, ans=0.1 2024-09-20 05:53:40,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=827020.0, ans=0.0 2024-09-20 05:53:44,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=827020.0, ans=0.125 2024-09-20 05:54:12,265 INFO [train.py:1198] (1/2) Epoch 46, batch 3150, loss[loss=0.2415, ctc_loss=0.1118, cr_loss=0.357, attn_decoder_loss=0.2479, over 28801.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1093, cr_loss=0.3483, attn_decoder_loss=0.2379, over 5784243.89 frames. ], batch size: 104, lr: 2.39e-03, grad_scale: 8.0 2024-09-20 05:54:21,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=827100.0, ans=0.125 2024-09-20 05:54:24,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=827100.0, ans=0.125 2024-09-20 05:54:27,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=827140.0, ans=0.125 2024-09-20 05:54:32,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=827140.0, ans=0.2 2024-09-20 05:54:36,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=827140.0, ans=0.125 2024-09-20 05:54:50,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=827180.0, ans=0.125 2024-09-20 05:54:57,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=827180.0, ans=0.125 2024-09-20 05:55:09,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=827220.0, ans=0.125 2024-09-20 05:55:14,624 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.90 vs. limit=10.0 2024-09-20 05:55:20,996 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.407e+01 8.661e+01 9.228e+01 9.834e+01 1.754e+02, threshold=1.846e+02, percent-clipped=0.0 2024-09-20 05:55:28,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=827300.0, ans=0.125 2024-09-20 05:55:30,094 INFO [train.py:1198] (1/2) Epoch 46, batch 3200, loss[loss=0.233, ctc_loss=0.1089, cr_loss=0.3439, attn_decoder_loss=0.2392, over 29413.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1092, cr_loss=0.3487, attn_decoder_loss=0.2376, over 5794530.86 frames. ], batch size: 79, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:55:31,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.96 vs. 
limit=22.5 2024-09-20 05:55:43,006 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.26 vs. limit=22.5 2024-09-20 05:55:50,945 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.06 vs. limit=15.0 2024-09-20 05:55:53,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=827340.0, ans=0.125 2024-09-20 05:55:59,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=827380.0, ans=0.125 2024-09-20 05:56:03,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=827380.0, ans=0.125 2024-09-20 05:56:10,210 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=15.0 2024-09-20 05:56:15,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=827420.0, ans=0.2 2024-09-20 05:56:16,309 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2024-09-20 05:56:30,466 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.82 vs. limit=15.0 2024-09-20 05:56:45,890 INFO [train.py:1198] (1/2) Epoch 46, batch 3250, loss[loss=0.2447, ctc_loss=0.1152, cr_loss=0.3708, attn_decoder_loss=0.2508, over 29713.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1091, cr_loss=0.349, attn_decoder_loss=0.2379, over 5801229.89 frames. ], batch size: 84, lr: 2.39e-03, grad_scale: 8.0 2024-09-20 05:56:47,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=827500.0, ans=0.125 2024-09-20 05:57:09,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.90 vs. 
limit=15.0 2024-09-20 05:57:10,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=827540.0, ans=0.2 2024-09-20 05:57:11,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=827540.0, ans=0.0 2024-09-20 05:57:20,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=827580.0, ans=0.125 2024-09-20 05:57:37,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=827620.0, ans=0.125 2024-09-20 05:57:44,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=827660.0, ans=0.125 2024-09-20 05:57:53,417 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.254e+01 8.520e+01 8.954e+01 9.495e+01 2.408e+02, threshold=1.791e+02, percent-clipped=1.0 2024-09-20 05:57:53,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=827660.0, ans=0.0 2024-09-20 05:57:55,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=827660.0, ans=0.2 2024-09-20 05:57:58,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=827660.0, ans=0.1 2024-09-20 05:58:01,024 INFO [train.py:1198] (1/2) Epoch 46, batch 3300, loss[loss=0.2445, ctc_loss=0.1161, cr_loss=0.3714, attn_decoder_loss=0.2505, over 28350.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1087, cr_loss=0.3477, attn_decoder_loss=0.2369, over 5798031.33 frames. ], batch size: 111, lr: 2.38e-03, grad_scale: 8.0 2024-09-20 05:58:01,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=827700.0, ans=0.1 2024-09-20 05:58:04,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=827700.0, ans=0.0 2024-09-20 05:58:12,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=827700.0, ans=0.2 2024-09-20 05:58:13,264 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.29 vs. limit=6.0 2024-09-20 05:58:17,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=827740.0, ans=0.125 2024-09-20 05:58:22,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=827740.0, ans=0.1 2024-09-20 05:58:27,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=827740.0, ans=0.125 2024-09-20 05:59:10,493 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.79 vs. 
limit=22.5 2024-09-20 05:59:12,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=827860.0, ans=0.025 2024-09-20 05:59:14,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=827860.0, ans=0.1 2024-09-20 05:59:20,221 INFO [train.py:1198] (1/2) Epoch 46, batch 3350, loss[loss=0.2411, ctc_loss=0.1175, cr_loss=0.3606, attn_decoder_loss=0.2468, over 28902.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1092, cr_loss=0.3485, attn_decoder_loss=0.2375, over 5774021.87 frames. ], batch size: 104, lr: 2.38e-03, grad_scale: 8.0 2024-09-20 05:59:49,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=827980.0, ans=0.125 2024-09-20 05:59:59,131 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.79 vs. limit=15.0 2024-09-20 06:00:10,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=828020.0, ans=0.04949747468305833 2024-09-20 06:00:13,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=828020.0, ans=0.1 2024-09-20 06:00:28,532 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 8.801e+01 9.345e+01 9.836e+01 1.654e+02, threshold=1.869e+02, percent-clipped=0.0 2024-09-20 06:00:34,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=828100.0, ans=0.125 2024-09-20 06:00:36,121 INFO [train.py:1198] (1/2) Epoch 46, batch 3400, loss[loss=0.2036, ctc_loss=0.09002, cr_loss=0.3138, attn_decoder_loss=0.2093, over 29288.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1095, cr_loss=0.3492, attn_decoder_loss=0.2376, over 5766848.79 frames. ], batch size: 67, lr: 2.38e-03, grad_scale: 8.0 2024-09-20 06:01:24,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=828220.0, ans=0.125 2024-09-20 06:01:26,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=828220.0, ans=0.125 2024-09-20 06:01:51,524 INFO [train.py:1198] (1/2) Epoch 46, batch 3450, loss[loss=0.2363, ctc_loss=0.1111, cr_loss=0.3303, attn_decoder_loss=0.2429, over 28304.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1093, cr_loss=0.3495, attn_decoder_loss=0.2378, over 5774286.47 frames. ], batch size: 111, lr: 2.38e-03, grad_scale: 8.0 2024-09-20 06:01:58,447 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:03:03,200 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.738e+01 8.466e+01 9.080e+01 9.638e+01 4.809e+02, threshold=1.816e+02, percent-clipped=1.0 2024-09-20 06:03:08,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=828460.0, ans=0.0 2024-09-20 06:03:10,697 INFO [train.py:1198] (1/2) Epoch 46, batch 3500, loss[loss=0.2079, ctc_loss=0.08629, cr_loss=0.3034, attn_decoder_loss=0.2147, over 29295.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1093, cr_loss=0.3494, attn_decoder_loss=0.2372, over 5776465.15 frames. 
], batch size: 71, lr: 2.38e-03, grad_scale: 8.0 2024-09-20 06:03:12,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=828500.0, ans=0.2 2024-09-20 06:03:20,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=828500.0, ans=0.0 2024-09-20 06:03:24,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=828540.0, ans=0.125 2024-09-20 06:03:35,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=828540.0, ans=0.0 2024-09-20 06:03:41,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=828580.0, ans=0.125 2024-09-20 06:03:45,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=828580.0, ans=0.125 2024-09-20 06:03:46,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=828580.0, ans=0.125 2024-09-20 06:03:48,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=828580.0, ans=0.2 2024-09-20 06:04:09,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer_ff3.min_abs, batch_count=828660.0, ans=0.2 2024-09-20 06:04:22,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=828660.0, ans=0.0 2024-09-20 06:04:25,149 INFO [train.py:1198] (1/2) Epoch 46, batch 3550, loss[loss=0.234, ctc_loss=0.1034, cr_loss=0.34, attn_decoder_loss=0.2409, over 29714.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1091, cr_loss=0.349, attn_decoder_loss=0.2372, over 5782804.55 frames. ], batch size: 89, lr: 2.38e-03, grad_scale: 8.0 2024-09-20 06:04:32,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=828700.0, ans=0.0 2024-09-20 06:04:38,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=828740.0, ans=0.125 2024-09-20 06:04:43,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=828740.0, ans=0.125 2024-09-20 06:05:12,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=828820.0, ans=0.1 2024-09-20 06:05:22,293 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:05:26,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=828860.0, ans=0.125 2024-09-20 06:05:32,134 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.518e+01 9.140e+01 9.697e+01 1.857e+02, threshold=1.828e+02, percent-clipped=1.0 2024-09-20 06:05:39,446 INFO [train.py:1198] (1/2) Epoch 46, batch 3600, loss[loss=0.2263, ctc_loss=0.09783, cr_loss=0.3222, attn_decoder_loss=0.2334, over 29533.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1089, cr_loss=0.3485, attn_decoder_loss=0.2372, over 5791461.67 frames. 
], batch size: 77, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:05:44,857 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.34 vs. limit=15.0 2024-09-20 06:05:59,745 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.31 vs. limit=15.0 2024-09-20 06:06:00,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=828940.0, ans=0.1 2024-09-20 06:06:13,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=828980.0, ans=0.125 2024-09-20 06:06:18,854 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.39 vs. limit=15.0 2024-09-20 06:06:21,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=828980.0, ans=0.125 2024-09-20 06:06:24,489 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:06:53,551 INFO [train.py:1198] (1/2) Epoch 46, batch 3650, loss[loss=0.2497, ctc_loss=0.1201, cr_loss=0.3848, attn_decoder_loss=0.2556, over 29496.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1086, cr_loss=0.3479, attn_decoder_loss=0.2368, over 5793113.62 frames. ], batch size: 90, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:06:56,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=829100.0, ans=0.125 2024-09-20 06:07:05,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=829100.0, ans=0.0 2024-09-20 06:07:08,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=829140.0, ans=0.0 2024-09-20 06:07:11,887 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.29 vs. limit=12.0 2024-09-20 06:07:12,097 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.08 vs. limit=10.0 2024-09-20 06:07:20,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=829140.0, ans=0.125 2024-09-20 06:08:03,938 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.537e+01 9.087e+01 9.420e+01 1.458e+02, threshold=1.817e+02, percent-clipped=0.0 2024-09-20 06:08:11,549 INFO [train.py:1198] (1/2) Epoch 46, batch 3700, loss[loss=0.2453, ctc_loss=0.1161, cr_loss=0.3551, attn_decoder_loss=0.2518, over 29705.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1087, cr_loss=0.3481, attn_decoder_loss=0.237, over 5803603.98 frames. 
], batch size: 84, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:08:16,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=829300.0, ans=0.2 2024-09-20 06:08:19,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=829300.0, ans=0.0 2024-09-20 06:08:21,666 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.47 vs. limit=15.0 2024-09-20 06:08:32,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=829340.0, ans=0.025 2024-09-20 06:08:34,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=829340.0, ans=0.0 2024-09-20 06:08:43,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.37 vs. limit=15.0 2024-09-20 06:08:46,298 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.25 vs. limit=15.0 2024-09-20 06:08:53,393 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:09:25,406 INFO [train.py:1198] (1/2) Epoch 46, batch 3750, loss[loss=0.2114, ctc_loss=0.0894, cr_loss=0.3008, attn_decoder_loss=0.2183, over 29370.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1085, cr_loss=0.3471, attn_decoder_loss=0.2367, over 5807840.50 frames. ], batch size: 67, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:09:39,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=829540.0, ans=0.025 2024-09-20 06:10:03,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=829580.0, ans=0.0 2024-09-20 06:10:21,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=829620.0, ans=0.1 2024-09-20 06:10:27,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=829660.0, ans=0.0 2024-09-20 06:10:28,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=829660.0, ans=0.125 2024-09-20 06:10:32,724 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.488e+01 8.638e+01 9.232e+01 9.625e+01 1.772e+02, threshold=1.846e+02, percent-clipped=0.0 2024-09-20 06:10:40,161 INFO [train.py:1198] (1/2) Epoch 46, batch 3800, loss[loss=0.235, ctc_loss=0.1029, cr_loss=0.3323, attn_decoder_loss=0.2423, over 29628.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1084, cr_loss=0.3462, attn_decoder_loss=0.2365, over 5797782.57 frames. 
], batch size: 86, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:10:40,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=829700.0, ans=0.025 2024-09-20 06:10:47,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=829700.0, ans=0.0 2024-09-20 06:10:50,124 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2024-09-20 06:10:51,332 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.10 vs. limit=12.0 2024-09-20 06:11:14,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=829780.0, ans=0.125 2024-09-20 06:11:22,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=829780.0, ans=0.125 2024-09-20 06:11:54,494 INFO [train.py:1198] (1/2) Epoch 46, batch 3850, loss[loss=0.2412, ctc_loss=0.1061, cr_loss=0.3384, attn_decoder_loss=0.2487, over 29220.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.108, cr_loss=0.346, attn_decoder_loss=0.2363, over 5811805.36 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:12:07,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=829940.0, ans=0.125 2024-09-20 06:12:18,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=829940.0, ans=0.0 2024-09-20 06:12:31,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=829980.0, ans=0.125 2024-09-20 06:12:46,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=830020.0, ans=0.125 2024-09-20 06:12:48,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=830020.0, ans=0.125 2024-09-20 06:12:54,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=830060.0, ans=0.0 2024-09-20 06:12:58,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=830060.0, ans=0.125 2024-09-20 06:13:01,125 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.921e+01 8.600e+01 9.111e+01 9.601e+01 1.529e+02, threshold=1.822e+02, percent-clipped=0.0 2024-09-20 06:13:08,418 INFO [train.py:1198] (1/2) Epoch 46, batch 3900, loss[loss=0.2451, ctc_loss=0.1143, cr_loss=0.3686, attn_decoder_loss=0.2515, over 29625.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1082, cr_loss=0.3465, attn_decoder_loss=0.2366, over 5815827.97 frames. 
], batch size: 86, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:13:22,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=830100.0, ans=0.0 2024-09-20 06:13:52,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=830180.0, ans=0.125 2024-09-20 06:14:05,400 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.58 vs. limit=15.0 2024-09-20 06:14:13,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=830260.0, ans=0.1 2024-09-20 06:14:20,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=830260.0, ans=0.125 2024-09-20 06:14:24,682 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.31 vs. limit=12.0 2024-09-20 06:14:25,246 INFO [train.py:1198] (1/2) Epoch 46, batch 3950, loss[loss=0.2481, ctc_loss=0.1233, cr_loss=0.3881, attn_decoder_loss=0.2533, over 29465.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1083, cr_loss=0.3472, attn_decoder_loss=0.2368, over 5835360.64 frames. ], batch size: 97, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:14:28,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=830300.0, ans=0.125 2024-09-20 06:14:34,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=830300.0, ans=0.0 2024-09-20 06:14:39,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=830340.0, ans=0.125 2024-09-20 06:14:57,027 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.75 vs. limit=6.0 2024-09-20 06:15:01,709 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.12 vs. limit=15.0 2024-09-20 06:15:14,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=830420.0, ans=0.0 2024-09-20 06:15:31,666 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.428e+01 8.756e+01 9.097e+01 9.656e+01 1.303e+02, threshold=1.819e+02, percent-clipped=0.0 2024-09-20 06:15:38,977 INFO [train.py:1198] (1/2) Epoch 46, batch 4000, loss[loss=0.2129, ctc_loss=0.0917, cr_loss=0.2992, attn_decoder_loss=0.2197, over 29515.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1088, cr_loss=0.348, attn_decoder_loss=0.2368, over 5812635.35 frames. 
], batch size: 74, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 06:15:49,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=830500.0, ans=0.125 2024-09-20 06:15:58,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=830540.0, ans=0.0 2024-09-20 06:16:04,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=830540.0, ans=0.0 2024-09-20 06:16:11,847 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:16:31,761 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.47 vs. limit=15.0 2024-09-20 06:16:47,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=830660.0, ans=0.1 2024-09-20 06:16:52,990 INFO [train.py:1198] (1/2) Epoch 46, batch 4050, loss[loss=0.2479, ctc_loss=0.1309, cr_loss=0.3564, attn_decoder_loss=0.253, over 19999.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1085, cr_loss=0.3469, attn_decoder_loss=0.2368, over 5796932.01 frames. ], batch size: 209, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:16:53,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=830700.0, ans=0.125 2024-09-20 06:17:06,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=830740.0, ans=0.0 2024-09-20 06:17:06,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=830740.0, ans=22.5 2024-09-20 06:17:10,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=830740.0, ans=0.1 2024-09-20 06:17:15,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=830740.0, ans=0.1 2024-09-20 06:17:15,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=830740.0, ans=0.0 2024-09-20 06:17:19,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=830740.0, ans=0.125 2024-09-20 06:17:22,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=830780.0, ans=0.0 2024-09-20 06:17:28,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=830780.0, ans=0.1 2024-09-20 06:17:54,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=830860.0, ans=0.125 2024-09-20 06:17:57,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=830860.0, ans=0.1 2024-09-20 06:17:58,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=830860.0, ans=0.0 2024-09-20 06:18:01,508 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.547e+01 8.777e+01 9.283e+01 1.003e+02 3.559e+02, threshold=1.857e+02, percent-clipped=2.0 
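The WARNING lines from optim.py above summarize recent gradient norms: the five numbers after "grad-norm quartiles" read as the min, 25%, 50%, 75% and max of the total gradient norm over a recent window, and the logged threshold tracks Clipping_scale times the median (for the entry just above, 2.0 * 9.283e+01 = 1.857e+02, exactly the logged threshold; the earlier entry's 2.0 * 9.111e+01 = 1.822e+02 matches as well), with percent-clipped reporting how often the norm exceeded it. Below is a minimal sketch of that bookkeeping, assuming a simple externally supplied window of recent norms rather than the optimizer's actual internal statistics; the function name and window handling are illustrative, not the real implementation.

import torch

def clip_by_quartile_threshold(params, recent_norms, clipping_scale=2.0):
    # recent_norms: 1-D float tensor of total grad norms from recent batches
    # (an assumed stand-in for whatever statistics the optimizer keeps).
    q = torch.quantile(recent_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # 2.0 x median, matching the logged values
    grads = [p.grad for p in params if p.grad is not None]
    total_norm = torch.linalg.vector_norm(
        torch.stack([torch.linalg.vector_norm(g) for g in grads]))
    clipped = total_norm > threshold
    if clipped:
        for g in grads:
            g.mul_(threshold / total_norm)  # scale grads down onto the threshold
    return q, float(threshold), bool(clipped)

Averaged over the logging interval, the fraction of batches where clipping fired is what the log prints as percent-clipped.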
2024-09-20 06:18:08,721 INFO [train.py:1198] (1/2) Epoch 46, batch 4100, loss[loss=0.2534, ctc_loss=0.1293, cr_loss=0.4074, attn_decoder_loss=0.2581, over 29492.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1088, cr_loss=0.3474, attn_decoder_loss=0.2371, over 5791303.63 frames. ], batch size: 90, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:18:09,534 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.48 vs. limit=15.0 2024-09-20 06:18:38,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=830980.0, ans=0.025 2024-09-20 06:18:38,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=830980.0, ans=0.025 2024-09-20 06:18:41,719 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.37 vs. limit=12.0 2024-09-20 06:18:48,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=830980.0, ans=0.1 2024-09-20 06:18:50,745 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.82 vs. limit=15.0 2024-09-20 06:19:21,756 INFO [train.py:1198] (1/2) Epoch 46, batch 4150, loss[loss=0.2277, ctc_loss=0.1023, cr_loss=0.3265, attn_decoder_loss=0.2344, over 29506.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1086, cr_loss=0.3469, attn_decoder_loss=0.2366, over 5797481.50 frames. ], batch size: 77, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:19:39,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=831140.0, ans=10.0 2024-09-20 06:19:48,524 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:19:54,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=831180.0, ans=0.125 2024-09-20 06:20:00,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=831180.0, ans=0.0 2024-09-20 06:20:29,880 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.815e+01 9.247e+01 9.794e+01 1.755e+02, threshold=1.849e+02, percent-clipped=0.0 2024-09-20 06:20:35,817 INFO [train.py:1198] (1/2) Epoch 46, batch 4200, loss[loss=0.256, ctc_loss=0.1397, cr_loss=0.4208, attn_decoder_loss=0.2595, over 29524.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1091, cr_loss=0.348, attn_decoder_loss=0.2373, over 5799159.27 frames. ], batch size: 90, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:20:40,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=831300.0, ans=0.125 2024-09-20 06:21:01,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=831340.0, ans=0.0 2024-09-20 06:21:02,864 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.88 vs. 
limit=15.0 2024-09-20 06:21:14,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=831380.0, ans=0.125 2024-09-20 06:21:33,607 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.74 vs. limit=15.0 2024-09-20 06:21:38,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=831460.0, ans=0.0 2024-09-20 06:21:49,878 INFO [train.py:1198] (1/2) Epoch 46, batch 4250, loss[loss=0.2144, ctc_loss=0.09456, cr_loss=0.3123, attn_decoder_loss=0.2208, over 29542.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1087, cr_loss=0.3471, attn_decoder_loss=0.2373, over 5806115.61 frames. ], batch size: 74, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:21:50,951 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=22.5 2024-09-20 06:21:51,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=831500.0, ans=0.125 2024-09-20 06:21:53,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=831500.0, ans=0.125 2024-09-20 06:22:04,829 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.93 vs. limit=15.0 2024-09-20 06:22:27,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=831580.0, ans=0.125 2024-09-20 06:22:32,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=831580.0, ans=0.2 2024-09-20 06:22:38,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=831620.0, ans=0.0 2024-09-20 06:22:58,495 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 8.665e+01 9.148e+01 1.004e+02 2.126e+02, threshold=1.830e+02, percent-clipped=1.0 2024-09-20 06:23:04,401 INFO [train.py:1198] (1/2) Epoch 46, batch 4300, loss[loss=0.2406, ctc_loss=0.1124, cr_loss=0.361, attn_decoder_loss=0.2468, over 29539.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1086, cr_loss=0.3466, attn_decoder_loss=0.2373, over 5795223.61 frames. ], batch size: 87, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:23:13,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=831700.0, ans=0.125 2024-09-20 06:23:22,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=831740.0, ans=0.125 2024-09-20 06:23:23,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=831740.0, ans=0.1 2024-09-20 06:23:41,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=831780.0, ans=0.1 2024-09-20 06:23:52,320 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.32 vs. 
limit=15.0 2024-09-20 06:23:58,290 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.84 vs. limit=22.5 2024-09-20 06:24:18,201 INFO [train.py:1198] (1/2) Epoch 46, batch 4350, loss[loss=0.2497, ctc_loss=0.1185, cr_loss=0.3779, attn_decoder_loss=0.2559, over 29499.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1107, cr_loss=0.3518, attn_decoder_loss=0.2402, over 5797918.49 frames. ], batch size: 97, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:24:36,945 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.19 vs. limit=22.5 2024-09-20 06:24:52,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=831980.0, ans=0.125 2024-09-20 06:25:11,910 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.79 vs. limit=15.0 2024-09-20 06:25:20,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=832020.0, ans=0.0 2024-09-20 06:25:34,201 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.49 vs. limit=15.0 2024-09-20 06:25:36,413 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.165e+01 9.019e+01 9.377e+01 9.847e+01 2.022e+02, threshold=1.875e+02, percent-clipped=1.0 2024-09-20 06:25:42,289 INFO [train.py:1198] (1/2) Epoch 46, batch 4400, loss[loss=0.2374, ctc_loss=0.1238, cr_loss=0.3792, attn_decoder_loss=0.2416, over 27347.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1122, cr_loss=0.3548, attn_decoder_loss=0.2423, over 5767617.90 frames. ], batch size: 124, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 06:25:44,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=832100.0, ans=0.125 2024-09-20 06:25:56,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=832140.0, ans=0.125 2024-09-20 06:26:01,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=832140.0, ans=0.0 2024-09-20 06:26:13,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=832180.0, ans=0.2 2024-09-20 06:26:17,856 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.29 vs. 
limit=22.5 2024-09-20 06:26:18,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=832180.0, ans=0.125 2024-09-20 06:26:29,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=832220.0, ans=0.125 2024-09-20 06:26:32,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=832220.0, ans=0.0 2024-09-20 06:26:40,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=832260.0, ans=0.025 2024-09-20 06:26:54,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=832300.0, ans=0.0 2024-09-20 06:26:55,246 INFO [train.py:1198] (1/2) Epoch 46, batch 4450, loss[loss=0.2518, ctc_loss=0.1339, cr_loss=0.3944, attn_decoder_loss=0.2562, over 20857.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1154, cr_loss=0.3599, attn_decoder_loss=0.2441, over 5576503.56 frames. ], batch size: 210, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:26:59,889 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.35 vs. limit=10.0 2024-09-20 06:27:09,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=832340.0, ans=0.125 2024-09-20 06:27:15,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=832340.0, ans=0.1 2024-09-20 06:27:27,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=832380.0, ans=0.1 2024-09-20 06:27:30,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=832380.0, ans=0.0 2024-09-20 06:27:53,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=832420.0, ans=0.125 2024-09-20 06:27:54,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=832460.0, ans=0.125 2024-09-20 06:28:06,278 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.388e+01 1.015e+02 1.122e+02 1.210e+02 5.487e+02, threshold=2.243e+02, percent-clipped=3.0 2024-09-20 06:28:06,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=832460.0, ans=0.2 2024-09-20 06:28:10,693 INFO [train.py:1198] (1/2) Epoch 46, batch 4500, loss[loss=0.2534, ctc_loss=0.1356, cr_loss=0.3712, attn_decoder_loss=0.2582, over 19795.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1182, cr_loss=0.3622, attn_decoder_loss=0.2457, over 5235657.36 frames. ], batch size: 209, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:28:12,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=832500.0, ans=0.125 2024-09-20 06:28:42,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=832580.0, ans=0.025 2024-09-20 06:29:38,361 INFO [train.py:1198] (1/2) Epoch 47, batch 0, loss[loss=0.2163, ctc_loss=0.09544, cr_loss=0.3191, attn_decoder_loss=0.2226, over 29573.00 frames. 
], tot_loss[loss=0.2163, ctc_loss=0.09544, cr_loss=0.3191, attn_decoder_loss=0.2226, over 29573.00 frames. ], batch size: 73, lr: 2.35e-03, grad_scale: 32.0 2024-09-20 06:29:38,361 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 06:29:56,723 INFO [train.py:1230] (1/2) Epoch 47, validation: loss=0.2131, ctc_loss=0.03582, cr_loss=6.765e-15, attn_decoder_loss=0.2328, over 944034.00 frames. 2024-09-20 06:29:56,724 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-20 06:29:58,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=832600.0, ans=0.2 2024-09-20 06:30:04,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=832600.0, ans=0.1 2024-09-20 06:30:07,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=832600.0, ans=0.0 2024-09-20 06:30:13,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=832640.0, ans=0.0 2024-09-20 06:30:28,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=832680.0, ans=0.125 2024-09-20 06:30:37,184 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.36 vs. limit=15.0 2024-09-20 06:30:45,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=832720.0, ans=0.125 2024-09-20 06:30:47,451 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=15.0 2024-09-20 06:30:51,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=832720.0, ans=0.0 2024-09-20 06:31:12,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=832800.0, ans=0.5 2024-09-20 06:31:14,203 INFO [train.py:1198] (1/2) Epoch 47, batch 50, loss[loss=0.2077, ctc_loss=0.08875, cr_loss=0.2925, attn_decoder_loss=0.2144, over 29425.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1111, cr_loss=0.3524, attn_decoder_loss=0.238, over 1268515.67 frames. ], batch size: 70, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:31:16,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=832800.0, ans=0.125 2024-09-20 06:31:26,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=832800.0, ans=0.125 2024-09-20 06:31:31,158 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:31:34,598 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.01 vs. limit=6.0 2024-09-20 06:31:39,168 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.47 vs. 
limit=15.0 2024-09-20 06:31:48,929 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.774e+01 8.892e+01 9.712e+01 1.150e+02 2.007e+02, threshold=1.942e+02, percent-clipped=0.0 2024-09-20 06:31:56,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=832880.0, ans=0.2 2024-09-20 06:32:01,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=832920.0, ans=0.0 2024-09-20 06:32:04,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=832920.0, ans=0.125 2024-09-20 06:32:07,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=832920.0, ans=0.035 2024-09-20 06:32:08,112 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.17 vs. limit=15.0 2024-09-20 06:32:09,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=832920.0, ans=0.0 2024-09-20 06:32:10,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=832920.0, ans=0.1 2024-09-20 06:32:10,846 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.08 vs. limit=22.5 2024-09-20 06:32:14,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=832960.0, ans=0.025 2024-09-20 06:32:29,708 INFO [train.py:1198] (1/2) Epoch 47, batch 100, loss[loss=0.2241, ctc_loss=0.108, cr_loss=0.3507, attn_decoder_loss=0.2292, over 29513.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1112, cr_loss=0.3533, attn_decoder_loss=0.24, over 2250716.52 frames. ], batch size: 76, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:32:51,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=833040.0, ans=0.125 2024-09-20 06:32:52,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=833040.0, ans=0.125 2024-09-20 06:32:53,460 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.96 vs. limit=15.0 2024-09-20 06:33:43,628 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.10 vs. limit=15.0 2024-09-20 06:33:45,860 INFO [train.py:1198] (1/2) Epoch 47, batch 150, loss[loss=0.198, ctc_loss=0.08195, cr_loss=0.2834, attn_decoder_loss=0.2046, over 29441.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1096, cr_loss=0.3501, attn_decoder_loss=0.2382, over 3045973.07 frames. ], batch size: 70, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:33:55,970 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. 
limit=15.0 2024-09-20 06:34:05,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=833240.0, ans=0.025 2024-09-20 06:34:11,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=833240.0, ans=0.125 2024-09-20 06:34:22,973 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.674e+01 8.579e+01 9.254e+01 9.598e+01 1.367e+02, threshold=1.851e+02, percent-clipped=0.0 2024-09-20 06:34:42,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=833320.0, ans=0.05 2024-09-20 06:34:55,499 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.61 vs. limit=12.0 2024-09-20 06:34:56,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=833360.0, ans=0.0 2024-09-20 06:35:00,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=833360.0, ans=0.125 2024-09-20 06:35:02,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=833400.0, ans=0.125 2024-09-20 06:35:03,356 INFO [train.py:1198] (1/2) Epoch 47, batch 200, loss[loss=0.2529, ctc_loss=0.1283, cr_loss=0.4022, attn_decoder_loss=0.2578, over 27360.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1088, cr_loss=0.3495, attn_decoder_loss=0.237, over 3659968.21 frames. ], batch size: 124, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:35:12,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=833400.0, ans=0.1 2024-09-20 06:35:42,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=833480.0, ans=10.0 2024-09-20 06:35:48,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=833520.0, ans=0.125 2024-09-20 06:36:19,047 INFO [train.py:1198] (1/2) Epoch 47, batch 250, loss[loss=0.2438, ctc_loss=0.1159, cr_loss=0.3487, attn_decoder_loss=0.2503, over 29299.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1087, cr_loss=0.3487, attn_decoder_loss=0.2369, over 4141684.11 frames. 
], batch size: 100, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:36:28,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=833600.0, ans=0.125 2024-09-20 06:36:29,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=833600.0, ans=0.0 2024-09-20 06:36:48,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=833640.0, ans=0.125 2024-09-20 06:36:51,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=833680.0, ans=0.0 2024-09-20 06:36:55,949 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.268e+01 8.644e+01 9.308e+01 9.912e+01 1.990e+02, threshold=1.862e+02, percent-clipped=1.0 2024-09-20 06:37:06,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=833720.0, ans=0.125 2024-09-20 06:37:12,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=833720.0, ans=0.125 2024-09-20 06:37:12,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=833720.0, ans=0.0 2024-09-20 06:37:12,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=833720.0, ans=0.125 2024-09-20 06:37:27,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=833760.0, ans=0.1 2024-09-20 06:37:36,483 INFO [train.py:1198] (1/2) Epoch 47, batch 300, loss[loss=0.2488, ctc_loss=0.1191, cr_loss=0.3729, attn_decoder_loss=0.2549, over 29517.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1086, cr_loss=0.3477, attn_decoder_loss=0.2366, over 4510830.44 frames. ], batch size: 92, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:38:05,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=833880.0, ans=0.0 2024-09-20 06:38:28,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=833920.0, ans=0.125 2024-09-20 06:38:45,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=833960.0, ans=0.0 2024-09-20 06:38:49,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=833960.0, ans=0.2 2024-09-20 06:38:50,337 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.07 vs. limit=22.5 2024-09-20 06:38:54,104 INFO [train.py:1198] (1/2) Epoch 47, batch 350, loss[loss=0.2044, ctc_loss=0.08448, cr_loss=0.2985, attn_decoder_loss=0.2111, over 29320.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1089, cr_loss=0.3489, attn_decoder_loss=0.2372, over 4795839.69 frames. ], batch size: 71, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:38:57,941 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=8.42 vs. 
limit=15.0 2024-09-20 06:39:00,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=834000.0, ans=0.09899494936611666 2024-09-20 06:39:11,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=834040.0, ans=0.125 2024-09-20 06:39:12,998 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.50 vs. limit=22.5 2024-09-20 06:39:28,606 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.657e+01 9.081e+01 9.524e+01 1.810e+02, threshold=1.816e+02, percent-clipped=0.0 2024-09-20 06:39:32,799 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.65 vs. limit=15.0 2024-09-20 06:39:58,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=834160.0, ans=0.0 2024-09-20 06:40:08,858 INFO [train.py:1198] (1/2) Epoch 47, batch 400, loss[loss=0.2372, ctc_loss=0.1081, cr_loss=0.3336, attn_decoder_loss=0.2441, over 29698.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1087, cr_loss=0.3478, attn_decoder_loss=0.237, over 5026323.41 frames. ], batch size: 82, lr: 2.35e-03, grad_scale: 32.0 2024-09-20 06:40:12,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=834200.0, ans=0.125 2024-09-20 06:40:14,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.30 vs. limit=6.0 2024-09-20 06:40:32,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=834240.0, ans=0.125 2024-09-20 06:40:35,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=834240.0, ans=0.125 2024-09-20 06:40:48,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=834280.0, ans=0.0 2024-09-20 06:41:12,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=834360.0, ans=0.125 2024-09-20 06:41:12,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=834360.0, ans=0.0 2024-09-20 06:41:15,291 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:41:27,380 INFO [train.py:1198] (1/2) Epoch 47, batch 450, loss[loss=0.2342, ctc_loss=0.1077, cr_loss=0.3472, attn_decoder_loss=0.2406, over 29710.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1087, cr_loss=0.3481, attn_decoder_loss=0.2373, over 5190305.19 frames. 
], batch size: 83, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:41:42,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=834440.0, ans=0.1 2024-09-20 06:41:53,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=834440.0, ans=0.0 2024-09-20 06:41:54,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=834440.0, ans=0.04949747468305833 2024-09-20 06:41:54,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=834440.0, ans=0.025 2024-09-20 06:42:02,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=834480.0, ans=0.0 2024-09-20 06:42:03,669 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.609e+01 9.172e+01 9.678e+01 2.074e+02, threshold=1.834e+02, percent-clipped=1.0 2024-09-20 06:42:16,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=834520.0, ans=0.125 2024-09-20 06:42:32,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=834560.0, ans=0.09899494936611666 2024-09-20 06:42:45,360 INFO [train.py:1198] (1/2) Epoch 47, batch 500, loss[loss=0.25, ctc_loss=0.1205, cr_loss=0.3752, attn_decoder_loss=0.2561, over 29464.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1081, cr_loss=0.3469, attn_decoder_loss=0.2365, over 5333240.99 frames. ], batch size: 94, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:42:53,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=834600.0, ans=0.125 2024-09-20 06:42:59,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=834640.0, ans=0.125 2024-09-20 06:43:11,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=834640.0, ans=0.0 2024-09-20 06:43:37,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.83 vs. limit=12.0 2024-09-20 06:43:46,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=834760.0, ans=15.0 2024-09-20 06:43:56,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.40 vs. limit=22.5 2024-09-20 06:44:00,791 INFO [train.py:1198] (1/2) Epoch 47, batch 550, loss[loss=0.2449, ctc_loss=0.1152, cr_loss=0.3672, attn_decoder_loss=0.2511, over 28831.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.108, cr_loss=0.3459, attn_decoder_loss=0.2363, over 5423609.92 frames. 
], batch size: 104, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:44:18,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=834840.0, ans=0.1 2024-09-20 06:44:38,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=834880.0, ans=0.2 2024-09-20 06:44:39,385 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.709e+01 8.688e+01 9.011e+01 9.708e+01 1.487e+02, threshold=1.802e+02, percent-clipped=0.0 2024-09-20 06:45:04,804 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.06 vs. limit=22.5 2024-09-20 06:45:07,528 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.21 vs. limit=22.5 2024-09-20 06:45:18,979 INFO [train.py:1198] (1/2) Epoch 47, batch 600, loss[loss=0.2484, ctc_loss=0.1257, cr_loss=0.3871, attn_decoder_loss=0.2535, over 29230.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1084, cr_loss=0.3471, attn_decoder_loss=0.2369, over 5510235.99 frames. ], batch size: 100, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:45:20,029 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.76 vs. limit=15.0 2024-09-20 06:45:35,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=835040.0, ans=0.125 2024-09-20 06:45:48,605 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.31 vs. limit=22.5 2024-09-20 06:46:02,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=835120.0, ans=10.0 2024-09-20 06:46:04,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=835120.0, ans=10.0 2024-09-20 06:46:22,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=835160.0, ans=0.125 2024-09-20 06:46:36,765 INFO [train.py:1198] (1/2) Epoch 47, batch 650, loss[loss=0.233, ctc_loss=0.1045, cr_loss=0.3493, attn_decoder_loss=0.2396, over 29769.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1077, cr_loss=0.3459, attn_decoder_loss=0.2361, over 5586914.79 frames. 
], batch size: 81, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:47:13,109 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.697e+01 9.175e+01 9.724e+01 1.599e+02, threshold=1.835e+02, percent-clipped=0.0 2024-09-20 06:47:13,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=835280.0, ans=0.125 2024-09-20 06:47:13,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=835280.0, ans=0.0 2024-09-20 06:47:21,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=835320.0, ans=0.1 2024-09-20 06:47:23,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=835320.0, ans=0.125 2024-09-20 06:47:28,407 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:47:29,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=835320.0, ans=0.125 2024-09-20 06:47:52,257 INFO [train.py:1198] (1/2) Epoch 47, batch 700, loss[loss=0.231, ctc_loss=0.1108, cr_loss=0.3502, attn_decoder_loss=0.2365, over 29527.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1082, cr_loss=0.3465, attn_decoder_loss=0.2366, over 5637194.93 frames. ], batch size: 76, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:48:02,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=835400.0, ans=0.0 2024-09-20 06:48:02,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=835400.0, ans=0.0 2024-09-20 06:48:26,745 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.53 vs. limit=22.5 2024-09-20 06:48:29,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=835480.0, ans=0.125 2024-09-20 06:48:29,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=835480.0, ans=0.125 2024-09-20 06:48:47,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=835520.0, ans=0.2 2024-09-20 06:48:56,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=835560.0, ans=0.125 2024-09-20 06:49:00,152 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=22.5 2024-09-20 06:49:09,602 INFO [train.py:1198] (1/2) Epoch 47, batch 750, loss[loss=0.2404, ctc_loss=0.1116, cr_loss=0.366, attn_decoder_loss=0.2466, over 29674.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1081, cr_loss=0.3458, attn_decoder_loss=0.2365, over 5675717.48 frames. 
], batch size: 82, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:49:11,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=835600.0, ans=0.125 2024-09-20 06:49:14,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=835600.0, ans=0.125 2024-09-20 06:49:17,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=835600.0, ans=0.1 2024-09-20 06:49:27,114 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.01 vs. limit=22.5 2024-09-20 06:49:29,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=835640.0, ans=0.125 2024-09-20 06:49:42,617 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:49:45,339 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.705e+01 9.081e+01 9.642e+01 1.954e+02, threshold=1.816e+02, percent-clipped=1.0 2024-09-20 06:50:06,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=835720.0, ans=0.2 2024-09-20 06:50:15,370 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.08 vs. limit=15.0 2024-09-20 06:50:24,784 INFO [train.py:1198] (1/2) Epoch 47, batch 800, loss[loss=0.2047, ctc_loss=0.07926, cr_loss=0.2793, attn_decoder_loss=0.2125, over 29590.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.108, cr_loss=0.3459, attn_decoder_loss=0.2365, over 5706752.51 frames. ], batch size: 73, lr: 2.35e-03, grad_scale: 32.0 2024-09-20 06:50:31,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=835800.0, ans=0.0 2024-09-20 06:51:03,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=835880.0, ans=0.0 2024-09-20 06:51:07,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=835880.0, ans=0.0 2024-09-20 06:51:30,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=835960.0, ans=0.95 2024-09-20 06:51:37,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=835960.0, ans=0.125 2024-09-20 06:51:39,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.74 vs. limit=12.0 2024-09-20 06:51:42,575 INFO [train.py:1198] (1/2) Epoch 47, batch 850, loss[loss=0.2417, ctc_loss=0.1105, cr_loss=0.3467, attn_decoder_loss=0.2486, over 29709.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1078, cr_loss=0.3455, attn_decoder_loss=0.2364, over 5736275.39 frames. 
], batch size: 89, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:51:45,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=836000.0, ans=0.0 2024-09-20 06:52:05,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2024-09-20 06:52:10,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=836040.0, ans=22.5 2024-09-20 06:52:22,245 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 8.624e+01 9.106e+01 9.735e+01 2.135e+02, threshold=1.821e+02, percent-clipped=1.0 2024-09-20 06:52:59,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=836200.0, ans=0.125 2024-09-20 06:53:00,275 INFO [train.py:1198] (1/2) Epoch 47, batch 900, loss[loss=0.2041, ctc_loss=0.08633, cr_loss=0.301, attn_decoder_loss=0.2105, over 29618.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1079, cr_loss=0.3459, attn_decoder_loss=0.2366, over 5741300.30 frames. ], batch size: 73, lr: 2.35e-03, grad_scale: 8.0 2024-09-20 06:53:10,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=836200.0, ans=0.125 2024-09-20 06:53:20,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=836240.0, ans=0.125 2024-09-20 06:53:20,929 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.25 vs. limit=15.0 2024-09-20 06:53:48,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=836320.0, ans=0.0 2024-09-20 06:54:01,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=836360.0, ans=0.125 2024-09-20 06:54:03,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=836360.0, ans=0.0 2024-09-20 06:54:03,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=836360.0, ans=0.2 2024-09-20 06:54:09,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=836360.0, ans=0.1 2024-09-20 06:54:09,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=836360.0, ans=0.125 2024-09-20 06:54:15,111 INFO [train.py:1198] (1/2) Epoch 47, batch 950, loss[loss=0.2172, ctc_loss=0.09891, cr_loss=0.3287, attn_decoder_loss=0.223, over 29518.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1079, cr_loss=0.3458, attn_decoder_loss=0.2367, over 5742430.88 frames. ], batch size: 74, lr: 2.35e-03, grad_scale: 8.0 2024-09-20 06:54:29,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=836400.0, ans=0.125 2024-09-20 06:54:31,646 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.47 vs. 
limit=22.5 2024-09-20 06:54:37,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=836440.0, ans=0.0 2024-09-20 06:54:49,210 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:54:52,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=836480.0, ans=10.0 2024-09-20 06:54:56,608 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 8.702e+01 9.182e+01 9.933e+01 3.090e+02, threshold=1.836e+02, percent-clipped=1.0 2024-09-20 06:55:32,625 INFO [train.py:1198] (1/2) Epoch 47, batch 1000, loss[loss=0.2237, ctc_loss=0.09996, cr_loss=0.334, attn_decoder_loss=0.23, over 29499.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1087, cr_loss=0.3476, attn_decoder_loss=0.2375, over 5735647.85 frames. ], batch size: 77, lr: 2.35e-03, grad_scale: 8.0 2024-09-20 06:55:50,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=836640.0, ans=0.125 2024-09-20 06:55:51,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=836640.0, ans=0.0 2024-09-20 06:55:51,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=836640.0, ans=0.0 2024-09-20 06:56:02,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=836640.0, ans=0.05 2024-09-20 06:56:05,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2024-09-20 06:56:06,017 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.26 vs. limit=22.5 2024-09-20 06:56:30,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=836720.0, ans=0.125 2024-09-20 06:56:41,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=836760.0, ans=0.125 2024-09-20 06:56:45,954 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.83 vs. limit=15.0 2024-09-20 06:56:47,990 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.56 vs. limit=15.0 2024-09-20 06:56:50,187 INFO [train.py:1198] (1/2) Epoch 47, batch 1050, loss[loss=0.232, ctc_loss=0.1063, cr_loss=0.3383, attn_decoder_loss=0.2385, over 29657.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1082, cr_loss=0.3467, attn_decoder_loss=0.2367, over 5745339.13 frames. 
], batch size: 85, lr: 2.35e-03, grad_scale: 8.0 2024-09-20 06:57:05,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=836840.0, ans=0.125 2024-09-20 06:57:08,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=836840.0, ans=0.0 2024-09-20 06:57:11,198 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2024-09-20 06:57:19,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=836880.0, ans=0.0 2024-09-20 06:57:29,792 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.447e+01 8.620e+01 9.050e+01 9.577e+01 1.323e+02, threshold=1.810e+02, percent-clipped=0.0 2024-09-20 06:57:30,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=836880.0, ans=0.125 2024-09-20 06:57:32,464 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.45 vs. limit=15.0 2024-09-20 06:57:39,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=836920.0, ans=0.0 2024-09-20 06:57:42,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=836920.0, ans=0.0 2024-09-20 06:58:05,273 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.40 vs. limit=15.0 2024-09-20 06:58:05,800 INFO [train.py:1198] (1/2) Epoch 47, batch 1100, loss[loss=0.227, ctc_loss=0.1078, cr_loss=0.3398, attn_decoder_loss=0.2327, over 29463.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1081, cr_loss=0.346, attn_decoder_loss=0.2364, over 5756907.47 frames. ], batch size: 78, lr: 2.35e-03, grad_scale: 8.0 2024-09-20 06:58:44,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=837080.0, ans=0.0 2024-09-20 06:59:13,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=837160.0, ans=0.1 2024-09-20 06:59:16,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.81 vs. limit=15.0 2024-09-20 06:59:17,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=837160.0, ans=0.5 2024-09-20 06:59:23,431 INFO [train.py:1198] (1/2) Epoch 47, batch 1150, loss[loss=0.2289, ctc_loss=0.1145, cr_loss=0.3569, attn_decoder_loss=0.2336, over 29441.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1082, cr_loss=0.346, attn_decoder_loss=0.2365, over 5754214.53 frames. ], batch size: 78, lr: 2.35e-03, grad_scale: 8.0 2024-09-20 07:00:04,972 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.978e+01 8.402e+01 9.011e+01 9.514e+01 2.556e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-20 07:00:26,504 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.02 vs. 
limit=15.0 2024-09-20 07:00:30,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=837360.0, ans=0.2 2024-09-20 07:00:40,659 INFO [train.py:1198] (1/2) Epoch 47, batch 1200, loss[loss=0.2427, ctc_loss=0.1123, cr_loss=0.3592, attn_decoder_loss=0.2492, over 29674.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1083, cr_loss=0.3465, attn_decoder_loss=0.2373, over 5746856.94 frames. ], batch size: 85, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 07:00:48,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=837400.0, ans=0.025 2024-09-20 07:01:33,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=837520.0, ans=0.125 2024-09-20 07:01:48,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=837560.0, ans=0.05 2024-09-20 07:01:56,179 INFO [train.py:1198] (1/2) Epoch 47, batch 1250, loss[loss=0.2471, ctc_loss=0.1204, cr_loss=0.3626, attn_decoder_loss=0.2531, over 29517.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1086, cr_loss=0.3475, attn_decoder_loss=0.2378, over 5774631.55 frames. ], batch size: 92, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 07:02:09,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=837600.0, ans=0.0 2024-09-20 07:02:24,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=837640.0, ans=0.0 2024-09-20 07:02:30,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=837680.0, ans=0.07 2024-09-20 07:02:36,958 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.63 vs. limit=10.0 2024-09-20 07:02:37,736 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 8.615e+01 9.145e+01 9.650e+01 1.333e+02, threshold=1.829e+02, percent-clipped=0.0 2024-09-20 07:03:13,893 INFO [train.py:1198] (1/2) Epoch 47, batch 1300, loss[loss=0.2385, ctc_loss=0.108, cr_loss=0.3388, attn_decoder_loss=0.2455, over 28150.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.108, cr_loss=0.3462, attn_decoder_loss=0.2371, over 5777702.82 frames. 
], batch size: 111, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:03:17,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=837800.0, ans=0.0 2024-09-20 07:03:20,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=837800.0, ans=0.09899494936611666 2024-09-20 07:03:26,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=837800.0, ans=0.0 2024-09-20 07:03:33,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=837840.0, ans=0.125 2024-09-20 07:04:12,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=837920.0, ans=0.125 2024-09-20 07:04:30,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.max_abs, batch_count=838000.0, ans=10.0 2024-09-20 07:04:31,982 INFO [train.py:1198] (1/2) Epoch 47, batch 1350, loss[loss=0.2341, ctc_loss=0.1104, cr_loss=0.3488, attn_decoder_loss=0.2401, over 29756.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1077, cr_loss=0.3459, attn_decoder_loss=0.2369, over 5795627.16 frames. ], batch size: 81, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:04:42,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=838000.0, ans=0.125 2024-09-20 07:05:04,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=838080.0, ans=0.125 2024-09-20 07:05:04,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=838080.0, ans=0.0 2024-09-20 07:05:09,908 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.91 vs. limit=12.0 2024-09-20 07:05:10,470 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.374e+01 8.876e+01 9.629e+01 1.227e+02, threshold=1.775e+02, percent-clipped=0.0 2024-09-20 07:05:36,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=838160.0, ans=0.0 2024-09-20 07:05:46,448 INFO [train.py:1198] (1/2) Epoch 47, batch 1400, loss[loss=0.214, ctc_loss=0.09901, cr_loss=0.3121, attn_decoder_loss=0.2198, over 29564.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1079, cr_loss=0.3463, attn_decoder_loss=0.2368, over 5806366.81 frames. ], batch size: 69, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:05:50,529 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.09 vs. limit=12.0 2024-09-20 07:05:54,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=838200.0, ans=0.1 2024-09-20 07:07:03,987 INFO [train.py:1198] (1/2) Epoch 47, batch 1450, loss[loss=0.2497, ctc_loss=0.125, cr_loss=0.3956, attn_decoder_loss=0.2548, over 29462.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.108, cr_loss=0.3462, attn_decoder_loss=0.2371, over 5803532.26 frames. 
], batch size: 94, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:07:45,071 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 8.627e+01 9.137e+01 9.746e+01 6.249e+02, threshold=1.827e+02, percent-clipped=1.0 2024-09-20 07:07:46,106 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.84 vs. limit=22.5 2024-09-20 07:07:49,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=838520.0, ans=0.07 2024-09-20 07:07:52,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=838520.0, ans=0.0 2024-09-20 07:07:54,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=838520.0, ans=0.125 2024-09-20 07:08:03,762 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.34 vs. limit=12.0 2024-09-20 07:08:05,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=838560.0, ans=0.0 2024-09-20 07:08:08,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=838560.0, ans=0.125 2024-09-20 07:08:16,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=838560.0, ans=0.2 2024-09-20 07:08:20,887 INFO [train.py:1198] (1/2) Epoch 47, batch 1500, loss[loss=0.2455, ctc_loss=0.115, cr_loss=0.3698, attn_decoder_loss=0.2518, over 29625.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1081, cr_loss=0.3468, attn_decoder_loss=0.2372, over 5805456.49 frames. ], batch size: 86, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:08:21,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=838600.0, ans=0.0 2024-09-20 07:08:31,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=838600.0, ans=0.2 2024-09-20 07:08:40,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.99 vs. limit=15.0 2024-09-20 07:08:40,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=838640.0, ans=0.1 2024-09-20 07:09:05,715 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.20 vs. limit=12.0 2024-09-20 07:09:08,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=838720.0, ans=0.125 2024-09-20 07:09:24,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=838760.0, ans=0.125 2024-09-20 07:09:30,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=838760.0, ans=0.1 2024-09-20 07:09:36,604 INFO [train.py:1198] (1/2) Epoch 47, batch 1550, loss[loss=0.2523, ctc_loss=0.1284, cr_loss=0.4062, attn_decoder_loss=0.2571, over 29486.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1085, cr_loss=0.347, attn_decoder_loss=0.2372, over 5781114.62 frames. 
], batch size: 90, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:09:38,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=838800.0, ans=0.5 2024-09-20 07:10:17,555 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 8.567e+01 9.122e+01 9.785e+01 2.024e+02, threshold=1.824e+02, percent-clipped=1.0 2024-09-20 07:10:17,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=838880.0, ans=0.1 2024-09-20 07:10:38,352 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=15.0 2024-09-20 07:10:49,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=838960.0, ans=0.1 2024-09-20 07:10:53,713 INFO [train.py:1198] (1/2) Epoch 47, batch 1600, loss[loss=0.2333, ctc_loss=0.1032, cr_loss=0.3402, attn_decoder_loss=0.2402, over 29665.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1085, cr_loss=0.3468, attn_decoder_loss=0.237, over 5763991.85 frames. ], batch size: 85, lr: 2.34e-03, grad_scale: 32.0 2024-09-20 07:11:35,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=839080.0, ans=0.05 2024-09-20 07:11:38,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=839080.0, ans=0.0 2024-09-20 07:11:50,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=839120.0, ans=0.125 2024-09-20 07:11:53,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=839120.0, ans=0.1 2024-09-20 07:11:56,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=839160.0, ans=0.0 2024-09-20 07:12:08,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=839160.0, ans=0.125 2024-09-20 07:12:11,183 INFO [train.py:1198] (1/2) Epoch 47, batch 1650, loss[loss=0.2323, ctc_loss=0.1024, cr_loss=0.3277, attn_decoder_loss=0.2395, over 29720.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1083, cr_loss=0.3461, attn_decoder_loss=0.2368, over 5759378.56 frames. 
], batch size: 89, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:12:16,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=839200.0, ans=0.1 2024-09-20 07:12:47,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=839280.0, ans=0.0 2024-09-20 07:12:52,012 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.317e+01 8.634e+01 9.046e+01 9.641e+01 2.969e+02, threshold=1.809e+02, percent-clipped=2.0 2024-09-20 07:12:56,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=839320.0, ans=0.2 2024-09-20 07:12:58,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=839320.0, ans=0.125 2024-09-20 07:13:01,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=839320.0, ans=0.1 2024-09-20 07:13:02,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=839320.0, ans=0.125 2024-09-20 07:13:26,658 INFO [train.py:1198] (1/2) Epoch 47, batch 1700, loss[loss=0.2112, ctc_loss=0.09415, cr_loss=0.3306, attn_decoder_loss=0.2168, over 29551.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1079, cr_loss=0.3458, attn_decoder_loss=0.2367, over 5780806.44 frames. ], batch size: 69, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:13:31,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=839400.0, ans=0.125 2024-09-20 07:13:37,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=839400.0, ans=0.125 2024-09-20 07:13:43,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=839440.0, ans=0.1 2024-09-20 07:13:46,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=839440.0, ans=0.125 2024-09-20 07:14:21,788 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=15.0 2024-09-20 07:14:34,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=839560.0, ans=0.0 2024-09-20 07:14:43,705 INFO [train.py:1198] (1/2) Epoch 47, batch 1750, loss[loss=0.2155, ctc_loss=0.09914, cr_loss=0.3288, attn_decoder_loss=0.2211, over 29354.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.108, cr_loss=0.3461, attn_decoder_loss=0.2367, over 5788380.40 frames. 
], batch size: 67, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:14:45,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=839600.0, ans=0.125 2024-09-20 07:14:46,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=839600.0, ans=0.07 2024-09-20 07:14:50,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=839600.0, ans=0.2 2024-09-20 07:15:25,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=839680.0, ans=0.125 2024-09-20 07:15:26,380 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.597e+01 9.141e+01 9.828e+01 1.386e+02, threshold=1.828e+02, percent-clipped=0.0 2024-09-20 07:15:27,213 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.97 vs. limit=10.0 2024-09-20 07:15:31,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=839720.0, ans=0.0 2024-09-20 07:15:46,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=839760.0, ans=0.2 2024-09-20 07:16:00,711 INFO [train.py:1198] (1/2) Epoch 47, batch 1800, loss[loss=0.2296, ctc_loss=0.1023, cr_loss=0.3327, attn_decoder_loss=0.2364, over 29685.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1079, cr_loss=0.3462, attn_decoder_loss=0.2369, over 5790598.17 frames. ], batch size: 83, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:16:10,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=839800.0, ans=0.0 2024-09-20 07:16:21,317 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0 2024-09-20 07:16:22,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=839840.0, ans=0.1 2024-09-20 07:16:44,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=839920.0, ans=0.025 2024-09-20 07:16:50,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=839920.0, ans=0.0 2024-09-20 07:16:52,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=839920.0, ans=0.125 2024-09-20 07:17:16,945 INFO [train.py:1198] (1/2) Epoch 47, batch 1850, loss[loss=0.2388, ctc_loss=0.1093, cr_loss=0.3339, attn_decoder_loss=0.2457, over 29616.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1079, cr_loss=0.3464, attn_decoder_loss=0.2368, over 5796729.90 frames. ], batch size: 86, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:17:21,117 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.16 vs. 
limit=15.0 2024-09-20 07:17:23,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=840000.0, ans=0.0 2024-09-20 07:17:39,834 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:18:01,053 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.710e+01 8.631e+01 9.188e+01 9.651e+01 1.430e+02, threshold=1.838e+02, percent-clipped=0.0 2024-09-20 07:18:01,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=840080.0, ans=0.125 2024-09-20 07:18:09,148 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.60 vs. limit=15.0 2024-09-20 07:18:34,071 INFO [train.py:1198] (1/2) Epoch 47, batch 1900, loss[loss=0.2425, ctc_loss=0.114, cr_loss=0.3664, attn_decoder_loss=0.2486, over 29699.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1082, cr_loss=0.3473, attn_decoder_loss=0.2373, over 5804423.29 frames. ], batch size: 89, lr: 2.34e-03, grad_scale: 8.0 2024-09-20 07:18:40,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=840200.0, ans=0.0 2024-09-20 07:18:50,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=840240.0, ans=0.125 2024-09-20 07:18:54,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=840240.0, ans=0.0 2024-09-20 07:18:54,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.16 vs. limit=22.5 2024-09-20 07:19:00,993 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.76 vs. limit=10.0 2024-09-20 07:19:11,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=840280.0, ans=0.025 2024-09-20 07:19:21,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=840320.0, ans=0.09899494936611666 2024-09-20 07:19:30,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=840320.0, ans=0.025 2024-09-20 07:19:34,243 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.33 vs. limit=10.0 2024-09-20 07:19:39,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=840360.0, ans=0.125 2024-09-20 07:19:48,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=840360.0, ans=0.025 2024-09-20 07:19:51,435 INFO [train.py:1198] (1/2) Epoch 47, batch 1950, loss[loss=0.223, ctc_loss=0.0948, cr_loss=0.3345, attn_decoder_loss=0.2298, over 29436.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1087, cr_loss=0.3481, attn_decoder_loss=0.2382, over 5818979.00 frames. 
], batch size: 78, lr: 2.34e-03, grad_scale: 8.0 2024-09-20 07:20:05,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=840440.0, ans=0.0 2024-09-20 07:20:33,258 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 8.893e+01 9.385e+01 9.948e+01 2.061e+02, threshold=1.877e+02, percent-clipped=1.0 2024-09-20 07:20:53,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=840560.0, ans=0.0 2024-09-20 07:20:57,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=840560.0, ans=0.125 2024-09-20 07:21:06,476 INFO [train.py:1198] (1/2) Epoch 47, batch 2000, loss[loss=0.1948, ctc_loss=0.07885, cr_loss=0.2782, attn_decoder_loss=0.2015, over 29317.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1092, cr_loss=0.3489, attn_decoder_loss=0.2387, over 5796573.89 frames. ], batch size: 67, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:21:25,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=840640.0, ans=0.0 2024-09-20 07:21:38,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=840680.0, ans=0.125 2024-09-20 07:21:54,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=840720.0, ans=0.2 2024-09-20 07:22:17,200 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.50 vs. limit=15.0 2024-09-20 07:22:17,367 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.81 vs. limit=15.0 2024-09-20 07:22:18,153 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:22:18,615 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.11 vs. limit=15.0 2024-09-20 07:22:24,171 INFO [train.py:1198] (1/2) Epoch 47, batch 2050, loss[loss=0.1998, ctc_loss=0.08284, cr_loss=0.3003, attn_decoder_loss=0.2061, over 29407.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1086, cr_loss=0.348, attn_decoder_loss=0.2375, over 5788815.04 frames. ], batch size: 70, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:22:25,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=840800.0, ans=0.125 2024-09-20 07:22:27,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=840800.0, ans=0.025 2024-09-20 07:22:43,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.77 vs. limit=6.0 2024-09-20 07:22:53,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=840840.0, ans=0.0 2024-09-20 07:22:57,330 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.60 vs. 
limit=22.5 2024-09-20 07:22:57,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=840880.0, ans=0.0 2024-09-20 07:23:09,566 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.215e+01 8.643e+01 9.120e+01 9.477e+01 1.642e+02, threshold=1.824e+02, percent-clipped=0.0 2024-09-20 07:23:21,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=840920.0, ans=0.2 2024-09-20 07:23:40,688 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2024-09-20 07:23:41,217 INFO [train.py:1198] (1/2) Epoch 47, batch 2100, loss[loss=0.2224, ctc_loss=0.09714, cr_loss=0.3258, attn_decoder_loss=0.2291, over 29738.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1083, cr_loss=0.3473, attn_decoder_loss=0.237, over 5800471.34 frames. ], batch size: 81, lr: 2.34e-03, grad_scale: 8.0 2024-09-20 07:23:53,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=841000.0, ans=0.125 2024-09-20 07:24:01,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=841040.0, ans=0.125 2024-09-20 07:24:05,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=841040.0, ans=0.125 2024-09-20 07:24:07,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=841040.0, ans=0.1 2024-09-20 07:24:11,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=841080.0, ans=0.0 2024-09-20 07:24:34,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=841120.0, ans=0.025 2024-09-20 07:24:44,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=841160.0, ans=0.125 2024-09-20 07:24:50,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=841160.0, ans=0.0 2024-09-20 07:24:54,282 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-09-20 07:24:56,333 INFO [train.py:1198] (1/2) Epoch 47, batch 2150, loss[loss=0.2243, ctc_loss=0.1095, cr_loss=0.3575, attn_decoder_loss=0.2291, over 29452.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1082, cr_loss=0.3472, attn_decoder_loss=0.2366, over 5815244.90 frames. ], batch size: 78, lr: 2.34e-03, grad_scale: 8.0 2024-09-20 07:24:58,872 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.77 vs. 
limit=22.5 2024-09-20 07:25:04,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=841200.0, ans=0.125 2024-09-20 07:25:31,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=841280.0, ans=0.025 2024-09-20 07:25:39,988 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.643e+01 8.572e+01 9.031e+01 9.738e+01 1.571e+02, threshold=1.806e+02, percent-clipped=0.0 2024-09-20 07:25:47,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=841320.0, ans=0.04949747468305833 2024-09-20 07:26:01,985 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:26:03,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=841360.0, ans=0.0 2024-09-20 07:26:13,831 INFO [train.py:1198] (1/2) Epoch 47, batch 2200, loss[loss=0.24, ctc_loss=0.1069, cr_loss=0.3464, attn_decoder_loss=0.2471, over 29628.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1081, cr_loss=0.347, attn_decoder_loss=0.2366, over 5811388.18 frames. ], batch size: 86, lr: 2.34e-03, grad_scale: 8.0 2024-09-20 07:26:14,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=841400.0, ans=0.0 2024-09-20 07:26:23,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=841400.0, ans=0.025 2024-09-20 07:26:35,543 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.86 vs. limit=15.0 2024-09-20 07:26:51,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=841480.0, ans=0.0 2024-09-20 07:26:54,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=841480.0, ans=0.125 2024-09-20 07:27:09,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=841520.0, ans=0.125 2024-09-20 07:27:15,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=841560.0, ans=0.0 2024-09-20 07:27:15,732 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.82 vs. limit=15.0 2024-09-20 07:27:31,970 INFO [train.py:1198] (1/2) Epoch 47, batch 2250, loss[loss=0.2397, ctc_loss=0.1139, cr_loss=0.3683, attn_decoder_loss=0.2455, over 29703.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1081, cr_loss=0.347, attn_decoder_loss=0.2367, over 5810677.46 frames. ], batch size: 82, lr: 2.34e-03, grad_scale: 8.0 2024-09-20 07:27:36,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=841600.0, ans=0.95 2024-09-20 07:27:50,980 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.87 vs. 
limit=15.0 2024-09-20 07:27:57,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=841640.0, ans=0.125 2024-09-20 07:27:57,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=841640.0, ans=0.125 2024-09-20 07:27:57,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=841640.0, ans=0.0 2024-09-20 07:28:03,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=841680.0, ans=0.1 2024-09-20 07:28:10,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.90 vs. limit=22.5 2024-09-20 07:28:15,524 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 8.630e+01 9.023e+01 9.514e+01 2.412e+02, threshold=1.805e+02, percent-clipped=2.0 2024-09-20 07:28:22,353 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2024-09-20 07:28:47,111 INFO [train.py:1198] (1/2) Epoch 47, batch 2300, loss[loss=0.2087, ctc_loss=0.09238, cr_loss=0.3021, attn_decoder_loss=0.2149, over 29338.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.107, cr_loss=0.3445, attn_decoder_loss=0.2356, over 5798047.48 frames. ], batch size: 71, lr: 2.34e-03, grad_scale: 8.0 2024-09-20 07:28:53,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=841800.0, ans=0.1 2024-09-20 07:29:20,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=841880.0, ans=0.5 2024-09-20 07:29:23,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=841880.0, ans=0.125 2024-09-20 07:29:25,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=841880.0, ans=0.0 2024-09-20 07:29:27,130 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.43 vs. limit=15.0 2024-09-20 07:29:30,380 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.45 vs. limit=22.5 2024-09-20 07:29:34,266 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:29:41,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=841920.0, ans=0.0 2024-09-20 07:30:04,681 INFO [train.py:1198] (1/2) Epoch 47, batch 2350, loss[loss=0.2489, ctc_loss=0.1256, cr_loss=0.3929, attn_decoder_loss=0.2539, over 29704.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1073, cr_loss=0.345, attn_decoder_loss=0.2357, over 5803719.48 frames. 
], batch size: 83, lr: 2.34e-03, grad_scale: 8.0 2024-09-20 07:30:04,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=842000.0, ans=0.125 2024-09-20 07:30:06,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=842000.0, ans=0.125 2024-09-20 07:30:37,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=842080.0, ans=0.125 2024-09-20 07:30:47,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=842080.0, ans=0.125 2024-09-20 07:30:50,388 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.856e+01 8.800e+01 9.287e+01 9.916e+01 3.475e+02, threshold=1.857e+02, percent-clipped=2.0 2024-09-20 07:30:50,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=842120.0, ans=0.1 2024-09-20 07:30:59,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=842120.0, ans=0.125 2024-09-20 07:31:11,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=842160.0, ans=0.125 2024-09-20 07:31:22,266 INFO [train.py:1198] (1/2) Epoch 47, batch 2400, loss[loss=0.2262, ctc_loss=0.1051, cr_loss=0.3515, attn_decoder_loss=0.2319, over 29550.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1076, cr_loss=0.3452, attn_decoder_loss=0.2361, over 5807582.72 frames. ], batch size: 76, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:31:25,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=842200.0, ans=0.5 2024-09-20 07:32:05,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=842280.0, ans=0.125 2024-09-20 07:32:05,991 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.16 vs. limit=15.0 2024-09-20 07:32:11,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=842320.0, ans=0.1 2024-09-20 07:32:12,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=842320.0, ans=0.0 2024-09-20 07:32:14,767 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.11 vs. limit=15.0 2024-09-20 07:32:15,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=842320.0, ans=0.125 2024-09-20 07:32:38,369 INFO [train.py:1198] (1/2) Epoch 47, batch 2450, loss[loss=0.2237, ctc_loss=0.09652, cr_loss=0.3208, attn_decoder_loss=0.2307, over 29692.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.108, cr_loss=0.3462, attn_decoder_loss=0.2369, over 5785282.05 frames. 
], batch size: 82, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:32:52,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=842440.0, ans=0.0 2024-09-20 07:33:01,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=842440.0, ans=0.125 2024-09-20 07:33:02,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=842440.0, ans=0.05 2024-09-20 07:33:11,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=842480.0, ans=0.0 2024-09-20 07:33:15,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=842480.0, ans=0.1 2024-09-20 07:33:21,570 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.574e+01 9.096e+01 9.798e+01 1.804e+02, threshold=1.819e+02, percent-clipped=0.0 2024-09-20 07:33:25,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=842520.0, ans=0.125 2024-09-20 07:33:26,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=842520.0, ans=0.0 2024-09-20 07:33:51,665 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.32 vs. limit=22.5 2024-09-20 07:33:55,466 INFO [train.py:1198] (1/2) Epoch 47, batch 2500, loss[loss=0.243, ctc_loss=0.1114, cr_loss=0.3604, attn_decoder_loss=0.2496, over 29623.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1085, cr_loss=0.347, attn_decoder_loss=0.2371, over 5795373.40 frames. ], batch size: 86, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:33:59,207 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.46 vs. limit=15.0 2024-09-20 07:34:07,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=842600.0, ans=0.0 2024-09-20 07:34:12,933 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.90 vs. limit=15.0 2024-09-20 07:34:19,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=842640.0, ans=0.0 2024-09-20 07:34:39,064 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.53 vs. limit=10.0 2024-09-20 07:34:41,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=842720.0, ans=0.125 2024-09-20 07:34:51,717 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.89 vs. limit=12.0 2024-09-20 07:35:13,287 INFO [train.py:1198] (1/2) Epoch 47, batch 2550, loss[loss=0.2089, ctc_loss=0.09904, cr_loss=0.3198, attn_decoder_loss=0.214, over 29347.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1083, cr_loss=0.3473, attn_decoder_loss=0.237, over 5797882.23 frames. 
], batch size: 67, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:35:22,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=842800.0, ans=0.0 2024-09-20 07:35:48,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=842880.0, ans=0.125 2024-09-20 07:35:48,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=842880.0, ans=10.0 2024-09-20 07:35:52,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=842880.0, ans=0.2 2024-09-20 07:35:56,712 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.552e+01 8.685e+01 9.047e+01 9.681e+01 1.454e+02, threshold=1.809e+02, percent-clipped=0.0 2024-09-20 07:36:09,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=842920.0, ans=0.1 2024-09-20 07:36:24,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=842960.0, ans=0.0 2024-09-20 07:36:28,667 INFO [train.py:1198] (1/2) Epoch 47, batch 2600, loss[loss=0.2181, ctc_loss=0.09847, cr_loss=0.3345, attn_decoder_loss=0.224, over 29437.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1084, cr_loss=0.3475, attn_decoder_loss=0.2373, over 5795162.02 frames. ], batch size: 78, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:36:35,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=843000.0, ans=0.125 2024-09-20 07:36:48,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=843040.0, ans=0.125 2024-09-20 07:37:29,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=843160.0, ans=0.125 2024-09-20 07:37:30,870 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.93 vs. limit=22.5 2024-09-20 07:37:41,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=843160.0, ans=0.125 2024-09-20 07:37:46,292 INFO [train.py:1198] (1/2) Epoch 47, batch 2650, loss[loss=0.2446, ctc_loss=0.1076, cr_loss=0.3345, attn_decoder_loss=0.2524, over 29225.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1085, cr_loss=0.348, attn_decoder_loss=0.2378, over 5801231.75 frames. 
], batch size: 100, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:38:17,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=843280.0, ans=0.0 2024-09-20 07:38:31,923 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.003e+01 8.688e+01 9.037e+01 9.488e+01 1.743e+02, threshold=1.807e+02, percent-clipped=0.0 2024-09-20 07:38:36,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=843320.0, ans=0.0 2024-09-20 07:38:39,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=843320.0, ans=0.0 2024-09-20 07:38:44,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=843320.0, ans=0.125 2024-09-20 07:39:01,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=843360.0, ans=0.125 2024-09-20 07:39:03,846 INFO [train.py:1198] (1/2) Epoch 47, batch 2700, loss[loss=0.2387, ctc_loss=0.114, cr_loss=0.3541, attn_decoder_loss=0.2447, over 29545.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1085, cr_loss=0.348, attn_decoder_loss=0.2379, over 5797936.64 frames. ], batch size: 87, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:39:14,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=843400.0, ans=0.1 2024-09-20 07:39:19,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=843440.0, ans=0.1 2024-09-20 07:39:50,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=843520.0, ans=0.2 2024-09-20 07:39:58,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=843520.0, ans=0.025 2024-09-20 07:40:19,198 INFO [train.py:1198] (1/2) Epoch 47, batch 2750, loss[loss=0.2231, ctc_loss=0.1031, cr_loss=0.3428, attn_decoder_loss=0.2288, over 29518.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1078, cr_loss=0.3461, attn_decoder_loss=0.2367, over 5796591.13 frames. ], batch size: 75, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:40:21,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=843600.0, ans=0.125 2024-09-20 07:41:02,932 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.503e+01 8.699e+01 9.178e+01 9.870e+01 7.766e+02, threshold=1.836e+02, percent-clipped=3.0 2024-09-20 07:41:03,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=843720.0, ans=0.07 2024-09-20 07:41:07,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=843720.0, ans=0.04949747468305833 2024-09-20 07:41:15,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=843720.0, ans=0.1 2024-09-20 07:41:37,201 INFO [train.py:1198] (1/2) Epoch 47, batch 2800, loss[loss=0.2466, ctc_loss=0.1266, cr_loss=0.364, attn_decoder_loss=0.2518, over 21324.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1083, cr_loss=0.3471, attn_decoder_loss=0.2371, over 5777920.77 frames. 
], batch size: 210, lr: 2.34e-03, grad_scale: 32.0 2024-09-20 07:42:08,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=843880.0, ans=0.2 2024-09-20 07:42:17,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=843880.0, ans=0.1 2024-09-20 07:42:18,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=843880.0, ans=0.1 2024-09-20 07:42:36,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=843920.0, ans=0.125 2024-09-20 07:42:44,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=843960.0, ans=0.125 2024-09-20 07:42:44,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=843960.0, ans=0.1 2024-09-20 07:42:46,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=843960.0, ans=0.025 2024-09-20 07:42:47,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=843960.0, ans=0.125 2024-09-20 07:42:51,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=843960.0, ans=0.125 2024-09-20 07:42:54,619 INFO [train.py:1198] (1/2) Epoch 47, batch 2850, loss[loss=0.2197, ctc_loss=0.09959, cr_loss=0.3302, attn_decoder_loss=0.2257, over 29516.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1085, cr_loss=0.3473, attn_decoder_loss=0.2373, over 5762153.15 frames. 
], batch size: 77, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:42:57,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=844000.0, ans=0.0 2024-09-20 07:43:19,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=844040.0, ans=0.125 2024-09-20 07:43:26,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=844080.0, ans=0.125 2024-09-20 07:43:38,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=844120.0, ans=0.0 2024-09-20 07:43:39,705 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 8.812e+01 9.340e+01 9.979e+01 3.635e+02, threshold=1.868e+02, percent-clipped=1.0 2024-09-20 07:43:44,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=844120.0, ans=0.1 2024-09-20 07:43:54,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=844160.0, ans=0.125 2024-09-20 07:43:55,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=844160.0, ans=0.0 2024-09-20 07:43:58,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=844160.0, ans=0.1 2024-09-20 07:43:59,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=844160.0, ans=0.125 2024-09-20 07:44:09,785 INFO [train.py:1198] (1/2) Epoch 47, batch 2900, loss[loss=0.2201, ctc_loss=0.09431, cr_loss=0.3127, attn_decoder_loss=0.2271, over 29424.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1092, cr_loss=0.3492, attn_decoder_loss=0.2384, over 5788163.31 frames. ], batch size: 79, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:44:50,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=844280.0, ans=0.0 2024-09-20 07:44:57,699 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.96 vs. limit=10.0 2024-09-20 07:45:03,466 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.50 vs. limit=15.0 2024-09-20 07:45:06,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=844320.0, ans=0.0 2024-09-20 07:45:10,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=844360.0, ans=0.1 2024-09-20 07:45:27,377 INFO [train.py:1198] (1/2) Epoch 47, batch 2950, loss[loss=0.2322, ctc_loss=0.1116, cr_loss=0.3679, attn_decoder_loss=0.2374, over 29553.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1085, cr_loss=0.3476, attn_decoder_loss=0.2372, over 5782957.83 frames. 
], batch size: 75, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:45:27,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=844400.0, ans=0.0 2024-09-20 07:45:52,533 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.41 vs. limit=22.5 2024-09-20 07:46:14,755 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.616e+01 8.670e+01 9.173e+01 9.775e+01 4.031e+02, threshold=1.835e+02, percent-clipped=2.0 2024-09-20 07:46:45,076 INFO [train.py:1198] (1/2) Epoch 47, batch 3000, loss[loss=0.2359, ctc_loss=0.1156, cr_loss=0.3698, attn_decoder_loss=0.241, over 29751.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1085, cr_loss=0.3475, attn_decoder_loss=0.2371, over 5783310.10 frames. ], batch size: 81, lr: 2.34e-03, grad_scale: 8.0 2024-09-20 07:46:45,076 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 07:47:03,445 INFO [train.py:1230] (1/2) Epoch 47, validation: loss=0.2127, ctc_loss=0.03692, cr_loss=6.538e-15, attn_decoder_loss=0.2323, over 944034.00 frames. 2024-09-20 07:47:03,446 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-20 07:47:28,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=844640.0, ans=0.125 2024-09-20 07:47:29,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=844640.0, ans=0.125 2024-09-20 07:47:56,096 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.24 vs. limit=15.0 2024-09-20 07:48:16,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=844760.0, ans=0.125 2024-09-20 07:48:19,549 INFO [train.py:1198] (1/2) Epoch 47, batch 3050, loss[loss=0.2379, ctc_loss=0.1217, cr_loss=0.3777, attn_decoder_loss=0.2424, over 29530.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1088, cr_loss=0.3484, attn_decoder_loss=0.2374, over 5777499.87 frames. ], batch size: 76, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 07:48:45,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=844840.0, ans=0.0 2024-09-20 07:49:00,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=844880.0, ans=0.0 2024-09-20 07:49:06,337 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.834e+01 8.765e+01 9.371e+01 9.890e+01 2.296e+02, threshold=1.874e+02, percent-clipped=1.0 2024-09-20 07:49:26,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=844960.0, ans=0.0 2024-09-20 07:49:29,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=844960.0, ans=0.125 2024-09-20 07:49:38,885 INFO [train.py:1198] (1/2) Epoch 47, batch 3100, loss[loss=0.2432, ctc_loss=0.1193, cr_loss=0.3909, attn_decoder_loss=0.2483, over 29258.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1086, cr_loss=0.3479, attn_decoder_loss=0.237, over 5775988.04 frames. 
], batch size: 100, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 07:49:58,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=845040.0, ans=0.125 2024-09-20 07:49:59,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=845040.0, ans=0.125 2024-09-20 07:50:05,082 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.31 vs. limit=6.0 2024-09-20 07:50:08,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=845080.0, ans=0.125 2024-09-20 07:50:15,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=845080.0, ans=0.1 2024-09-20 07:50:19,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=845080.0, ans=0.0 2024-09-20 07:50:33,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=845120.0, ans=0.1 2024-09-20 07:50:37,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=845160.0, ans=0.125 2024-09-20 07:50:38,116 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=15.0 2024-09-20 07:50:54,270 INFO [train.py:1198] (1/2) Epoch 47, batch 3150, loss[loss=0.2458, ctc_loss=0.117, cr_loss=0.3763, attn_decoder_loss=0.2517, over 28852.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1088, cr_loss=0.3483, attn_decoder_loss=0.237, over 5781760.61 frames. ], batch size: 104, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 07:50:56,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=845200.0, ans=0.125 2024-09-20 07:51:20,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=845240.0, ans=0.125 2024-09-20 07:51:30,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=845280.0, ans=0.125 2024-09-20 07:51:40,981 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.436e+01 8.583e+01 9.160e+01 9.723e+01 3.463e+02, threshold=1.832e+02, percent-clipped=1.0 2024-09-20 07:51:48,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=845320.0, ans=0.1 2024-09-20 07:52:09,910 INFO [train.py:1198] (1/2) Epoch 47, batch 3200, loss[loss=0.2368, ctc_loss=0.119, cr_loss=0.3753, attn_decoder_loss=0.2415, over 29421.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1085, cr_loss=0.3474, attn_decoder_loss=0.2368, over 5792302.26 frames. 
], batch size: 79, lr: 2.33e-03, grad_scale: 16.0 2024-09-20 07:52:13,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=845400.0, ans=0.125 2024-09-20 07:52:20,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=845400.0, ans=0.125 2024-09-20 07:52:34,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=845440.0, ans=0.0 2024-09-20 07:52:46,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=845480.0, ans=0.125 2024-09-20 07:53:22,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=845560.0, ans=0.1 2024-09-20 07:53:28,506 INFO [train.py:1198] (1/2) Epoch 47, batch 3250, loss[loss=0.2359, ctc_loss=0.1093, cr_loss=0.3397, attn_decoder_loss=0.2424, over 29705.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1088, cr_loss=0.348, attn_decoder_loss=0.2374, over 5799579.65 frames. ], batch size: 84, lr: 2.33e-03, grad_scale: 16.0 2024-09-20 07:53:50,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=845640.0, ans=0.125 2024-09-20 07:53:57,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=845640.0, ans=0.0 2024-09-20 07:53:58,415 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.75 vs. limit=15.0 2024-09-20 07:54:17,342 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 8.840e+01 9.453e+01 1.004e+02 3.254e+02, threshold=1.891e+02, percent-clipped=2.0 2024-09-20 07:54:45,885 INFO [train.py:1198] (1/2) Epoch 47, batch 3300, loss[loss=0.2468, ctc_loss=0.1167, cr_loss=0.3384, attn_decoder_loss=0.2538, over 28270.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1082, cr_loss=0.3465, attn_decoder_loss=0.2364, over 5796497.10 frames. ], batch size: 111, lr: 2.33e-03, grad_scale: 16.0 2024-09-20 07:55:06,571 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.94 vs. limit=22.5 2024-09-20 07:55:07,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=845840.0, ans=0.0 2024-09-20 07:55:08,002 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.19 vs. limit=22.5 2024-09-20 07:55:09,170 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.22 vs. limit=22.5 2024-09-20 07:55:37,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=845920.0, ans=0.05 2024-09-20 07:55:37,917 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.53 vs. 
limit=15.0 2024-09-20 07:55:43,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=845920.0, ans=0.1 2024-09-20 07:55:46,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=845960.0, ans=0.125 2024-09-20 07:55:52,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=845960.0, ans=0.1 2024-09-20 07:56:00,878 INFO [train.py:1198] (1/2) Epoch 47, batch 3350, loss[loss=0.2371, ctc_loss=0.1106, cr_loss=0.3463, attn_decoder_loss=0.2435, over 28907.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.109, cr_loss=0.3479, attn_decoder_loss=0.2373, over 5773030.87 frames. ], batch size: 104, lr: 2.33e-03, grad_scale: 16.0 2024-09-20 07:56:16,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=846040.0, ans=0.2 2024-09-20 07:56:47,173 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.506e+01 8.631e+01 9.228e+01 9.801e+01 1.993e+02, threshold=1.846e+02, percent-clipped=1.0 2024-09-20 07:56:48,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=846120.0, ans=0.2 2024-09-20 07:56:58,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=846120.0, ans=0.125 2024-09-20 07:57:10,956 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:57:20,291 INFO [train.py:1198] (1/2) Epoch 47, batch 3400, loss[loss=0.2005, ctc_loss=0.08575, cr_loss=0.2791, attn_decoder_loss=0.2071, over 29339.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1092, cr_loss=0.3482, attn_decoder_loss=0.2371, over 5765265.64 frames. ], batch size: 67, lr: 2.33e-03, grad_scale: 16.0 2024-09-20 07:57:26,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=846200.0, ans=0.125 2024-09-20 07:57:28,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=846200.0, ans=0.2 2024-09-20 07:58:36,091 INFO [train.py:1198] (1/2) Epoch 47, batch 3450, loss[loss=0.2344, ctc_loss=0.1031, cr_loss=0.3436, attn_decoder_loss=0.2414, over 28335.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1093, cr_loss=0.3483, attn_decoder_loss=0.2374, over 5773348.56 frames. ], batch size: 111, lr: 2.33e-03, grad_scale: 16.0 2024-09-20 07:58:40,328 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.19 vs. 
limit=22.5 2024-09-20 07:58:58,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=846440.0, ans=0.1 2024-09-20 07:59:04,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=846480.0, ans=0.125 2024-09-20 07:59:13,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=846480.0, ans=0.2 2024-09-20 07:59:22,441 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.138e+01 8.718e+01 9.224e+01 9.719e+01 1.765e+02, threshold=1.845e+02, percent-clipped=0.0 2024-09-20 07:59:33,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=846520.0, ans=0.2 2024-09-20 07:59:51,158 INFO [train.py:1198] (1/2) Epoch 47, batch 3500, loss[loss=0.2121, ctc_loss=0.09599, cr_loss=0.3281, attn_decoder_loss=0.2177, over 29327.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1089, cr_loss=0.3478, attn_decoder_loss=0.237, over 5776235.95 frames. ], batch size: 71, lr: 2.33e-03, grad_scale: 16.0 2024-09-20 07:59:51,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=846600.0, ans=0.125 2024-09-20 07:59:59,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=846600.0, ans=0.0 2024-09-20 07:59:59,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.max_abs, batch_count=846600.0, ans=10.0 2024-09-20 08:00:08,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=846640.0, ans=0.125 2024-09-20 08:00:13,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=846640.0, ans=0.0 2024-09-20 08:00:21,958 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.76 vs. limit=15.0 2024-09-20 08:00:29,257 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.43 vs. limit=15.0 2024-09-20 08:00:30,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=846680.0, ans=0.125 2024-09-20 08:00:31,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=846680.0, ans=0.125 2024-09-20 08:01:05,574 INFO [train.py:1198] (1/2) Epoch 47, batch 3550, loss[loss=0.2561, ctc_loss=0.128, cr_loss=0.3869, attn_decoder_loss=0.2617, over 29726.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1086, cr_loss=0.3475, attn_decoder_loss=0.237, over 5782314.88 frames. 
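Note on the per-batch loss records above: each prints four quantities (loss, ctc_loss, cr_loss, attn_decoder_loss), and the logged totals are consistent with a fixed weighted sum, loss ≈ 0.1·ctc_loss + 0.02·cr_loss + 0.9·attn_decoder_loss (e.g. 0.1×0.1086 + 0.02×0.3475 + 0.9×0.2370 ≈ 0.2311 for the batch 3550 running average). A minimal sketch, with the weights inferred from the logged numbers rather than read from the training code:

```python
# Sketch: recombine the logged loss components with the weights that the
# logged totals imply. The scale values are inferred from the log itself,
# not taken from train.py, so treat them as assumptions.
CTC_SCALE = 0.1
CR_SCALE = 0.02
ATTN_SCALE = 0.9

def combined_loss(ctc_loss: float, cr_loss: float, attn_decoder_loss: float) -> float:
    """Weighted sum matching the 'loss=' field in the train.py records."""
    return CTC_SCALE * ctc_loss + CR_SCALE * cr_loss + ATTN_SCALE * attn_decoder_loss

# Check against the batch 3550 running average logged above:
# loss=0.2311, ctc_loss=0.1086, cr_loss=0.3475, attn_decoder_loss=0.2370
assert abs(combined_loss(0.1086, 0.3475, 0.2370) - 0.2311) < 5e-4
```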
], batch size: 89, lr: 2.33e-03, grad_scale: 16.0 2024-09-20 08:01:07,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=846800.0, ans=0.0 2024-09-20 08:01:42,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=846880.0, ans=0.0 2024-09-20 08:01:55,423 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.386e+01 8.882e+01 9.420e+01 1.531e+02, threshold=1.776e+02, percent-clipped=0.0 2024-09-20 08:01:58,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=846920.0, ans=0.125 2024-09-20 08:02:23,261 INFO [train.py:1198] (1/2) Epoch 47, batch 3600, loss[loss=0.2306, ctc_loss=0.1127, cr_loss=0.363, attn_decoder_loss=0.2357, over 29482.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1084, cr_loss=0.3471, attn_decoder_loss=0.2369, over 5790930.57 frames. ], batch size: 77, lr: 2.33e-03, grad_scale: 32.0 2024-09-20 08:02:42,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=847040.0, ans=0.125 2024-09-20 08:03:08,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=847120.0, ans=0.125 2024-09-20 08:03:08,156 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:03:15,903 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.78 vs. limit=15.0 2024-09-20 08:03:18,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=847120.0, ans=0.125 2024-09-20 08:03:18,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=847120.0, ans=0.0 2024-09-20 08:03:27,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=847160.0, ans=0.025 2024-09-20 08:03:27,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=847160.0, ans=0.125 2024-09-20 08:03:34,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=847160.0, ans=0.0 2024-09-20 08:03:37,746 INFO [train.py:1198] (1/2) Epoch 47, batch 3650, loss[loss=0.2528, ctc_loss=0.1269, cr_loss=0.3828, attn_decoder_loss=0.2582, over 29498.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1082, cr_loss=0.3465, attn_decoder_loss=0.2367, over 5792809.80 frames. ], batch size: 90, lr: 2.33e-03, grad_scale: 16.0 2024-09-20 08:04:14,183 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.12 vs. 
limit=15.0 2024-09-20 08:04:26,506 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.278e+01 8.685e+01 9.171e+01 9.762e+01 1.576e+02, threshold=1.834e+02, percent-clipped=0.0 2024-09-20 08:04:26,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=847320.0, ans=0.0 2024-09-20 08:04:28,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=847320.0, ans=0.0 2024-09-20 08:04:29,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=847320.0, ans=0.07 2024-09-20 08:04:43,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=847360.0, ans=0.125 2024-09-20 08:04:51,911 INFO [train.py:1198] (1/2) Epoch 47, batch 3700, loss[loss=0.234, ctc_loss=0.1086, cr_loss=0.3258, attn_decoder_loss=0.2407, over 29702.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1085, cr_loss=0.3473, attn_decoder_loss=0.2369, over 5801946.72 frames. ], batch size: 84, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:04:52,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=847400.0, ans=0.1 2024-09-20 08:04:53,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=847400.0, ans=0.125 2024-09-20 08:05:20,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=847480.0, ans=0.0 2024-09-20 08:05:37,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=847520.0, ans=0.125 2024-09-20 08:05:39,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=847520.0, ans=0.125 2024-09-20 08:05:47,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=847520.0, ans=0.0 2024-09-20 08:06:06,313 INFO [train.py:1198] (1/2) Epoch 47, batch 3750, loss[loss=0.2045, ctc_loss=0.08684, cr_loss=0.2968, attn_decoder_loss=0.211, over 29372.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1083, cr_loss=0.3473, attn_decoder_loss=0.2367, over 5806894.99 frames. 
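Note on the optim.py WARNING lines: each prints five grad-norm statistics (min, 25%, 50%, 75%, max over some recent window) plus a threshold, and in every case the threshold equals Clipping_scale times the median (e.g. 2.0 × 9.134e+01 ≈ 1.827e+02 in the next warning below). A hedged sketch of that bookkeeping, assuming a simple sliding window; the actual optim.py may differ in the details:

```python
# Sketch of median-based gradient clipping as suggested by the optim.py
# warnings: threshold = clipping_scale * median of recent gradient norms.
# The window size and exact quantile mechanics are assumptions.
from collections import deque

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def update(self, grad_norm: float) -> tuple[float, bool]:
        """Record a norm; return (threshold, whether this step was clipped)."""
        self.norms.append(grad_norm)
        s = sorted(self.norms)
        median = s[len(s) // 2]
        threshold = self.clipping_scale * median
        return threshold, grad_norm > threshold

    def quartiles(self) -> list[float]:
        """Five-number summary as printed in the warnings."""
        s = sorted(self.norms)
        return [s[int(q * (len(s) - 1))] for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
```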
], batch size: 67, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:06:21,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=847640.0, ans=0.0 2024-09-20 08:06:55,606 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.392e+01 8.560e+01 9.134e+01 9.769e+01 1.535e+02, threshold=1.827e+02, percent-clipped=0.0 2024-09-20 08:06:58,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=847720.0, ans=0.025 2024-09-20 08:07:01,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=847720.0, ans=0.125 2024-09-20 08:07:10,725 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:07:12,356 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:07:20,707 INFO [train.py:1198] (1/2) Epoch 47, batch 3800, loss[loss=0.2415, ctc_loss=0.1098, cr_loss=0.3619, attn_decoder_loss=0.2481, over 29608.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1083, cr_loss=0.347, attn_decoder_loss=0.2364, over 5798572.84 frames. ], batch size: 86, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:07:23,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=847800.0, ans=0.0 2024-09-20 08:07:29,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=847800.0, ans=0.125 2024-09-20 08:07:39,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=847840.0, ans=0.125 2024-09-20 08:07:49,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=847840.0, ans=0.2 2024-09-20 08:07:52,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=847880.0, ans=0.0 2024-09-20 08:08:05,081 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.54 vs. limit=15.0 2024-09-20 08:08:17,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=847920.0, ans=0.125 2024-09-20 08:08:45,615 INFO [train.py:1198] (1/2) Epoch 47, batch 3850, loss[loss=0.251, ctc_loss=0.1203, cr_loss=0.3685, attn_decoder_loss=0.2573, over 29254.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1084, cr_loss=0.3474, attn_decoder_loss=0.2365, over 5811618.88 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:08:53,707 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.49 vs. 
limit=15.0 2024-09-20 08:09:22,749 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:09:34,445 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.721e+01 8.837e+01 9.282e+01 9.780e+01 1.653e+02, threshold=1.856e+02, percent-clipped=0.0 2024-09-20 08:09:56,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=848160.0, ans=0.0 2024-09-20 08:09:59,712 INFO [train.py:1198] (1/2) Epoch 47, batch 3900, loss[loss=0.2447, ctc_loss=0.1106, cr_loss=0.3522, attn_decoder_loss=0.2518, over 29639.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1085, cr_loss=0.3468, attn_decoder_loss=0.2369, over 5816305.29 frames. ], batch size: 86, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:10:06,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=848200.0, ans=0.1 2024-09-20 08:10:19,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=848240.0, ans=0.125 2024-09-20 08:10:26,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=848240.0, ans=0.125 2024-09-20 08:10:45,046 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.96 vs. limit=22.5 2024-09-20 08:10:48,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=848320.0, ans=0.04949747468305833 2024-09-20 08:10:54,390 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:11:13,504 INFO [train.py:1198] (1/2) Epoch 47, batch 3950, loss[loss=0.2425, ctc_loss=0.1165, cr_loss=0.3644, attn_decoder_loss=0.2484, over 29486.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.108, cr_loss=0.3463, attn_decoder_loss=0.2368, over 5835872.30 frames. ], batch size: 97, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:11:28,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=848440.0, ans=0.125 2024-09-20 08:11:43,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=848480.0, ans=0.1 2024-09-20 08:11:47,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=848480.0, ans=0.125 2024-09-20 08:12:02,180 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.359e+01 8.442e+01 9.066e+01 9.725e+01 6.124e+02, threshold=1.813e+02, percent-clipped=2.0 2024-09-20 08:12:27,091 INFO [train.py:1198] (1/2) Epoch 47, batch 4000, loss[loss=0.2168, ctc_loss=0.1023, cr_loss=0.3396, attn_decoder_loss=0.2219, over 29506.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1081, cr_loss=0.3464, attn_decoder_loss=0.2366, over 5812393.69 frames. 
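Note on the grad_scale field: it moves between 8.0, 16.0 and 32.0 across the records, which is the signature of dynamic loss scaling under float16 AMP: the scale is halved whenever a step produces inf/nan gradients and doubled again after a long run of clean steps. A sketch of that dynamic; the growth interval below is an assumption, not a value read from this run:

```python
# Sketch of the dynamic loss-scale behaviour behind the grad_scale values
# (8.0 / 16.0 / 32.0) in the records above: back off on overflow, grow
# after a streak of clean steps.
class DynamicGradScale:
    def __init__(self, init_scale: float = 16.0,
                 growth_factor: float = 2.0,
                 backoff_factor: float = 0.5,
                 growth_interval: int = 2000):  # assumed, not from train.py
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._clean_steps = 0

    def step(self, found_inf: bool) -> float:
        if found_inf:                      # overflow: halve scale, reset streak
            self.scale *= self.backoff_factor
            self._clean_steps = 0
        else:                              # clean step: grow occasionally
            self._clean_steps += 1
            if self._clean_steps % self.growth_interval == 0:
                self.scale *= self.growth_factor
        return self.scale
```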
], batch size: 74, lr: 2.33e-03, grad_scale: 16.0 2024-09-20 08:12:53,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=848640.0, ans=0.5 2024-09-20 08:13:06,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=848680.0, ans=0.125 2024-09-20 08:13:35,573 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.03 vs. limit=15.0 2024-09-20 08:13:43,949 INFO [train.py:1198] (1/2) Epoch 47, batch 4050, loss[loss=0.247, ctc_loss=0.1283, cr_loss=0.3653, attn_decoder_loss=0.252, over 19940.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.108, cr_loss=0.3461, attn_decoder_loss=0.2365, over 5794877.04 frames. ], batch size: 210, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:14:06,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=848840.0, ans=0.0 2024-09-20 08:14:11,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=848880.0, ans=0.0 2024-09-20 08:14:14,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=848880.0, ans=0.1 2024-09-20 08:14:33,189 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.525e+01 9.075e+01 9.953e+01 1.624e+02, threshold=1.815e+02, percent-clipped=0.0 2024-09-20 08:14:48,829 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.82 vs. limit=15.0 2024-09-20 08:14:56,871 INFO [train.py:1198] (1/2) Epoch 47, batch 4100, loss[loss=0.25, ctc_loss=0.1223, cr_loss=0.3865, attn_decoder_loss=0.2556, over 29516.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.108, cr_loss=0.346, attn_decoder_loss=0.2366, over 5790413.01 frames. ], batch size: 90, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:14:59,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=849000.0, ans=0.0 2024-09-20 08:15:28,131 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.12 vs. limit=22.5 2024-09-20 08:15:47,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=849120.0, ans=0.125 2024-09-20 08:16:10,033 INFO [train.py:1198] (1/2) Epoch 47, batch 4150, loss[loss=0.2241, ctc_loss=0.1023, cr_loss=0.321, attn_decoder_loss=0.2305, over 29517.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.108, cr_loss=0.3462, attn_decoder_loss=0.2363, over 5796123.78 frames. 
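Note on the scaling.py:214 ScheduledFloat lines that dominate this log: each names a per-module hyperparameter (dropout probabilities, skip rates, balancer probs) together with the global batch_count and its current value ans, i.e. hyperparameters scheduled as a function of training progress. A sketch assuming a piecewise-linear schedule over (batch_count, value) breakpoints; the real ScheduledFloat in scaling.py may interpolate differently:

```python
# Sketch of a batch-count-driven hyperparameter, as suggested by the
# ScheduledFloat lines (name=..., batch_count=..., ans=...). Piecewise-linear
# interpolation over breakpoints is an assumption.
class ScheduledValue:
    def __init__(self, *points: tuple[float, float]):
        # points: (batch_count, value) pairs, sorted by batch_count
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        raise AssertionError("unreachable")

# e.g. a dropout that anneals from 0.3 to 0.1 over the first 20k batches:
dropout_p = ScheduledValue((0.0, 0.3), (20000.0, 0.1))
assert dropout_p(845400.0) == 0.1   # long past the schedule end
```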
], batch size: 77, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:16:11,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=849200.0, ans=0.0 2024-09-20 08:16:23,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=849240.0, ans=0.1 2024-09-20 08:16:32,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.max_abs, batch_count=849240.0, ans=10.0 2024-09-20 08:16:37,164 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=15.0 2024-09-20 08:17:00,940 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 8.758e+01 9.113e+01 9.755e+01 1.948e+02, threshold=1.823e+02, percent-clipped=1.0 2024-09-20 08:17:25,430 INFO [train.py:1198] (1/2) Epoch 47, batch 4200, loss[loss=0.2484, ctc_loss=0.1193, cr_loss=0.3858, attn_decoder_loss=0.2541, over 29524.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1084, cr_loss=0.3471, attn_decoder_loss=0.2367, over 5798493.07 frames. ], batch size: 90, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:17:36,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=849400.0, ans=0.09899494936611666 2024-09-20 08:17:37,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=849400.0, ans=0.125 2024-09-20 08:17:40,688 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:17:41,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=849440.0, ans=0.125 2024-09-20 08:18:07,356 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.36 vs. limit=10.0 2024-09-20 08:18:08,422 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:18:14,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=849520.0, ans=0.1 2024-09-20 08:18:15,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=849520.0, ans=0.125 2024-09-20 08:18:16,672 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.65 vs. limit=22.5 2024-09-20 08:18:20,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=849520.0, ans=0.2 2024-09-20 08:18:24,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=849560.0, ans=0.0 2024-09-20 08:18:31,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=849560.0, ans=0.125 2024-09-20 08:18:39,542 INFO [train.py:1198] (1/2) Epoch 47, batch 4250, loss[loss=0.2062, ctc_loss=0.08498, cr_loss=0.2873, attn_decoder_loss=0.2133, over 29513.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1082, cr_loss=0.3463, attn_decoder_loss=0.2369, over 5804813.39 frames. 
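Note on the scaling.py:1024 Whitening lines: each prints a per-module metric against a limit (e.g. metric=9.40 vs. limit=15.0), suggesting a diagnostic that measures how far a module's activations are from having an isotropic ("white") covariance, with a penalty that presumably only engages past the limit. The specific metric below (eigenvalue dispersion of the channel covariance) is an assumption, not taken from scaling.py:

```python
# Sketch of a whitening diagnostic like the scaling.py:1024 lines: a metric
# that is 1.0 for perfectly white features and grows with anisotropy,
# compared against a limit before any penalty is applied.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """x: (frames, channels). Larger values mean less white covariance."""
    metrics = []
    for g in x.chunk(num_groups, dim=1):
        g = g - g.mean(dim=0, keepdim=True)
        cov = (g.T @ g) / g.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        # eigenvalue dispersion: mean(e^2) / mean(e)^2 >= 1, with equality
        # when all eigenvalues are equal (perfectly white)
        metrics.append((eigs.pow(2).mean() / eigs.mean().pow(2)).item())
    return max(metrics)

x = torch.randn(1000, 512)             # near-white: metric stays far below
penalize = whitening_metric(x) > 15.0  # the limit, so no penalty engages
```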
], batch size: 74, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:18:47,952 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.55 vs. limit=22.5 2024-09-20 08:18:53,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=849640.0, ans=0.2 2024-09-20 08:18:56,133 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:19:29,569 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.747e+01 8.716e+01 9.310e+01 9.869e+01 2.948e+02, threshold=1.862e+02, percent-clipped=1.0 2024-09-20 08:19:44,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=849760.0, ans=0.2 2024-09-20 08:19:48,985 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:19:50,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=849760.0, ans=0.025 2024-09-20 08:19:53,044 INFO [train.py:1198] (1/2) Epoch 47, batch 4300, loss[loss=0.2398, ctc_loss=0.1102, cr_loss=0.3564, attn_decoder_loss=0.2463, over 29511.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1079, cr_loss=0.3459, attn_decoder_loss=0.237, over 5794955.04 frames. ], batch size: 87, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:19:54,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=849800.0, ans=0.125 2024-09-20 08:19:57,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=849800.0, ans=0.0 2024-09-20 08:20:00,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=849800.0, ans=0.125 2024-09-20 08:20:02,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=849800.0, ans=0.1 2024-09-20 08:20:13,237 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.75 vs. limit=22.5 2024-09-20 08:20:27,787 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.17 vs. limit=15.0 2024-09-20 08:20:32,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=849880.0, ans=0.0 2024-09-20 08:20:37,591 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2024-09-20 08:20:40,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=849920.0, ans=0.1 2024-09-20 08:21:07,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=850000.0, ans=0.125 2024-09-20 08:21:08,689 INFO [train.py:1198] (1/2) Epoch 47, batch 4350, loss[loss=0.2429, ctc_loss=0.1218, cr_loss=0.3806, attn_decoder_loss=0.2479, over 29503.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1109, cr_loss=0.3526, attn_decoder_loss=0.2406, over 5798205.11 frames. 
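Note on the scaling.py:1120 WithLoss lines: they attach a named auxiliary loss to the self-attention weights and report its running sum, which is 0.000e+00 in every record here, i.e. the penalty is currently inactive. A sketch, assuming a wrapper that passes activations through unchanged while accumulating a penalty for periodic logging; the penalty function itself is a placeholder:

```python
# Sketch of an auxiliary-loss wrapper like the scaling.py WithLoss lines:
# identity on the forward value, with a named penalty accumulated for
# logging. The penalty_fn here is a placeholder assumption.
import torch

class WithAuxLoss(torch.nn.Module):
    def __init__(self, name: str, penalty_fn):
        super().__init__()
        self.name = name
        self.penalty_fn = penalty_fn
        self.loss_sum = 0.0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            self.loss_sum += float(self.penalty_fn(x.detach()))
        return x  # forward value is unchanged

attn_probe = WithAuxLoss("self_attn_weights",
                         penalty_fn=lambda w: torch.zeros(()))  # inactive -> 0.000e+00
```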
], batch size: 97, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:21:25,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=850040.0, ans=0.125 2024-09-20 08:21:47,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=850080.0, ans=0.125 2024-09-20 08:21:58,353 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 9.076e+01 9.411e+01 9.982e+01 1.475e+02, threshold=1.882e+02, percent-clipped=0.0 2024-09-20 08:22:03,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=850120.0, ans=0.0 2024-09-20 08:22:07,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=850160.0, ans=0.2 2024-09-20 08:22:12,434 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.79 vs. limit=15.0 2024-09-20 08:22:21,582 INFO [train.py:1198] (1/2) Epoch 47, batch 4400, loss[loss=0.2342, ctc_loss=0.1139, cr_loss=0.3552, attn_decoder_loss=0.2397, over 27388.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1121, cr_loss=0.3548, attn_decoder_loss=0.2425, over 5768800.74 frames. ], batch size: 125, lr: 2.33e-03, grad_scale: 16.0 2024-09-20 08:22:49,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=850280.0, ans=0.125 2024-09-20 08:22:55,347 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.53 vs. limit=15.0 2024-09-20 08:23:36,666 INFO [train.py:1198] (1/2) Epoch 47, batch 4450, loss[loss=0.2475, ctc_loss=0.1252, cr_loss=0.3689, attn_decoder_loss=0.2529, over 20410.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1153, cr_loss=0.3598, attn_decoder_loss=0.2446, over 5574932.92 frames. ], batch size: 209, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:23:38,581 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:24:10,347 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.45 vs. limit=12.0 2024-09-20 08:24:12,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=850480.0, ans=0.125 2024-09-20 08:24:25,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=850520.0, ans=0.1 2024-09-20 08:24:29,097 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.530e+01 9.504e+01 1.076e+02 1.200e+02 1.579e+02, threshold=2.152e+02, percent-clipped=0.0 2024-09-20 08:24:30,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=850520.0, ans=0.1 2024-09-20 08:24:41,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=850560.0, ans=0.125 2024-09-20 08:24:51,489 INFO [train.py:1198] (1/2) Epoch 47, batch 4500, loss[loss=0.2648, ctc_loss=0.1478, cr_loss=0.4006, attn_decoder_loss=0.2688, over 19573.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1182, cr_loss=0.3622, attn_decoder_loss=0.2463, over 5237270.42 frames. 
], batch size: 209, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:24:57,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=850600.0, ans=0.125 2024-09-20 08:25:03,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=850600.0, ans=0.0 2024-09-20 08:25:15,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=850640.0, ans=0.0 2024-09-20 08:26:14,114 INFO [train.py:1198] (1/2) Epoch 48, batch 0, loss[loss=0.2116, ctc_loss=0.09392, cr_loss=0.3314, attn_decoder_loss=0.2173, over 29644.00 frames. ], tot_loss[loss=0.2116, ctc_loss=0.09392, cr_loss=0.3314, attn_decoder_loss=0.2173, over 29644.00 frames. ], batch size: 73, lr: 2.30e-03, grad_scale: 16.0 2024-09-20 08:26:14,114 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 08:26:32,432 INFO [train.py:1230] (1/2) Epoch 48, validation: loss=0.2131, ctc_loss=0.03621, cr_loss=7.075e-15, attn_decoder_loss=0.2327, over 944034.00 frames. 2024-09-20 08:26:32,433 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-20 08:26:40,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=850700.0, ans=0.05 2024-09-20 08:27:00,234 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.13 vs. limit=12.0 2024-09-20 08:27:25,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=850820.0, ans=0.0 2024-09-20 08:27:49,844 INFO [train.py:1198] (1/2) Epoch 48, batch 50, loss[loss=0.2056, ctc_loss=0.0882, cr_loss=0.3018, attn_decoder_loss=0.212, over 29418.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1094, cr_loss=0.3496, attn_decoder_loss=0.2377, over 1268714.62 frames. ], batch size: 70, lr: 2.30e-03, grad_scale: 16.0 2024-09-20 08:27:55,424 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2024-09-20 08:28:04,073 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.52 vs. limit=10.0 2024-09-20 08:28:04,958 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 9.048e+01 9.836e+01 1.173e+02 2.253e+02, threshold=1.967e+02, percent-clipped=1.0 2024-09-20 08:28:55,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=851060.0, ans=0.125 2024-09-20 08:29:01,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=851060.0, ans=0.1 2024-09-20 08:29:01,889 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.65 vs. limit=15.0 2024-09-20 08:29:03,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.88 vs. limit=15.0 2024-09-20 08:29:07,776 INFO [train.py:1198] (1/2) Epoch 48, batch 100, loss[loss=0.2212, ctc_loss=0.1063, cr_loss=0.341, attn_decoder_loss=0.2264, over 29539.00 frames. 
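Note on the validation block at the epoch 48 boundary: train.py:1221 computes a validation loss (loss=0.2131) in which cr_loss=7.075e-15 is effectively zero, plausibly because the consistency-regularization term compares two augmented views of each input and validation runs without that augmentation. A sketch of the surrounding loop; the loader, criterion, and model names are placeholders:

```python
# Sketch of the validation pass around train.py:1221-1231: switch to eval
# mode, accumulate frame-weighted losses without gradients, then report
# peak GPU memory. compute-side names are placeholders.
import torch

def compute_validation_loss(model, valid_loader, criterion, device) -> dict:
    model.eval()
    totals, frames = {}, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            losses, num_frames = criterion(model, batch)  # dict of components
            for k, v in losses.items():
                totals[k] = totals.get(k, 0.0) + float(v) * num_frames
            frames += num_frames
    model.train()
    return {k: v / frames for k, v in totals.items()}

# The "Maximum memory allocated" line can come from:
# peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
```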
], tot_loss[loss=0.2336, ctc_loss=0.1107, cr_loss=0.3522, attn_decoder_loss=0.2394, over 2253164.88 frames. ], batch size: 76, lr: 2.30e-03, grad_scale: 8.0 2024-09-20 08:29:28,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=851140.0, ans=0.0 2024-09-20 08:29:44,230 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=15.0 2024-09-20 08:29:48,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=851180.0, ans=0.125 2024-09-20 08:30:03,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=851220.0, ans=0.0 2024-09-20 08:30:13,119 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.34 vs. limit=15.0 2024-09-20 08:30:22,108 INFO [train.py:1198] (1/2) Epoch 48, batch 150, loss[loss=0.2067, ctc_loss=0.09413, cr_loss=0.3265, attn_decoder_loss=0.2119, over 29462.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1089, cr_loss=0.3483, attn_decoder_loss=0.2374, over 3047710.17 frames. ], batch size: 70, lr: 2.30e-03, grad_scale: 8.0 2024-09-20 08:30:22,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=851300.0, ans=0.1 2024-09-20 08:30:38,636 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.481e+01 8.661e+01 9.113e+01 9.779e+01 1.487e+02, threshold=1.823e+02, percent-clipped=0.0 2024-09-20 08:30:42,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=851340.0, ans=0.125 2024-09-20 08:30:55,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=851380.0, ans=0.0 2024-09-20 08:30:55,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=851380.0, ans=0.125 2024-09-20 08:30:55,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=851380.0, ans=0.125 2024-09-20 08:31:24,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=851460.0, ans=0.125 2024-09-20 08:31:33,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=851460.0, ans=10.0 2024-09-20 08:31:39,277 INFO [train.py:1198] (1/2) Epoch 48, batch 200, loss[loss=0.2404, ctc_loss=0.1207, cr_loss=0.3763, attn_decoder_loss=0.2454, over 27480.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.108, cr_loss=0.3462, attn_decoder_loss=0.2362, over 3660459.10 frames. ], batch size: 125, lr: 2.30e-03, grad_scale: 8.0 2024-09-20 08:31:41,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=851500.0, ans=0.125 2024-09-20 08:32:19,263 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.95 vs. limit=15.0 2024-09-20 08:32:54,542 INFO [train.py:1198] (1/2) Epoch 48, batch 250, loss[loss=0.2373, ctc_loss=0.1175, cr_loss=0.3705, attn_decoder_loss=0.2424, over 29219.00 frames. 
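Note on the loss[...] vs. tot_loss[...] fields: loss is the current batch while tot_loss is a running figure, and the logged frame counts give away the mechanism. At batch 0 of epoch 48 the two are identical (29644 frames), after 50 batches tot_loss covers ~1.27M frames, and at steady state it plateaus near 5.8M frames, roughly 200 typical batches' worth; that is consistent with a frame-weighted sum decayed by a constant factor each batch rather than a plain cumulative average. A hedged sketch:

```python
# Sketch of the tot_loss bookkeeping implied by the logged frame counts: a
# frame-weighted running sum with geometric decay, so the effective horizon
# is ~`horizon` batches (the ~5.8M-frame plateau over ~29k frames per batch
# suggests a horizon near 200). The exact decay form is an assumption.
class RunningLoss:
    def __init__(self, horizon: int = 200):
        self.decay = 1.0 - 1.0 / horizon
        self.loss_sum = 0.0   # decayed sum of loss * frames
        self.frames = 0.0     # decayed sum of frames

    def update(self, loss: float, num_frames: float) -> tuple[float, float]:
        self.loss_sum = self.loss_sum * self.decay + loss * num_frames
        self.frames = self.frames * self.decay + num_frames
        return self.loss_sum / self.frames, self.frames  # (tot_loss, frames)

tracker = RunningLoss()
tot, frames = tracker.update(0.2116, 29644.0)
assert tot == 0.2116 and frames == 29644.0   # batch 0: tot_loss == loss
```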
], tot_loss[loss=0.2304, ctc_loss=0.1079, cr_loss=0.3466, attn_decoder_loss=0.2363, over 4140590.60 frames. ], batch size: 100, lr: 2.30e-03, grad_scale: 8.0 2024-09-20 08:32:56,239 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:33:13,492 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.583e+01 8.550e+01 9.278e+01 9.687e+01 3.776e+02, threshold=1.856e+02, percent-clipped=1.0 2024-09-20 08:33:18,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=851740.0, ans=0.125 2024-09-20 08:33:19,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=851740.0, ans=0.125 2024-09-20 08:33:23,258 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.61 vs. limit=22.5 2024-09-20 08:33:40,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=851820.0, ans=0.0 2024-09-20 08:33:40,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=851820.0, ans=0.0 2024-09-20 08:33:51,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=851820.0, ans=0.125 2024-09-20 08:34:09,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=851860.0, ans=0.125 2024-09-20 08:34:12,355 INFO [train.py:1198] (1/2) Epoch 48, batch 300, loss[loss=0.2389, ctc_loss=0.1175, cr_loss=0.3878, attn_decoder_loss=0.2438, over 29525.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1074, cr_loss=0.3459, attn_decoder_loss=0.236, over 4508493.64 frames. ], batch size: 92, lr: 2.30e-03, grad_scale: 8.0 2024-09-20 08:34:14,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=851900.0, ans=0.0 2024-09-20 08:34:18,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=851900.0, ans=0.1 2024-09-20 08:34:31,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=851940.0, ans=15.0 2024-09-20 08:34:55,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=851980.0, ans=0.2 2024-09-20 08:35:16,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=852060.0, ans=0.025 2024-09-20 08:35:21,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=852060.0, ans=0.2 2024-09-20 08:35:22,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=852060.0, ans=0.0 2024-09-20 08:35:29,934 INFO [train.py:1198] (1/2) Epoch 48, batch 350, loss[loss=0.2135, ctc_loss=0.08479, cr_loss=0.2734, attn_decoder_loss=0.2217, over 29332.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1077, cr_loss=0.3467, attn_decoder_loss=0.2368, over 4793540.12 frames. 
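Note on the lr field: it is constant at 2.33e-03 throughout epoch 47 and steps to 2.30e-03 when epoch 48 begins, so the schedule depends on the epoch count as well as the batch count (at ~850k batches any purely batch-based term has long since flattened). A sketch of an Eden-style schedule with that shape; the base LR and the two half-life constants below are placeholders, not values read from this run:

```python
# Sketch of an Eden-style learning-rate schedule consistent with the log:
# lr decays smoothly in both the batch index and the epoch index, so late
# in training it is flat within an epoch and steps down at each epoch
# boundary. base_lr / lr_batches / lr_epochs are placeholder values.
def eden_lr(base_lr: float, batch: int, epoch: int,
            lr_batches: float = 5000.0, lr_epochs: float = 4.0) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

step = eden_lr(0.05, 850700, 48) / eden_lr(0.05, 845400, 47)
# step ~= 0.987: with these placeholder constants the absolute lr differs
# from this run, but the relative epoch step (~ -1.3%) matches the logged
# 2.33e-03 -> 2.30e-03 transition.
```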
], batch size: 71, lr: 2.30e-03, grad_scale: 8.0 2024-09-20 08:35:46,308 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 8.632e+01 9.132e+01 9.604e+01 3.712e+02, threshold=1.826e+02, percent-clipped=1.0 2024-09-20 08:36:03,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=852180.0, ans=0.125 2024-09-20 08:36:45,275 INFO [train.py:1198] (1/2) Epoch 48, batch 400, loss[loss=0.2335, ctc_loss=0.1065, cr_loss=0.3451, attn_decoder_loss=0.24, over 29691.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1076, cr_loss=0.3461, attn_decoder_loss=0.2364, over 5023712.06 frames. ], batch size: 82, lr: 2.30e-03, grad_scale: 16.0 2024-09-20 08:36:56,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=852300.0, ans=0.125 2024-09-20 08:37:18,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=852380.0, ans=0.125 2024-09-20 08:37:58,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=852460.0, ans=0.125 2024-09-20 08:37:59,705 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.63 vs. limit=12.0 2024-09-20 08:38:03,191 INFO [train.py:1198] (1/2) Epoch 48, batch 450, loss[loss=0.2458, ctc_loss=0.1208, cr_loss=0.3707, attn_decoder_loss=0.2515, over 29675.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1076, cr_loss=0.3451, attn_decoder_loss=0.2364, over 5186700.41 frames. ], batch size: 83, lr: 2.30e-03, grad_scale: 16.0 2024-09-20 08:38:13,396 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.08 vs. limit=15.0 2024-09-20 08:38:14,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=852500.0, ans=0.125 2024-09-20 08:38:19,716 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.823e+01 8.734e+01 9.234e+01 9.898e+01 1.385e+02, threshold=1.847e+02, percent-clipped=0.0 2024-09-20 08:38:51,059 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.24 vs. limit=15.0 2024-09-20 08:39:00,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=852620.0, ans=0.0 2024-09-20 08:39:12,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=852660.0, ans=0.025 2024-09-20 08:39:21,102 INFO [train.py:1198] (1/2) Epoch 48, batch 500, loss[loss=0.2388, ctc_loss=0.1099, cr_loss=0.3331, attn_decoder_loss=0.2458, over 29459.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1072, cr_loss=0.3447, attn_decoder_loss=0.2357, over 5329776.43 frames. 
], batch size: 94, lr: 2.30e-03, grad_scale: 16.0 2024-09-20 08:39:25,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=852700.0, ans=0.2 2024-09-20 08:39:52,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=852780.0, ans=0.0 2024-09-20 08:40:08,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=852820.0, ans=0.0 2024-09-20 08:40:16,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=852820.0, ans=0.2 2024-09-20 08:40:36,936 INFO [train.py:1198] (1/2) Epoch 48, batch 550, loss[loss=0.2431, ctc_loss=0.1098, cr_loss=0.3504, attn_decoder_loss=0.2501, over 28871.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1071, cr_loss=0.3446, attn_decoder_loss=0.2357, over 5422019.22 frames. ], batch size: 104, lr: 2.30e-03, grad_scale: 16.0 2024-09-20 08:40:53,454 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.397e+01 8.626e+01 8.943e+01 9.744e+01 1.321e+02, threshold=1.789e+02, percent-clipped=0.0 2024-09-20 08:41:07,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.91 vs. limit=15.0 2024-09-20 08:41:41,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=853060.0, ans=0.0 2024-09-20 08:41:50,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=853060.0, ans=0.2 2024-09-20 08:41:54,605 INFO [train.py:1198] (1/2) Epoch 48, batch 600, loss[loss=0.2421, ctc_loss=0.1102, cr_loss=0.341, attn_decoder_loss=0.2492, over 29213.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1071, cr_loss=0.3447, attn_decoder_loss=0.2361, over 5508860.71 frames. ], batch size: 100, lr: 2.30e-03, grad_scale: 16.0 2024-09-20 08:41:56,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=853100.0, ans=0.025 2024-09-20 08:42:05,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=853100.0, ans=0.125 2024-09-20 08:42:20,717 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=15.0 2024-09-20 08:42:43,273 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.87 vs. limit=10.0 2024-09-20 08:43:04,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=853260.0, ans=0.0 2024-09-20 08:43:12,012 INFO [train.py:1198] (1/2) Epoch 48, batch 650, loss[loss=0.2313, ctc_loss=0.1087, cr_loss=0.3548, attn_decoder_loss=0.237, over 29782.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1063, cr_loss=0.3429, attn_decoder_loss=0.2352, over 5585649.31 frames. 
], batch size: 81, lr: 2.30e-03, grad_scale: 16.0 2024-09-20 08:43:15,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=853300.0, ans=0.1 2024-09-20 08:43:28,434 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.698e+01 8.608e+01 8.952e+01 9.536e+01 4.634e+02, threshold=1.790e+02, percent-clipped=1.0 2024-09-20 08:43:48,612 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:43:51,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=853380.0, ans=0.125 2024-09-20 08:43:54,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=853380.0, ans=0.0 2024-09-20 08:44:03,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=853420.0, ans=0.1 2024-09-20 08:44:05,836 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.31 vs. limit=22.5 2024-09-20 08:44:24,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=853460.0, ans=0.125 2024-09-20 08:44:27,293 INFO [train.py:1198] (1/2) Epoch 48, batch 700, loss[loss=0.2281, ctc_loss=0.1098, cr_loss=0.3538, attn_decoder_loss=0.2334, over 29520.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1069, cr_loss=0.3445, attn_decoder_loss=0.2357, over 5636393.15 frames. ], batch size: 76, lr: 2.30e-03, grad_scale: 16.0 2024-09-20 08:44:42,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=853540.0, ans=0.125 2024-09-20 08:44:43,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=853540.0, ans=0.125 2024-09-20 08:44:47,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=853540.0, ans=0.0 2024-09-20 08:44:51,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=853540.0, ans=0.0 2024-09-20 08:44:53,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=853540.0, ans=0.0 2024-09-20 08:45:04,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=853580.0, ans=0.025 2024-09-20 08:45:05,045 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.85 vs. limit=10.0 2024-09-20 08:45:11,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=853580.0, ans=0.125 2024-09-20 08:45:13,417 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.12 vs. 
limit=6.0 2024-09-20 08:45:21,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=853620.0, ans=0.0 2024-09-20 08:45:45,552 INFO [train.py:1198] (1/2) Epoch 48, batch 750, loss[loss=0.2333, ctc_loss=0.1117, cr_loss=0.3591, attn_decoder_loss=0.2388, over 29678.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.107, cr_loss=0.3447, attn_decoder_loss=0.2355, over 5673829.58 frames. ], batch size: 82, lr: 2.30e-03, grad_scale: 16.0 2024-09-20 08:45:47,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=853700.0, ans=0.0 2024-09-20 08:45:51,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=853700.0, ans=0.1 2024-09-20 08:45:53,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=853700.0, ans=0.125 2024-09-20 08:46:00,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=853740.0, ans=0.5 2024-09-20 08:46:01,950 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.792e+01 9.309e+01 9.827e+01 1.298e+02, threshold=1.862e+02, percent-clipped=0.0 2024-09-20 08:46:18,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=853780.0, ans=0.125 2024-09-20 08:46:31,323 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.80 vs. limit=15.0 2024-09-20 08:46:37,261 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.52 vs. limit=22.5 2024-09-20 08:47:03,067 INFO [train.py:1198] (1/2) Epoch 48, batch 800, loss[loss=0.2043, ctc_loss=0.08484, cr_loss=0.2855, attn_decoder_loss=0.2113, over 29608.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1073, cr_loss=0.3449, attn_decoder_loss=0.236, over 5705317.22 frames. ], batch size: 73, lr: 2.30e-03, grad_scale: 32.0 2024-09-20 08:47:10,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=853900.0, ans=0.125 2024-09-20 08:47:42,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=853980.0, ans=0.0 2024-09-20 08:47:44,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=853980.0, ans=0.125 2024-09-20 08:47:47,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=854020.0, ans=0.125 2024-09-20 08:47:48,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=854020.0, ans=0.125 2024-09-20 08:47:50,502 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.98 vs. limit=22.5 2024-09-20 08:48:18,093 INFO [train.py:1198] (1/2) Epoch 48, batch 850, loss[loss=0.2291, ctc_loss=0.1001, cr_loss=0.3323, attn_decoder_loss=0.236, over 29711.00 frames. 
], tot_loss[loss=0.2296, ctc_loss=0.1068, cr_loss=0.3437, attn_decoder_loss=0.2356, over 5734264.92 frames. ], batch size: 89, lr: 2.30e-03, grad_scale: 16.0 2024-09-20 08:48:26,189 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:48:35,983 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.262e+01 8.678e+01 9.128e+01 9.659e+01 1.410e+02, threshold=1.826e+02, percent-clipped=0.0 2024-09-20 08:48:54,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=854180.0, ans=0.125 2024-09-20 08:49:07,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=854220.0, ans=0.025 2024-09-20 08:49:09,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=854220.0, ans=0.2 2024-09-20 08:49:10,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=854220.0, ans=0.2 2024-09-20 08:49:36,100 INFO [train.py:1198] (1/2) Epoch 48, batch 900, loss[loss=0.2097, ctc_loss=0.09065, cr_loss=0.3045, attn_decoder_loss=0.2161, over 29599.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1072, cr_loss=0.3447, attn_decoder_loss=0.2361, over 5739560.83 frames. ], batch size: 73, lr: 2.30e-03, grad_scale: 16.0 2024-09-20 08:49:39,748 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.93 vs. limit=22.5 2024-09-20 08:50:20,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=854420.0, ans=0.125 2024-09-20 08:50:21,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=854420.0, ans=0.0 2024-09-20 08:50:30,197 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.21 vs. limit=22.5 2024-09-20 08:50:42,254 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:50:53,784 INFO [train.py:1198] (1/2) Epoch 48, batch 950, loss[loss=0.2151, ctc_loss=0.09202, cr_loss=0.3122, attn_decoder_loss=0.2218, over 29489.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1072, cr_loss=0.3444, attn_decoder_loss=0.2362, over 5742352.21 frames. ], batch size: 74, lr: 2.30e-03, grad_scale: 16.0 2024-09-20 08:50:57,610 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=3.85 vs. limit=12.0 2024-09-20 08:51:04,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=854500.0, ans=0.1 2024-09-20 08:51:11,717 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.672e+01 8.747e+01 9.386e+01 9.871e+01 2.198e+02, threshold=1.877e+02, percent-clipped=1.0 2024-09-20 08:51:57,893 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.55 vs. 
limit=12.0 2024-09-20 08:52:00,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=854660.0, ans=0.0 2024-09-20 08:52:08,494 INFO [train.py:1198] (1/2) Epoch 48, batch 1000, loss[loss=0.2322, ctc_loss=0.1154, cr_loss=0.3697, attn_decoder_loss=0.237, over 29528.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1082, cr_loss=0.3466, attn_decoder_loss=0.237, over 5736964.08 frames. ], batch size: 77, lr: 2.30e-03, grad_scale: 16.0 2024-09-20 08:52:25,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=854740.0, ans=0.0 2024-09-20 08:52:32,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=854740.0, ans=0.125 2024-09-20 08:52:43,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=854780.0, ans=0.125 2024-09-20 08:52:45,854 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.29 vs. limit=12.0 2024-09-20 08:52:50,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=854780.0, ans=0.2 2024-09-20 08:52:56,693 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.65 vs. limit=15.0 2024-09-20 08:53:03,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=854820.0, ans=0.0 2024-09-20 08:53:12,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=854860.0, ans=0.125 2024-09-20 08:53:25,979 INFO [train.py:1198] (1/2) Epoch 48, batch 1050, loss[loss=0.2387, ctc_loss=0.112, cr_loss=0.3501, attn_decoder_loss=0.245, over 29667.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.108, cr_loss=0.3462, attn_decoder_loss=0.2367, over 5746922.98 frames. ], batch size: 85, lr: 2.30e-03, grad_scale: 16.0 2024-09-20 08:53:44,086 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.543e+01 8.686e+01 9.233e+01 9.898e+01 2.337e+02, threshold=1.847e+02, percent-clipped=2.0 2024-09-20 08:53:56,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=854980.0, ans=0.125 2024-09-20 08:54:04,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=854980.0, ans=0.1 2024-09-20 08:54:13,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=855020.0, ans=0.0 2024-09-20 08:54:43,764 INFO [train.py:1198] (1/2) Epoch 48, batch 1100, loss[loss=0.2193, ctc_loss=0.09411, cr_loss=0.323, attn_decoder_loss=0.2261, over 29432.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1076, cr_loss=0.3453, attn_decoder_loss=0.2363, over 5757703.43 frames. 
2024-09-20 08:54:43,764 INFO [train.py:1198] (1/2) Epoch 48, batch 1100, loss[loss=0.2193, ctc_loss=0.09411, cr_loss=0.323, attn_decoder_loss=0.2261, over 29432.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1076, cr_loss=0.3453, attn_decoder_loss=0.2363, over 5757703.43 frames. ], batch size: 78, lr: 2.30e-03, grad_scale: 16.0
2024-09-20 08:54:48,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=855100.0, ans=0.2
2024-09-20 08:55:25,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=855180.0, ans=0.2
2024-09-20 08:55:56,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=855260.0, ans=0.0
2024-09-20 08:55:58,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=855300.0, ans=0.2
2024-09-20 08:55:59,692 INFO [train.py:1198] (1/2) Epoch 48, batch 1150, loss[loss=0.2289, ctc_loss=0.1113, cr_loss=0.3342, attn_decoder_loss=0.2346, over 29446.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1079, cr_loss=0.3457, attn_decoder_loss=0.2365, over 5757322.31 frames. ], batch size: 78, lr: 2.30e-03, grad_scale: 8.0
2024-09-20 08:56:13,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=855340.0, ans=0.0
2024-09-20 08:56:19,237 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 8.603e+01 9.086e+01 9.808e+01 3.950e+02, threshold=1.817e+02, percent-clipped=2.0
2024-09-20 08:56:20,411 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=15.0
2024-09-20 08:57:17,294 INFO [train.py:1198] (1/2) Epoch 48, batch 1200, loss[loss=0.2345, ctc_loss=0.105, cr_loss=0.3456, attn_decoder_loss=0.2412, over 29666.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1081, cr_loss=0.3464, attn_decoder_loss=0.2371, over 5748307.11 frames. ], batch size: 85, lr: 2.30e-03, grad_scale: 16.0
2024-09-20 08:57:18,135 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=6.83 vs. limit=15.0
2024-09-20 08:57:28,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=855500.0, ans=0.0
2024-09-20 08:57:29,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=855500.0, ans=0.125
2024-09-20 08:57:55,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=855580.0, ans=0.125
2024-09-20 08:58:12,077 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.15 vs. limit=22.5
2024-09-20 08:58:21,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=855660.0, ans=0.0
2024-09-20 08:58:32,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=855660.0, ans=0.125
2024-09-20 08:58:34,853 INFO [train.py:1198] (1/2) Epoch 48, batch 1250, loss[loss=0.2448, ctc_loss=0.1219, cr_loss=0.373, attn_decoder_loss=0.2501, over 29503.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1084, cr_loss=0.347, attn_decoder_loss=0.2375, over 5775664.28 frames. ], batch size: 92, lr: 2.30e-03, grad_scale: 16.0
2024-09-20 08:58:54,651 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.675e+01 8.788e+01 9.389e+01 9.946e+01 2.084e+02, threshold=1.878e+02, percent-clipped=1.0
2024-09-20 08:59:48,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=855860.0, ans=0.2
2024-09-20 08:59:50,615 INFO [train.py:1198] (1/2) Epoch 48, batch 1300, loss[loss=0.2366, ctc_loss=0.1081, cr_loss=0.3372, attn_decoder_loss=0.2434, over 28295.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1083, cr_loss=0.3464, attn_decoder_loss=0.2374, over 5779597.57 frames. ], batch size: 111, lr: 2.30e-03, grad_scale: 8.0
2024-09-20 08:59:56,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=855900.0, ans=0.125
2024-09-20 09:00:10,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=855940.0, ans=0.04949747468305833
2024-09-20 09:00:13,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=855940.0, ans=0.2
2024-09-20 09:00:16,727 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:00:34,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.62 vs. limit=15.0
2024-09-20 09:01:09,005 INFO [train.py:1198] (1/2) Epoch 48, batch 1350, loss[loss=0.2294, ctc_loss=0.1017, cr_loss=0.3155, attn_decoder_loss=0.2366, over 29766.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1078, cr_loss=0.3458, attn_decoder_loss=0.237, over 5798061.06 frames. ], batch size: 81, lr: 2.29e-03, grad_scale: 8.0
2024-09-20 09:01:10,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=856100.0, ans=0.0
2024-09-20 09:01:22,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=856140.0, ans=0.2
2024-09-20 09:01:29,676 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.391e+01 8.601e+01 8.992e+01 9.491e+01 1.134e+02, threshold=1.798e+02, percent-clipped=0.0
2024-09-20 09:01:29,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=856140.0, ans=0.1
2024-09-20 09:01:47,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=856180.0, ans=0.2
2024-09-20 09:01:53,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=856220.0, ans=0.0
2024-09-20 09:01:59,890 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:02:17,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=856260.0, ans=0.125
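Most scaling.py:214 lines trace ScheduledFloat values: module hyperparameters such as dropout probabilities, skip rates and balancer limits that are annealed as a function of batch_count instead of being fixed. A simplified stand-in for such a piecewise-linear schedule (not the actual scaling.py class):

```python
from bisect import bisect_right

# Hedged sketch of a batch-count-driven schedule like the ScheduledFloat
# entries in this log; a simplified stand-in for the real scaling.py class.
class ScheduledFloatSketch:
    """Piecewise-linear function of batch_count, e.g. (0, 0.3), (20000, 0.1)."""
    def __init__(self, *points):
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def value(self, batch_count: float) -> float:
        i = bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# By batch_count ~855k every schedule is long past its final breakpoint,
# which is why the logged ans= values are constants like 0.0, 0.1 or 0.125.
dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
assert dropout_p.value(855100.0) == 0.1
```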
2024-09-20 09:02:25,970 INFO [train.py:1198] (1/2) Epoch 48, batch 1400, loss[loss=0.2062, ctc_loss=0.08993, cr_loss=0.3119, attn_decoder_loss=0.2122, over 29599.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1075, cr_loss=0.3451, attn_decoder_loss=0.2365, over 5809005.02 frames. ], batch size: 69, lr: 2.29e-03, grad_scale: 8.0
2024-09-20 09:02:29,703 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0
2024-09-20 09:02:33,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=856300.0, ans=0.0
2024-09-20 09:03:05,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.91 vs. limit=10.0
2024-09-20 09:03:09,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=856420.0, ans=0.1
2024-09-20 09:03:19,488 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.73 vs. limit=15.0
2024-09-20 09:03:38,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=856460.0, ans=0.125
2024-09-20 09:03:41,078 INFO [train.py:1198] (1/2) Epoch 48, batch 1450, loss[loss=0.2389, ctc_loss=0.09963, cr_loss=0.3289, attn_decoder_loss=0.247, over 29424.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1076, cr_loss=0.3455, attn_decoder_loss=0.2371, over 5805309.54 frames. ], batch size: 94, lr: 2.29e-03, grad_scale: 8.0
2024-09-20 09:03:52,447 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=6.40 vs. limit=15.0
2024-09-20 09:03:58,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=856540.0, ans=0.2
2024-09-20 09:04:02,256 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.617e+01 8.708e+01 9.120e+01 9.678e+01 1.766e+02, threshold=1.824e+02, percent-clipped=0.0
2024-09-20 09:04:03,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=856540.0, ans=0.0
2024-09-20 09:04:09,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=856580.0, ans=0.125
2024-09-20 09:04:26,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=856620.0, ans=0.125
2024-09-20 09:04:49,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.09 vs. limit=22.5
2024-09-20 09:04:58,579 INFO [train.py:1198] (1/2) Epoch 48, batch 1500, loss[loss=0.2405, ctc_loss=0.1056, cr_loss=0.3294, attn_decoder_loss=0.2482, over 29623.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1077, cr_loss=0.3458, attn_decoder_loss=0.2374, over 5805172.51 frames. ], batch size: 86, lr: 2.29e-03, grad_scale: 8.0
2024-09-20 09:05:00,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=856700.0, ans=0.125
2024-09-20 09:05:15,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=856740.0, ans=0.0
2024-09-20 09:05:18,658 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:05:21,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=856740.0, ans=0.0
2024-09-20 09:05:30,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=856780.0, ans=0.125
2024-09-20 09:05:38,002 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.59 vs. limit=22.5
2024-09-20 09:05:41,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=856780.0, ans=0.05
2024-09-20 09:06:13,645 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.97 vs. limit=15.0
2024-09-20 09:06:17,038 INFO [train.py:1198] (1/2) Epoch 48, batch 1550, loss[loss=0.2355, ctc_loss=0.1097, cr_loss=0.35, attn_decoder_loss=0.2416, over 29514.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1078, cr_loss=0.3457, attn_decoder_loss=0.2372, over 5781196.23 frames. ], batch size: 90, lr: 2.29e-03, grad_scale: 8.0
2024-09-20 09:06:30,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=856940.0, ans=0.0
2024-09-20 09:06:35,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=856940.0, ans=0.0
2024-09-20 09:06:38,058 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.059e+01 8.748e+01 9.189e+01 9.595e+01 2.151e+02, threshold=1.838e+02, percent-clipped=1.0
2024-09-20 09:07:31,859 INFO [train.py:1198] (1/2) Epoch 48, batch 1600, loss[loss=0.2355, ctc_loss=0.1059, cr_loss=0.3512, attn_decoder_loss=0.2421, over 29659.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1076, cr_loss=0.3453, attn_decoder_loss=0.2367, over 5765535.37 frames. ], batch size: 85, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:07:33,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=857100.0, ans=0.0
2024-09-20 09:07:51,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=857140.0, ans=0.05
2024-09-20 09:08:49,422 INFO [train.py:1198] (1/2) Epoch 48, batch 1650, loss[loss=0.2503, ctc_loss=0.1206, cr_loss=0.3647, attn_decoder_loss=0.2566, over 29697.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1076, cr_loss=0.3452, attn_decoder_loss=0.2366, over 5759324.33 frames. ], batch size: 89, lr: 2.29e-03, grad_scale: 16.0
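The Whitening lines fire when a whitening constraint is evaluated: the metric measures how far a group of channels is from having a white (isotropic) covariance, and a penalty is applied only once it exceeds the logged limit, hence the metric=X vs. limit=Y form. One way such a metric can be computed is the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue, sketched below (the exact k2/icefall formula may differ):

```python
import torch

# Hedged sketch of a whitening metric in the spirit of the
# "metric=... vs. limit=..." lines: 1.0 means perfectly white features,
# larger values mean the covariance eigenvalues are more uneven.
def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    num_channels = x.shape[-1]
    x = x.reshape(-1, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    # per-group covariance, averaged over frames
    cov = torch.einsum("ngc,ngd->gcd", x, x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)          # per-group eigenvalues
    return ((eigs ** 2).mean(dim=1) / eigs.mean(dim=1) ** 2).mean()

# White noise scores close to 1; correlated features push the metric up,
# and a penalty would activate once it crosses the configured limit.
print(whitening_metric(torch.randn(1000, 512), num_groups=1))
```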
2024-09-20 09:09:09,906 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.12 vs. limit=22.5
2024-09-20 09:09:10,379 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.501e+01 8.669e+01 9.204e+01 9.828e+01 1.752e+02, threshold=1.841e+02, percent-clipped=0.0
2024-09-20 09:09:33,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=857420.0, ans=0.0
2024-09-20 09:09:47,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=857420.0, ans=0.125
2024-09-20 09:09:53,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=857460.0, ans=0.1
2024-09-20 09:10:07,180 INFO [train.py:1198] (1/2) Epoch 48, batch 1700, loss[loss=0.2005, ctc_loss=0.08553, cr_loss=0.2933, attn_decoder_loss=0.2068, over 29539.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1073, cr_loss=0.3448, attn_decoder_loss=0.2364, over 5779436.13 frames. ], batch size: 69, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:10:08,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=857500.0, ans=0.125
2024-09-20 09:10:15,150 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:10:16,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=857500.0, ans=0.0
2024-09-20 09:10:18,192 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:10:30,395 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:10:50,844 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.81 vs. limit=15.0
2024-09-20 09:11:23,280 INFO [train.py:1198] (1/2) Epoch 48, batch 1750, loss[loss=0.2074, ctc_loss=0.09606, cr_loss=0.3261, attn_decoder_loss=0.2125, over 29322.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1072, cr_loss=0.345, attn_decoder_loss=0.2361, over 5787045.40 frames. ], batch size: 67, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:11:35,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=857700.0, ans=0.125
2024-09-20 09:11:43,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=857740.0, ans=0.125
2024-09-20 09:11:44,442 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.991e+01 8.682e+01 9.026e+01 9.554e+01 1.464e+02, threshold=1.805e+02, percent-clipped=0.0
2024-09-20 09:11:56,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=857780.0, ans=0.0
2024-09-20 09:12:13,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=857820.0, ans=0.125
2024-09-20 09:12:18,264 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.55 vs. limit=15.0
2024-09-20 09:12:40,542 INFO [train.py:1198] (1/2) Epoch 48, batch 1800, loss[loss=0.2422, ctc_loss=0.1145, cr_loss=0.3639, attn_decoder_loss=0.2483, over 29685.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1072, cr_loss=0.3454, attn_decoder_loss=0.2363, over 5790576.88 frames. ], batch size: 83, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:12:40,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=857900.0, ans=0.5
2024-09-20 09:12:40,999 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:12:43,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=857900.0, ans=0.0
2024-09-20 09:12:48,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=857900.0, ans=0.125
2024-09-20 09:12:55,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=857940.0, ans=0.125
2024-09-20 09:13:00,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=857940.0, ans=0.07
2024-09-20 09:13:11,624 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=8.12 vs. limit=15.0
2024-09-20 09:13:58,142 INFO [train.py:1198] (1/2) Epoch 48, batch 1850, loss[loss=0.2475, ctc_loss=0.1228, cr_loss=0.3764, attn_decoder_loss=0.253, over 29621.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1076, cr_loss=0.3461, attn_decoder_loss=0.2365, over 5797645.45 frames. ], batch size: 86, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:14:00,517 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.16 vs. limit=10.0
2024-09-20 09:14:04,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=858100.0, ans=0.125
2024-09-20 09:14:19,246 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 8.577e+01 9.244e+01 9.733e+01 2.629e+02, threshold=1.849e+02, percent-clipped=1.0
2024-09-20 09:14:27,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=858180.0, ans=0.125
2024-09-20 09:14:30,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=858180.0, ans=0.125
2024-09-20 09:14:40,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=858180.0, ans=0.1
2024-09-20 09:14:54,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=858220.0, ans=0.0
2024-09-20 09:14:54,682 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.78 vs. limit=12.0
2024-09-20 09:15:07,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=858260.0, ans=0.125
2024-09-20 09:15:13,435 INFO [train.py:1198] (1/2) Epoch 48, batch 1900, loss[loss=0.2429, ctc_loss=0.112, cr_loss=0.3516, attn_decoder_loss=0.2496, over 29708.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1078, cr_loss=0.3464, attn_decoder_loss=0.237, over 5804819.68 frames. ], batch size: 89, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:16:07,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=858420.0, ans=0.0
2024-09-20 09:16:07,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=858420.0, ans=0.125
2024-09-20 09:16:12,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=858420.0, ans=0.125
2024-09-20 09:16:20,680 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0
2024-09-20 09:16:30,249 INFO [train.py:1198] (1/2) Epoch 48, batch 1950, loss[loss=0.2196, ctc_loss=0.1014, cr_loss=0.3454, attn_decoder_loss=0.225, over 29461.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1082, cr_loss=0.3475, attn_decoder_loss=0.2379, over 5819554.38 frames. ], batch size: 78, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:16:53,365 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 8.839e+01 9.358e+01 9.818e+01 1.771e+02, threshold=1.872e+02, percent-clipped=0.0
2024-09-20 09:16:54,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=858540.0, ans=22.5
2024-09-20 09:17:07,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=858580.0, ans=0.1
2024-09-20 09:17:12,635 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.80 vs. limit=15.0
2024-09-20 09:17:13,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=858580.0, ans=0.0
2024-09-20 09:17:30,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=858620.0, ans=0.0
2024-09-20 09:17:49,893 INFO [train.py:1198] (1/2) Epoch 48, batch 2000, loss[loss=0.201, ctc_loss=0.08404, cr_loss=0.2885, attn_decoder_loss=0.2076, over 29339.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1083, cr_loss=0.3474, attn_decoder_loss=0.2379, over 5798589.72 frames. ], batch size: 67, lr: 2.29e-03, grad_scale: 32.0
2024-09-20 09:17:51,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=858700.0, ans=0.2
2024-09-20 09:18:05,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=858740.0, ans=0.0
2024-09-20 09:18:49,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=858860.0, ans=0.125
2024-09-20 09:18:50,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=858860.0, ans=0.125
2024-09-20 09:18:51,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=858860.0, ans=0.2
2024-09-20 09:18:59,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=858860.0, ans=0.0
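The grad_scale field is the AMP loss scale used for the fp16 backward pass: it is raised after a run of overflow-free steps and halved when inf/nan gradients are detected, which is why it moves among 8.0, 16.0 and 32.0 in the entries above (note the jump to 32.0 at batch 2000 and back to 16.0 by batch 2050). A minimal sketch with PyTorch's stock GradScaler; the actual training loop may manage scaling differently:

```python
import torch

# Hedged sketch of fp16 training with dynamic loss scaling, matching the
# grad_scale values reported in the log. Illustrative, not icefall's loop.
model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=2.29e-3)
scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

def train_step(features: torch.Tensor, targets: torch.Tensor) -> float:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = torch.nn.functional.cross_entropy(model(features), targets)
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales; skips the step on inf/nan
    scaler.update()                 # grows or halves the scale
    return scaler.get_scale()       # the "grad_scale" one would log
```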
2024-09-20 09:19:05,497 INFO [train.py:1198] (1/2) Epoch 48, batch 2050, loss[loss=0.2095, ctc_loss=0.08499, cr_loss=0.2997, attn_decoder_loss=0.2167, over 29441.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1077, cr_loss=0.3458, attn_decoder_loss=0.237, over 5790150.11 frames. ], batch size: 70, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:19:07,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=858900.0, ans=0.125
2024-09-20 09:19:08,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=858900.0, ans=0.125
2024-09-20 09:19:12,413 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0
2024-09-20 09:19:25,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=858940.0, ans=0.0
2024-09-20 09:19:28,040 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.559e+01 9.116e+01 9.582e+01 1.621e+02, threshold=1.823e+02, percent-clipped=0.0
2024-09-20 09:20:04,220 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:20:08,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=859060.0, ans=0.07
2024-09-20 09:20:17,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=859060.0, ans=0.2
2024-09-20 09:20:20,537 INFO [train.py:1198] (1/2) Epoch 48, batch 2100, loss[loss=0.2312, ctc_loss=0.1047, cr_loss=0.3523, attn_decoder_loss=0.2374, over 29752.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1075, cr_loss=0.3456, attn_decoder_loss=0.2366, over 5801461.55 frames. ], batch size: 81, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:20:26,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=859100.0, ans=0.1
2024-09-20 09:20:39,894 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=6.79 vs. limit=15.0
2024-09-20 09:20:45,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=859140.0, ans=0.025
2024-09-20 09:21:10,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=859220.0, ans=0.125
2024-09-20 09:21:17,539 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=15.28 vs. limit=15.0
2024-09-20 09:21:29,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=859260.0, ans=0.125
2024-09-20 09:21:32,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=859260.0, ans=0.0
2024-09-20 09:21:37,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=859260.0, ans=0.125
2024-09-20 09:21:40,092 INFO [train.py:1198] (1/2) Epoch 48, batch 2150, loss[loss=0.2258, ctc_loss=0.1071, cr_loss=0.3491, attn_decoder_loss=0.2313, over 29439.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1072, cr_loss=0.345, attn_decoder_loss=0.2362, over 5814949.91 frames. ], batch size: 78, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:21:44,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=859300.0, ans=0.1
2024-09-20 09:21:56,123 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.63 vs. limit=15.0
2024-09-20 09:22:02,727 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.204e+01 8.576e+01 8.993e+01 9.601e+01 1.335e+02, threshold=1.799e+02, percent-clipped=0.0
2024-09-20 09:22:16,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=859380.0, ans=0.1
2024-09-20 09:22:16,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=859380.0, ans=0.025
2024-09-20 09:22:39,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=859460.0, ans=0.0
2024-09-20 09:22:43,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=859460.0, ans=0.125
2024-09-20 09:22:55,663 INFO [train.py:1198] (1/2) Epoch 48, batch 2200, loss[loss=0.2409, ctc_loss=0.107, cr_loss=0.3448, attn_decoder_loss=0.2482, over 29617.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1077, cr_loss=0.346, attn_decoder_loss=0.2365, over 5812033.29 frames. ], batch size: 86, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:23:17,505 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.41 vs. limit=15.0
2024-09-20 09:23:35,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=859580.0, ans=0.0
2024-09-20 09:23:40,215 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.15 vs. limit=10.0
2024-09-20 09:23:41,064 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:23:46,308 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.08 vs. limit=6.0
2024-09-20 09:23:53,028 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:23:55,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=859660.0, ans=0.0
2024-09-20 09:24:00,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=859660.0, ans=0.0
2024-09-20 09:24:10,683 INFO [train.py:1198] (1/2) Epoch 48, batch 2250, loss[loss=0.2417, ctc_loss=0.1169, cr_loss=0.3556, attn_decoder_loss=0.2476, over 29710.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.107, cr_loss=0.3445, attn_decoder_loss=0.2361, over 5811872.23 frames. ], batch size: 82, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:24:32,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=859740.0, ans=0.1
2024-09-20 09:24:35,429 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.496e+01 8.683e+01 9.115e+01 9.671e+01 7.163e+02, threshold=1.823e+02, percent-clipped=1.0
2024-09-20 09:25:01,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=859820.0, ans=0.2
2024-09-20 09:25:21,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=859860.0, ans=0.0
2024-09-20 09:25:30,677 INFO [train.py:1198] (1/2) Epoch 48, batch 2300, loss[loss=0.2063, ctc_loss=0.09438, cr_loss=0.3221, attn_decoder_loss=0.2116, over 29322.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1067, cr_loss=0.3439, attn_decoder_loss=0.2354, over 5798276.93 frames. ], batch size: 71, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:25:41,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=859900.0, ans=0.0
2024-09-20 09:25:45,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=859940.0, ans=0.1
2024-09-20 09:26:05,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=859980.0, ans=0.125
2024-09-20 09:26:07,422 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.95 vs. limit=22.5
2024-09-20 09:26:34,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=860060.0, ans=0.0
2024-09-20 09:26:46,311 INFO [train.py:1198] (1/2) Epoch 48, batch 2350, loss[loss=0.2343, ctc_loss=0.1126, cr_loss=0.3593, attn_decoder_loss=0.2399, over 29684.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1071, cr_loss=0.3449, attn_decoder_loss=0.2357, over 5804426.15 frames. ], batch size: 83, lr: 2.29e-03, grad_scale: 8.0
2024-09-20 09:26:49,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=860100.0, ans=0.125
2024-09-20 09:26:53,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=860100.0, ans=0.1
2024-09-20 09:27:10,171 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 8.540e+01 9.100e+01 9.543e+01 1.555e+02, threshold=1.820e+02, percent-clipped=0.0
2024-09-20 09:27:34,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=860220.0, ans=0.125
2024-09-20 09:27:46,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=860260.0, ans=0.125
2024-09-20 09:28:01,993 INFO [train.py:1198] (1/2) Epoch 48, batch 2400, loss[loss=0.2227, ctc_loss=0.1047, cr_loss=0.3558, attn_decoder_loss=0.2279, over 29535.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1077, cr_loss=0.3462, attn_decoder_loss=0.2362, over 5807889.15 frames. ], batch size: 76, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:28:36,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=860380.0, ans=0.2
2024-09-20 09:28:45,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=860380.0, ans=0.125
2024-09-20 09:28:46,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=860380.0, ans=0.125
2024-09-20 09:28:59,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=860420.0, ans=0.0
2024-09-20 09:29:04,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=860420.0, ans=0.125
2024-09-20 09:29:08,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=860460.0, ans=0.125
2024-09-20 09:29:11,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=860460.0, ans=0.125
2024-09-20 09:29:14,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=860460.0, ans=0.0
2024-09-20 09:29:21,827 INFO [train.py:1198] (1/2) Epoch 48, batch 2450, loss[loss=0.2391, ctc_loss=0.1148, cr_loss=0.3808, attn_decoder_loss=0.2445, over 29675.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1081, cr_loss=0.347, attn_decoder_loss=0.2372, over 5785344.29 frames. ], batch size: 82, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:29:30,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=860500.0, ans=0.2
2024-09-20 09:29:45,554 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.777e+01 8.875e+01 9.472e+01 1.005e+02 1.888e+02, threshold=1.894e+02, percent-clipped=1.0
2024-09-20 09:29:54,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=860580.0, ans=0.125
2024-09-20 09:30:19,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=860620.0, ans=0.025
2024-09-20 09:30:21,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=860660.0, ans=0.025
2024-09-20 09:30:27,143 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.28 vs. limit=22.5
2024-09-20 09:30:36,885 INFO [train.py:1198] (1/2) Epoch 48, batch 2500, loss[loss=0.236, ctc_loss=0.0978, cr_loss=0.3234, attn_decoder_loss=0.2442, over 29616.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1081, cr_loss=0.3473, attn_decoder_loss=0.2372, over 5794802.87 frames. ], batch size: 86, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:30:40,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=860700.0, ans=0.1
2024-09-20 09:30:43,656 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.70 vs. limit=15.0
2024-09-20 09:30:46,748 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.61 vs. limit=22.5
2024-09-20 09:31:15,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=860780.0, ans=0.1
2024-09-20 09:31:34,396 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.71 vs. limit=22.5
2024-09-20 09:31:35,991 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.36 vs. limit=15.0
2024-09-20 09:31:41,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=860860.0, ans=0.125
2024-09-20 09:31:47,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=860860.0, ans=0.0
2024-09-20 09:31:53,216 INFO [train.py:1198] (1/2) Epoch 48, batch 2550, loss[loss=0.2113, ctc_loss=0.1057, cr_loss=0.3512, attn_decoder_loss=0.2153, over 29325.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1082, cr_loss=0.3472, attn_decoder_loss=0.2371, over 5797279.12 frames. ], batch size: 67, lr: 2.29e-03, grad_scale: 8.0
2024-09-20 09:31:53,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=860900.0, ans=0.0
2024-09-20 09:32:07,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=860940.0, ans=0.0
2024-09-20 09:32:14,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=860940.0, ans=0.1
2024-09-20 09:32:18,702 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 8.740e+01 9.125e+01 9.570e+01 1.327e+02, threshold=1.825e+02, percent-clipped=0.0
2024-09-20 09:32:45,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=861020.0, ans=0.2
2024-09-20 09:33:12,680 INFO [train.py:1198] (1/2) Epoch 48, batch 2600, loss[loss=0.2195, ctc_loss=0.09165, cr_loss=0.305, attn_decoder_loss=0.2269, over 29454.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1084, cr_loss=0.3474, attn_decoder_loss=0.2372, over 5794295.54 frames. ], batch size: 78, lr: 2.29e-03, grad_scale: 8.0
2024-09-20 09:33:15,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=861100.0, ans=0.025
2024-09-20 09:33:50,826 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.68 vs. limit=12.0
2024-09-20 09:33:58,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=861220.0, ans=10.0
2024-09-20 09:34:23,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=861260.0, ans=0.025
2024-09-20 09:34:24,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=861260.0, ans=0.0
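Each train.py:1198 line reports two losses: loss[... over M frames] for the current batch and tot_loss[... over N frames], a smoothed statistic over recent batches weighted by frame count. One plausible reading is a decayed, frame-weighted running average, sketched below with an assumed decay constant (the actual icefall bookkeeping may differ):

```python
# Hedged sketch of a frame-weighted, decayed running loss in the spirit of
# the tot_loss[... over N frames] fields. The decay value is an assumption.
class RunningLoss:
    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.weighted_sum = 0.0   # decayed sum of loss * frames
        self.frames = 0.0         # decayed sum of frames

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.weighted_sum = self.decay * self.weighted_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def value(self) -> float:
        return self.weighted_sum / max(self.frames, 1.0)
```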
2024-09-20 09:34:27,392 INFO [train.py:1198] (1/2) Epoch 48, batch 2650, loss[loss=0.241, ctc_loss=0.1142, cr_loss=0.3576, attn_decoder_loss=0.2472, over 29228.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1085, cr_loss=0.3468, attn_decoder_loss=0.2373, over 5800907.81 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 8.0
2024-09-20 09:34:53,030 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.627e+01 9.156e+01 9.635e+01 1.174e+02, threshold=1.831e+02, percent-clipped=0.0
2024-09-20 09:35:24,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=861420.0, ans=0.2
2024-09-20 09:35:33,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=861460.0, ans=0.125
2024-09-20 09:35:42,638 INFO [train.py:1198] (1/2) Epoch 48, batch 2700, loss[loss=0.2341, ctc_loss=0.1157, cr_loss=0.3549, attn_decoder_loss=0.2393, over 29501.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1086, cr_loss=0.3468, attn_decoder_loss=0.2376, over 5796589.44 frames. ], batch size: 87, lr: 2.29e-03, grad_scale: 8.0
2024-09-20 09:35:46,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=861500.0, ans=0.125
2024-09-20 09:36:24,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=861580.0, ans=0.125
2024-09-20 09:36:29,824 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.77 vs. limit=15.0
2024-09-20 09:36:32,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=861620.0, ans=0.125
2024-09-20 09:36:45,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=861620.0, ans=0.025
2024-09-20 09:37:03,345 INFO [train.py:1198] (1/2) Epoch 48, batch 2750, loss[loss=0.2143, ctc_loss=0.09691, cr_loss=0.3164, attn_decoder_loss=0.2203, over 29502.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1079, cr_loss=0.3454, attn_decoder_loss=0.2366, over 5795666.58 frames. ], batch size: 75, lr: 2.29e-03, grad_scale: 8.0
2024-09-20 09:37:06,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=861700.0, ans=0.0
2024-09-20 09:37:28,906 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.868e+01 9.360e+01 1.005e+02 2.892e+02, threshold=1.872e+02, percent-clipped=3.0
2024-09-20 09:37:35,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=861780.0, ans=0.125
2024-09-20 09:37:54,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=861820.0, ans=0.125
2024-09-20 09:37:56,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=861820.0, ans=0.125
2024-09-20 09:38:02,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=861860.0, ans=0.1
2024-09-20 09:38:11,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=861860.0, ans=0.0
2024-09-20 09:38:18,880 INFO [train.py:1198] (1/2) Epoch 48, batch 2800, loss[loss=0.2534, ctc_loss=0.128, cr_loss=0.3625, attn_decoder_loss=0.2593, over 19800.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1083, cr_loss=0.3461, attn_decoder_loss=0.2369, over 5776123.39 frames. ], batch size: 209, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:38:20,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=861900.0, ans=0.0
2024-09-20 09:38:29,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=861900.0, ans=0.1
2024-09-20 09:38:35,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=861940.0, ans=0.1
2024-09-20 09:38:41,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=861940.0, ans=0.07
2024-09-20 09:38:46,914 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=4.74 vs. limit=12.0
2024-09-20 09:39:01,824 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=15.0
2024-09-20 09:39:04,994 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.28 vs. limit=6.0
2024-09-20 09:39:07,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=862020.0, ans=0.125
2024-09-20 09:39:10,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=862020.0, ans=0.125
2024-09-20 09:39:10,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=862020.0, ans=0.2
2024-09-20 09:39:28,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=862060.0, ans=0.125
2024-09-20 09:39:34,248 INFO [train.py:1198] (1/2) Epoch 48, batch 2850, loss[loss=0.2312, ctc_loss=0.1088, cr_loss=0.3588, attn_decoder_loss=0.2369, over 29518.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1086, cr_loss=0.3471, attn_decoder_loss=0.2372, over 5761619.03 frames. ], batch size: 77, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:39:36,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=862100.0, ans=0.125
2024-09-20 09:39:37,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=862100.0, ans=0.2
2024-09-20 09:39:39,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=862100.0, ans=0.04949747468305833
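Note the batch 2800 entry above: only 19800 frames but batch size 209, against roughly 29000-frame batches of 70-110 utterances elsewhere. batch size here counts utterances, and it varies because the sampler packs each batch up to a duration budget rather than to a fixed count, so batches of short utterances hold many more of them. A toy sketch of duration-budget packing (the budget value is illustrative):

```python
# Hedged sketch of packing utterances into batches under a duration budget,
# which is why the logged "batch size" swings with utterance length.
def pack_by_duration(durations_sec, max_duration_sec=1200.0):
    batches, current, used = [], [], 0.0
    for d in sorted(durations_sec):   # roughly groups similar lengths
        if current and used + d > max_duration_sec:
            batches.append(current)
            current, used = [], 0.0
        current.append(d)
        used += d
    if current:
        batches.append(current)
    return batches

# Short utterances yield large batches; long ones yield small batches.
print([len(b) for b in pack_by_duration([2.0] * 300 + [30.0] * 50)])
```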
2024-09-20 09:39:39,895 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.91 vs. limit=15.0
2024-09-20 09:39:51,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=862140.0, ans=0.2
2024-09-20 09:39:52,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=862140.0, ans=0.0
2024-09-20 09:39:55,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=862140.0, ans=0.125
2024-09-20 09:39:59,995 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.779e+01 9.246e+01 9.697e+01 4.650e+02, threshold=1.849e+02, percent-clipped=1.0
2024-09-20 09:40:25,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=862220.0, ans=0.0
2024-09-20 09:40:46,218 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0
2024-09-20 09:40:53,965 INFO [train.py:1198] (1/2) Epoch 48, batch 2900, loss[loss=0.2282, ctc_loss=0.1023, cr_loss=0.3499, attn_decoder_loss=0.2344, over 29415.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1092, cr_loss=0.349, attn_decoder_loss=0.2382, over 5788184.79 frames. ], batch size: 79, lr: 2.29e-03, grad_scale: 8.0
2024-09-20 09:41:00,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=862300.0, ans=0.125
2024-09-20 09:41:07,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=862340.0, ans=0.1
2024-09-20 09:41:24,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=862380.0, ans=0.05
2024-09-20 09:41:32,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=862380.0, ans=0.125
2024-09-20 09:41:39,091 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.75 vs. limit=12.0
2024-09-20 09:41:58,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=862460.0, ans=0.2
2024-09-20 09:41:58,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=862460.0, ans=0.0
2024-09-20 09:41:58,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=862460.0, ans=0.2
2024-09-20 09:42:07,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=862460.0, ans=0.125
2024-09-20 09:42:10,010 INFO [train.py:1198] (1/2) Epoch 48, batch 2950, loss[loss=0.2217, ctc_loss=0.09924, cr_loss=0.3295, attn_decoder_loss=0.228, over 29507.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1082, cr_loss=0.3469, attn_decoder_loss=0.2371, over 5781972.42 frames. ], batch size: 75, lr: 2.29e-03, grad_scale: 8.0
2024-09-20 09:42:30,170 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:42:37,510 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.396e+01 8.743e+01 9.257e+01 9.610e+01 1.643e+02, threshold=1.851e+02, percent-clipped=0.0
2024-09-20 09:42:39,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=862580.0, ans=0.0
2024-09-20 09:42:43,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=862580.0, ans=0.1
2024-09-20 09:43:00,266 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:43:25,735 INFO [train.py:1198] (1/2) Epoch 48, batch 3000, loss[loss=0.2279, ctc_loss=0.1024, cr_loss=0.3392, attn_decoder_loss=0.2343, over 29751.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1079, cr_loss=0.3464, attn_decoder_loss=0.2368, over 5782898.00 frames. ], batch size: 81, lr: 2.29e-03, grad_scale: 8.0
2024-09-20 09:43:25,735 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-20 09:43:41,475 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.3013, 6.2081, 5.6011, 5.9684], device='cuda:1')
2024-09-20 09:43:44,030 INFO [train.py:1230] (1/2) Epoch 48, validation: loss=0.2127, ctc_loss=0.03675, cr_loss=6.55e-15, attn_decoder_loss=0.2323, over 944034.00 frames.
2024-09-20 09:43:44,030 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB
2024-09-20 09:43:44,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=862700.0, ans=0.125
2024-09-20 09:44:31,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=862780.0, ans=0.2
2024-09-20 09:44:49,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=862860.0, ans=0.0
2024-09-20 09:44:52,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.09 vs. limit=15.0
2024-09-20 09:44:58,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=862860.0, ans=0.125
2024-09-20 09:44:59,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=862860.0, ans=0.125
2024-09-20 09:45:04,011 INFO [train.py:1198] (1/2) Epoch 48, batch 3050, loss[loss=0.2175, ctc_loss=0.09267, cr_loss=0.3249, attn_decoder_loss=0.2241, over 29501.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1086, cr_loss=0.3477, attn_decoder_loss=0.2376, over 5777544.85 frames. ], batch size: 76, lr: 2.29e-03, grad_scale: 8.0
2024-09-20 09:45:23,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=862940.0, ans=0.04949747468305833
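At batch 3000 the loop pauses for a validation pass: the loss is averaged over a held-out set (944034 frames here) with gradients disabled, and peak GPU memory is reported alongside. A minimal sketch of such a periodic validation step (the model, loader and loss helper below are illustrative placeholders, not icefall names):

```python
import torch

# Hedged sketch of the periodic validation pass logged at batch 3000.
def compute_loss(model, batch):
    """Placeholder objective standing in for the CTC/attention-decoder loss."""
    feats, num_frames = batch
    return (model(feats) ** 2).mean(), num_frames

def validate(model, valid_loader, device="cuda:1") -> float:
    model.eval()
    weighted_sum, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = compute_loss(model, batch)
            weighted_sum += loss.item() * num_frames
            tot_frames += num_frames
    model.train()
    print(f"validation: loss={weighted_sum / tot_frames:.4f}, "
          f"over {tot_frames:.2f} frames")
    print(f"Maximum memory allocated so far is "
          f"{torch.cuda.max_memory_allocated(device) // 2**20}MB")
    return weighted_sum / tot_frames
```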
2024-09-20 09:45:26,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.45 vs. limit=15.0
2024-09-20 09:45:31,010 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.813e+01 8.880e+01 9.329e+01 1.001e+02 1.444e+02, threshold=1.866e+02, percent-clipped=0.0
2024-09-20 09:45:32,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=862980.0, ans=0.2
2024-09-20 09:46:18,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=863100.0, ans=0.0
2024-09-20 09:46:19,417 INFO [train.py:1198] (1/2) Epoch 48, batch 3100, loss[loss=0.2407, ctc_loss=0.1115, cr_loss=0.3581, attn_decoder_loss=0.2471, over 29255.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1084, cr_loss=0.3469, attn_decoder_loss=0.2373, over 5777365.58 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 8.0
2024-09-20 09:46:21,876 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.82 vs. limit=10.0
2024-09-20 09:46:54,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=863180.0, ans=0.125
2024-09-20 09:47:03,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=863220.0, ans=0.0
2024-09-20 09:47:35,294 INFO [train.py:1198] (1/2) Epoch 48, batch 3150, loss[loss=0.2426, ctc_loss=0.1087, cr_loss=0.3379, attn_decoder_loss=0.25, over 28764.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1083, cr_loss=0.3467, attn_decoder_loss=0.2372, over 5782987.05 frames. ], batch size: 104, lr: 2.29e-03, grad_scale: 8.0
2024-09-20 09:47:42,552 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.53 vs. limit=22.5
2024-09-20 09:47:52,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=863340.0, ans=0.0
2024-09-20 09:47:55,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=863340.0, ans=0.0
2024-09-20 09:48:06,681 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.651e+01 9.014e+01 9.549e+01 1.887e+02, threshold=1.803e+02, percent-clipped=1.0
2024-09-20 09:48:06,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=863340.0, ans=0.0
2024-09-20 09:48:18,053 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.65 vs. limit=10.0
2024-09-20 09:48:48,133 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.11 vs. limit=15.0
2024-09-20 09:48:50,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=863460.0, ans=0.0
2024-09-20 09:48:54,865 INFO [train.py:1198] (1/2) Epoch 48, batch 3200, loss[loss=0.2318, ctc_loss=0.1203, cr_loss=0.3695, attn_decoder_loss=0.236, over 29412.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1079, cr_loss=0.3462, attn_decoder_loss=0.2367, over 5792352.01 frames. ], batch size: 79, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:48:55,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=863500.0, ans=0.07
2024-09-20 09:49:21,514 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.50 vs. limit=15.0
2024-09-20 09:50:10,579 INFO [train.py:1198] (1/2) Epoch 48, batch 3250, loss[loss=0.2505, ctc_loss=0.1282, cr_loss=0.3886, attn_decoder_loss=0.2555, over 29711.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1083, cr_loss=0.3471, attn_decoder_loss=0.2374, over 5799864.59 frames. ], batch size: 84, lr: 2.28e-03, grad_scale: 16.0
2024-09-20 09:50:15,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=863700.0, ans=0.1
2024-09-20 09:50:33,941 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.38 vs. limit=12.0
2024-09-20 09:50:37,751 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 8.777e+01 9.225e+01 9.680e+01 2.463e+02, threshold=1.845e+02, percent-clipped=1.0
2024-09-20 09:50:47,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=863780.0, ans=0.0
2024-09-20 09:50:56,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=863820.0, ans=0.125
2024-09-20 09:50:59,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=863820.0, ans=0.125
2024-09-20 09:51:05,862 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.90 vs. limit=15.0
2024-09-20 09:51:26,310 INFO [train.py:1198] (1/2) Epoch 48, batch 3300, loss[loss=0.2388, ctc_loss=0.1103, cr_loss=0.3469, attn_decoder_loss=0.2454, over 28316.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1075, cr_loss=0.3454, attn_decoder_loss=0.2362, over 5797433.61 frames. ], batch size: 111, lr: 2.28e-03, grad_scale: 16.0
2024-09-20 09:51:41,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=863940.0, ans=0.1
2024-09-20 09:51:56,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=863940.0, ans=0.0
2024-09-20 09:52:26,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=864020.0, ans=0.2
2024-09-20 09:52:33,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=864020.0, ans=0.125
2024-09-20 09:52:52,957 INFO [train.py:1198] (1/2) Epoch 48, batch 3350, loss[loss=0.2377, ctc_loss=0.1064, cr_loss=0.324, attn_decoder_loss=0.245, over 28804.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1079, cr_loss=0.3461, attn_decoder_loss=0.2368, over 5774126.54 frames. ], batch size: 104, lr: 2.28e-03, grad_scale: 16.0
2024-09-20 09:53:20,156 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 8.868e+01 9.379e+01 9.923e+01 1.602e+02, threshold=1.876e+02, percent-clipped=0.0
2024-09-20 09:53:20,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=864140.0, ans=0.125
2024-09-20 09:53:24,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=864180.0, ans=0.125
2024-09-20 09:53:34,659 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.74 vs. limit=22.5
2024-09-20 09:53:35,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=864180.0, ans=0.125
2024-09-20 09:53:40,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=864220.0, ans=0.2
2024-09-20 09:53:42,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=864220.0, ans=0.125
2024-09-20 09:53:56,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=864260.0, ans=0.2
2024-09-20 09:53:59,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=864260.0, ans=0.2
2024-09-20 09:54:08,443 INFO [train.py:1198] (1/2) Epoch 48, batch 3400, loss[loss=0.2052, ctc_loss=0.08525, cr_loss=0.2995, attn_decoder_loss=0.2118, over 29354.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1076, cr_loss=0.3458, attn_decoder_loss=0.2366, over 5765113.10 frames. ], batch size: 67, lr: 2.28e-03, grad_scale: 16.0
2024-09-20 09:54:17,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=864300.0, ans=0.2
2024-09-20 09:54:22,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=864340.0, ans=0.0
2024-09-20 09:54:24,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=864340.0, ans=0.0
2024-09-20 09:55:09,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=864460.0, ans=0.125
2024-09-20 09:55:23,988 INFO [train.py:1198] (1/2) Epoch 48, batch 3450, loss[loss=0.2346, ctc_loss=0.1035, cr_loss=0.3368, attn_decoder_loss=0.2416, over 28173.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1075, cr_loss=0.3459, attn_decoder_loss=0.237, over 5772586.54 frames. ], batch size: 111, lr: 2.28e-03, grad_scale: 16.0
2024-09-20 09:55:42,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=864540.0, ans=0.09899494936611666
2024-09-20 09:55:55,191 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 8.532e+01 9.118e+01 9.502e+01 1.543e+02, threshold=1.824e+02, percent-clipped=0.0
2024-09-20 09:55:59,125 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.64 vs.
limit=6.0 2024-09-20 09:56:22,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=864620.0, ans=0.2 2024-09-20 09:56:35,058 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2024-09-20 09:56:43,110 INFO [train.py:1198] (1/2) Epoch 48, batch 3500, loss[loss=0.2142, ctc_loss=0.0957, cr_loss=0.3154, attn_decoder_loss=0.2204, over 29348.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1072, cr_loss=0.3452, attn_decoder_loss=0.2364, over 5774683.90 frames. ], batch size: 71, lr: 2.28e-03, grad_scale: 16.0 2024-09-20 09:57:10,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=864740.0, ans=0.1 2024-09-20 09:57:19,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0 2024-09-20 09:57:22,318 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.23 vs. limit=22.5 2024-09-20 09:57:24,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=864780.0, ans=0.025 2024-09-20 09:57:31,016 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.83 vs. limit=12.0 2024-09-20 09:57:43,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=864860.0, ans=0.125 2024-09-20 09:57:46,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=864860.0, ans=0.125 2024-09-20 09:57:58,094 INFO [train.py:1198] (1/2) Epoch 48, batch 3550, loss[loss=0.241, ctc_loss=0.1112, cr_loss=0.3601, attn_decoder_loss=0.2475, over 29701.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1068, cr_loss=0.3443, attn_decoder_loss=0.2361, over 5782443.77 frames. 
], batch size: 89, lr: 2.28e-03, grad_scale: 16.0 2024-09-20 09:58:12,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=864940.0, ans=22.5 2024-09-20 09:58:20,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=864940.0, ans=0.1 2024-09-20 09:58:24,720 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.597e+01 8.565e+01 9.018e+01 9.505e+01 1.694e+02, threshold=1.804e+02, percent-clipped=0.0 2024-09-20 09:58:31,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=864980.0, ans=0.0 2024-09-20 09:58:44,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=865020.0, ans=0.1 2024-09-20 09:58:57,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=865060.0, ans=0.02 2024-09-20 09:59:03,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=865060.0, ans=0.1 2024-09-20 09:59:12,458 INFO [train.py:1198] (1/2) Epoch 48, batch 3600, loss[loss=0.2356, ctc_loss=0.1079, cr_loss=0.3496, attn_decoder_loss=0.2421, over 29505.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1071, cr_loss=0.3453, attn_decoder_loss=0.2364, over 5790717.79 frames. ], batch size: 77, lr: 2.28e-03, grad_scale: 32.0 2024-09-20 09:59:47,261 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.78 vs. limit=15.0 2024-09-20 10:00:15,471 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=22.5 2024-09-20 10:00:20,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=865260.0, ans=0.0 2024-09-20 10:00:25,567 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0 2024-09-20 10:00:26,303 INFO [train.py:1198] (1/2) Epoch 48, batch 3650, loss[loss=0.251, ctc_loss=0.1253, cr_loss=0.3763, attn_decoder_loss=0.2566, over 29486.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1065, cr_loss=0.344, attn_decoder_loss=0.2358, over 5793401.93 frames. ], batch size: 90, lr: 2.28e-03, grad_scale: 16.0 2024-09-20 10:00:54,248 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.368e+01 8.575e+01 9.071e+01 9.730e+01 1.168e+02, threshold=1.814e+02, percent-clipped=0.0 2024-09-20 10:00:56,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=865380.0, ans=0.125 2024-09-20 10:01:14,575 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.24 vs. limit=12.0 2024-09-20 10:01:28,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=865460.0, ans=0.125 2024-09-20 10:01:44,217 INFO [train.py:1198] (1/2) Epoch 48, batch 3700, loss[loss=0.2378, ctc_loss=0.1103, cr_loss=0.3498, attn_decoder_loss=0.2441, over 29713.00 frames. 
], tot_loss[loss=0.2299, ctc_loss=0.1068, cr_loss=0.3448, attn_decoder_loss=0.2359, over 5803672.56 frames. ], batch size: 84, lr: 2.28e-03, grad_scale: 16.0 2024-09-20 10:01:46,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=865500.0, ans=0.125 2024-09-20 10:02:24,019 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.39 vs. limit=15.0 2024-09-20 10:02:39,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=865620.0, ans=0.125 2024-09-20 10:02:50,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=865660.0, ans=0.2 2024-09-20 10:02:52,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=865660.0, ans=0.125 2024-09-20 10:02:57,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=865700.0, ans=0.0 2024-09-20 10:02:58,586 INFO [train.py:1198] (1/2) Epoch 48, batch 3750, loss[loss=0.2117, ctc_loss=0.1014, cr_loss=0.3191, attn_decoder_loss=0.2168, over 29309.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1068, cr_loss=0.3446, attn_decoder_loss=0.2357, over 5807058.68 frames. ], batch size: 67, lr: 2.28e-03, grad_scale: 8.0 2024-09-20 10:03:01,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=865700.0, ans=0.125 2024-09-20 10:03:07,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=865700.0, ans=0.125 2024-09-20 10:03:08,475 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.07 vs. limit=15.0 2024-09-20 10:03:12,841 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=4.98 vs. limit=15.0 2024-09-20 10:03:19,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=865740.0, ans=0.125 2024-09-20 10:03:28,348 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.331e+01 8.668e+01 9.150e+01 9.729e+01 2.139e+02, threshold=1.830e+02, percent-clipped=2.0 2024-09-20 10:03:49,079 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2024-09-20 10:03:51,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=865820.0, ans=0.0 2024-09-20 10:04:02,344 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.72 vs. 
limit=8.0 2024-09-20 10:04:04,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=865860.0, ans=0.2 2024-09-20 10:04:10,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=865860.0, ans=0.0 2024-09-20 10:04:12,899 INFO [train.py:1198] (1/2) Epoch 48, batch 3800, loss[loss=0.2439, ctc_loss=0.1171, cr_loss=0.3684, attn_decoder_loss=0.2498, over 29632.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1066, cr_loss=0.3436, attn_decoder_loss=0.2355, over 5796678.72 frames. ], batch size: 86, lr: 2.28e-03, grad_scale: 8.0 2024-09-20 10:04:17,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=865900.0, ans=0.025 2024-09-20 10:04:39,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=865940.0, ans=0.125 2024-09-20 10:04:43,347 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.73 vs. limit=22.5 2024-09-20 10:04:45,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=865980.0, ans=0.1 2024-09-20 10:04:47,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=865980.0, ans=0.025 2024-09-20 10:04:53,693 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-09-20 10:05:27,016 INFO [train.py:1198] (1/2) Epoch 48, batch 3850, loss[loss=0.2452, ctc_loss=0.1127, cr_loss=0.3496, attn_decoder_loss=0.2521, over 29260.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1067, cr_loss=0.3438, attn_decoder_loss=0.2356, over 5811246.99 frames. 
], batch size: 100, lr: 2.28e-03, grad_scale: 8.0 2024-09-20 10:05:27,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=866100.0, ans=0.2 2024-09-20 10:05:37,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=866100.0, ans=0.2 2024-09-20 10:05:44,814 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 10:05:44,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=866140.0, ans=0.0 2024-09-20 10:05:49,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=866140.0, ans=0.0 2024-09-20 10:05:55,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=866180.0, ans=0.0 2024-09-20 10:05:55,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=866180.0, ans=0.125 2024-09-20 10:05:56,511 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 8.668e+01 9.090e+01 9.614e+01 1.900e+02, threshold=1.818e+02, percent-clipped=1.0 2024-09-20 10:06:02,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=866180.0, ans=0.125 2024-09-20 10:06:14,003 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.68 vs. limit=22.5 2024-09-20 10:06:32,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=866260.0, ans=0.125 2024-09-20 10:06:32,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=866260.0, ans=0.07 2024-09-20 10:06:36,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=866260.0, ans=0.2 2024-09-20 10:06:40,954 INFO [train.py:1198] (1/2) Epoch 48, batch 3900, loss[loss=0.2361, ctc_loss=0.1057, cr_loss=0.3319, attn_decoder_loss=0.2432, over 29613.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1071, cr_loss=0.3445, attn_decoder_loss=0.2363, over 5815379.34 frames. ], batch size: 86, lr: 2.28e-03, grad_scale: 8.0 2024-09-20 10:06:45,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=866300.0, ans=0.0 2024-09-20 10:06:47,762 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.87 vs. limit=12.0 2024-09-20 10:07:08,246 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2024-09-20 10:07:19,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=866380.0, ans=0.125 2024-09-20 10:07:32,521 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.85 vs. 
limit=10.0 2024-09-20 10:07:58,103 INFO [train.py:1198] (1/2) Epoch 48, batch 3950, loss[loss=0.2436, ctc_loss=0.1164, cr_loss=0.3563, attn_decoder_loss=0.2498, over 29534.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1071, cr_loss=0.3448, attn_decoder_loss=0.2365, over 5835016.30 frames. ], batch size: 97, lr: 2.28e-03, grad_scale: 8.0 2024-09-20 10:07:58,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=866500.0, ans=0.04949747468305833 2024-09-20 10:08:27,414 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 8.623e+01 9.056e+01 9.623e+01 1.586e+02, threshold=1.811e+02, percent-clipped=0.0 2024-09-20 10:08:30,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=866580.0, ans=0.0 2024-09-20 10:08:31,210 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.02 vs. limit=10.0 2024-09-20 10:08:45,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=866620.0, ans=0.125 2024-09-20 10:08:59,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=866660.0, ans=0.125 2024-09-20 10:09:01,297 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 10:09:11,170 INFO [train.py:1198] (1/2) Epoch 48, batch 4000, loss[loss=0.2139, ctc_loss=0.08753, cr_loss=0.2989, attn_decoder_loss=0.2213, over 29520.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1071, cr_loss=0.344, attn_decoder_loss=0.2366, over 5812361.95 frames. ], batch size: 74, lr: 2.28e-03, grad_scale: 16.0 2024-09-20 10:09:11,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=866700.0, ans=0.0 2024-09-20 10:09:23,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=866700.0, ans=0.125 2024-09-20 10:09:27,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=866740.0, ans=0.0 2024-09-20 10:09:34,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=866740.0, ans=0.5 2024-09-20 10:09:43,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=866780.0, ans=0.125 2024-09-20 10:09:52,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.max_abs, batch_count=866780.0, ans=10.0 2024-09-20 10:10:08,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=866860.0, ans=0.125 2024-09-20 10:10:24,660 INFO [train.py:1198] (1/2) Epoch 48, batch 4050, loss[loss=0.2401, ctc_loss=0.1138, cr_loss=0.3129, attn_decoder_loss=0.2472, over 20006.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.107, cr_loss=0.3432, attn_decoder_loss=0.2364, over 5796157.68 frames. 
], batch size: 209, lr: 2.28e-03, grad_scale: 16.0 2024-09-20 10:10:27,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=866900.0, ans=0.125 2024-09-20 10:10:37,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=866940.0, ans=0.125 2024-09-20 10:10:45,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=866940.0, ans=0.125 2024-09-20 10:10:49,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=866940.0, ans=0.0 2024-09-20 10:10:51,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=866940.0, ans=0.04949747468305833 2024-09-20 10:10:53,680 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.713e+01 8.805e+01 9.236e+01 9.679e+01 1.942e+02, threshold=1.847e+02, percent-clipped=1.0 2024-09-20 10:10:55,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=866980.0, ans=0.2 2024-09-20 10:11:11,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=867020.0, ans=0.09899494936611666 2024-09-20 10:11:39,141 INFO [train.py:1198] (1/2) Epoch 48, batch 4100, loss[loss=0.2464, ctc_loss=0.1288, cr_loss=0.3848, attn_decoder_loss=0.2509, over 29524.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1075, cr_loss=0.3445, attn_decoder_loss=0.2368, over 5791480.74 frames. ], batch size: 90, lr: 2.28e-03, grad_scale: 16.0 2024-09-20 10:11:39,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=867100.0, ans=0.0 2024-09-20 10:11:48,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=867100.0, ans=0.1 2024-09-20 10:12:11,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=867180.0, ans=0.2 2024-09-20 10:12:11,993 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.44 vs. limit=22.5 2024-09-20 10:12:21,461 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.96 vs. limit=15.0 2024-09-20 10:12:31,415 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2024-09-20 10:12:31,540 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.51 vs. limit=12.0 2024-09-20 10:12:37,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=867220.0, ans=0.125 2024-09-20 10:12:42,372 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. limit=6.0 2024-09-20 10:12:54,723 INFO [train.py:1198] (1/2) Epoch 48, batch 4150, loss[loss=0.23, ctc_loss=0.1077, cr_loss=0.352, attn_decoder_loss=0.2358, over 29493.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1077, cr_loss=0.345, attn_decoder_loss=0.2366, over 5796553.94 frames. 
], batch size: 77, lr: 2.28e-03, grad_scale: 16.0 2024-09-20 10:13:00,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=867300.0, ans=0.0 2024-09-20 10:13:12,885 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 10:13:22,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=867380.0, ans=0.0 2024-09-20 10:13:23,987 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 8.808e+01 9.166e+01 9.915e+01 1.612e+02, threshold=1.833e+02, percent-clipped=0.0 2024-09-20 10:13:39,379 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.37 vs. limit=10.0 2024-09-20 10:13:47,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=867420.0, ans=0.2 2024-09-20 10:13:59,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=867460.0, ans=0.0 2024-09-20 10:14:00,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=867460.0, ans=0.125 2024-09-20 10:14:05,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=867460.0, ans=0.1 2024-09-20 10:14:08,055 INFO [train.py:1198] (1/2) Epoch 48, batch 4200, loss[loss=0.2459, ctc_loss=0.1211, cr_loss=0.3707, attn_decoder_loss=0.2515, over 29484.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1081, cr_loss=0.3465, attn_decoder_loss=0.2369, over 5798781.09 frames. ], batch size: 90, lr: 2.28e-03, grad_scale: 8.0 2024-09-20 10:14:31,375 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.37 vs. limit=22.5 2024-09-20 10:14:42,882 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.91 vs. limit=10.0 2024-09-20 10:14:45,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.33 vs. limit=22.5 2024-09-20 10:15:05,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=867660.0, ans=0.125 2024-09-20 10:15:18,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=867660.0, ans=0.0 2024-09-20 10:15:18,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.80 vs. limit=12.0 2024-09-20 10:15:22,556 INFO [train.py:1198] (1/2) Epoch 48, batch 4250, loss[loss=0.219, ctc_loss=0.1001, cr_loss=0.3204, attn_decoder_loss=0.2251, over 29514.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.108, cr_loss=0.3463, attn_decoder_loss=0.2371, over 5804182.47 frames. 
], batch size: 74, lr: 2.28e-03, grad_scale: 8.0 2024-09-20 10:15:41,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=867740.0, ans=0.125 2024-09-20 10:15:47,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=867740.0, ans=0.125 2024-09-20 10:15:54,014 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.563e+01 8.733e+01 9.174e+01 9.868e+01 2.354e+02, threshold=1.835e+02, percent-clipped=1.0 2024-09-20 10:15:54,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.26 vs. limit=12.0 2024-09-20 10:15:56,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.84 vs. limit=15.0 2024-09-20 10:16:00,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=867780.0, ans=0.5 2024-09-20 10:16:03,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=867780.0, ans=0.1 2024-09-20 10:16:09,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=867820.0, ans=0.2 2024-09-20 10:16:22,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=867860.0, ans=0.125 2024-09-20 10:16:36,668 INFO [train.py:1198] (1/2) Epoch 48, batch 4300, loss[loss=0.2425, ctc_loss=0.1171, cr_loss=0.3679, attn_decoder_loss=0.2483, over 29541.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1076, cr_loss=0.3451, attn_decoder_loss=0.2371, over 5794786.82 frames. ], batch size: 87, lr: 2.28e-03, grad_scale: 8.0 2024-09-20 10:16:53,813 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.39 vs. limit=22.5 2024-09-20 10:17:00,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=867940.0, ans=0.0 2024-09-20 10:17:02,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=867940.0, ans=0.125 2024-09-20 10:17:16,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=867980.0, ans=0.125 2024-09-20 10:17:23,435 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.16 vs. limit=15.0 2024-09-20 10:17:50,557 INFO [train.py:1198] (1/2) Epoch 48, batch 4350, loss[loss=0.243, ctc_loss=0.1086, cr_loss=0.3447, attn_decoder_loss=0.2502, over 29471.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1104, cr_loss=0.3518, attn_decoder_loss=0.2402, over 5797557.39 frames. 
], batch size: 97, lr: 2.28e-03, grad_scale: 8.0 2024-09-20 10:18:13,514 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-20 10:18:14,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=868140.0, ans=0.2 2024-09-20 10:18:14,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=868140.0, ans=0.125 2024-09-20 10:18:16,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=868140.0, ans=0.0 2024-09-20 10:18:20,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=868180.0, ans=0.2 2024-09-20 10:18:21,800 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.740e+01 9.099e+01 9.551e+01 1.026e+02 1.775e+02, threshold=1.910e+02, percent-clipped=0.0 2024-09-20 10:18:28,272 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.29 vs. limit=22.5 2024-09-20 10:18:28,393 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.21 vs. limit=22.5 2024-09-20 10:19:04,480 INFO [train.py:1198] (1/2) Epoch 48, batch 4400, loss[loss=0.2354, ctc_loss=0.1098, cr_loss=0.3448, attn_decoder_loss=0.2417, over 27591.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1112, cr_loss=0.3537, attn_decoder_loss=0.242, over 5767066.89 frames. ], batch size: 125, lr: 2.28e-03, grad_scale: 16.0 2024-09-20 10:19:12,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=868300.0, ans=0.09899494936611666 2024-09-20 10:19:15,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=868300.0, ans=0.09899494936611666 2024-09-20 10:19:18,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=868340.0, ans=0.0 2024-09-20 10:19:32,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=868380.0, ans=0.1 2024-09-20 10:19:34,511 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.73 vs. limit=15.0 2024-09-20 10:19:38,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=868380.0, ans=0.125 2024-09-20 10:19:40,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=868380.0, ans=0.125 2024-09-20 10:19:42,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=868380.0, ans=0.0 2024-09-20 10:19:54,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=868420.0, ans=0.0 2024-09-20 10:20:18,888 INFO [train.py:1198] (1/2) Epoch 48, batch 4450, loss[loss=0.2609, ctc_loss=0.1491, cr_loss=0.4081, attn_decoder_loss=0.2642, over 20462.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1147, cr_loss=0.3586, attn_decoder_loss=0.244, over 5572597.95 frames. 
], batch size: 209, lr: 2.28e-03, grad_scale: 8.0 2024-09-20 10:20:24,556 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.16 vs. limit=12.0 2024-09-20 10:20:25,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=868500.0, ans=0.0 2024-09-20 10:20:29,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=868500.0, ans=0.125 2024-09-20 10:20:34,778 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.57 vs. limit=15.0 2024-09-20 10:20:37,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=868540.0, ans=0.125 2024-09-20 10:20:43,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=868540.0, ans=0.125 2024-09-20 10:20:51,518 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.32 vs. limit=15.0 2024-09-20 10:20:52,106 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.165e+01 9.214e+01 1.004e+02 1.130e+02 1.604e+02, threshold=2.007e+02, percent-clipped=0.0 2024-09-20 10:21:13,466 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-09-20 10:21:13,775 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0 2024-09-20 10:21:29,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=868660.0, ans=0.125 2024-09-20 10:21:33,682 INFO [train.py:1198] (1/2) Epoch 48, batch 4500, loss[loss=0.2508, ctc_loss=0.1356, cr_loss=0.3868, attn_decoder_loss=0.255, over 20881.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1171, cr_loss=0.3606, attn_decoder_loss=0.2455, over 5234422.88 frames. ], batch size: 209, lr: 2.28e-03, grad_scale: 8.0 2024-09-20 10:21:45,813 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 10:21:58,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.69 vs. limit=12.0 2024-09-20 10:23:01,533 INFO [train.py:1198] (1/2) Epoch 49, batch 0, loss[loss=0.2151, ctc_loss=0.09674, cr_loss=0.3347, attn_decoder_loss=0.2209, over 29625.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.09674, cr_loss=0.3347, attn_decoder_loss=0.2209, over 29625.00 frames. ], batch size: 73, lr: 2.25e-03, grad_scale: 16.0 2024-09-20 10:23:01,534 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 10:23:19,983 INFO [train.py:1230] (1/2) Epoch 49, validation: loss=0.2124, ctc_loss=0.03569, cr_loss=6.554e-15, attn_decoder_loss=0.2321, over 944034.00 frames. 
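The per-batch entries above log a combined objective next to its components. The totals are consistent with a fixed-weight sum of the three parts: for the Epoch 49 validation entry, 0.1 * 0.03569 + 0.9 * 0.2321 recovers the logged loss=0.2124 to within rounding, while the cr_loss term (~6.5e-15) is effectively zero at validation, presumably because the consistency-regularization loss is only computed between augmented training views. A minimal sketch of that combination follows, assuming scales of 0.1 (CTC), 0.02 (CR) and 0.9 (attention decoder) inferred from the logged arithmetic; the function name and signature are illustrative, not the actual train.py code.

# Minimal sketch of the combined loss reported in the entries above.
# The scale values (0.1, 0.02, 0.9) are assumptions inferred from the
# logged numbers, not read out of train.py.

def combined_loss(ctc_loss: float, cr_loss: float,
                  attn_decoder_loss: float,
                  ctc_scale: float = 0.1,
                  cr_scale: float = 0.02,
                  attn_scale: float = 0.9) -> float:
    """Fixed-weight sum of the three per-batch loss components."""
    return (ctc_scale * ctc_loss
            + cr_scale * cr_loss
            + attn_scale * attn_decoder_loss)

# Epoch 48, batch 3000: logged loss=0.2309
assert abs(combined_loss(0.1079, 0.3464, 0.2368) - 0.2309) < 5e-4
# Epoch 49 validation: cr_loss ~ 6.5e-15, so the total reduces to the
# CTC and attention-decoder terms; logged loss=0.2124
assert abs(combined_loss(0.03569, 0.0, 0.2321) - 0.2124) < 5e-4

The same weights reproduce the running tot_loss figures throughout the section, so the logged "loss" can be read directly as this weighted sum rather than a separate quantity.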
2024-09-20 10:23:19,983 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-20 10:23:30,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=868800.0, ans=0.2 2024-09-20 10:23:31,264 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0 2024-09-20 10:23:49,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=868880.0, ans=0.1 2024-09-20 10:23:55,288 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.65 vs. limit=10.0 2024-09-20 10:24:17,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=868920.0, ans=0.125 2024-09-20 10:24:24,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=868960.0, ans=0.125 2024-09-20 10:24:24,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=868960.0, ans=0.025 2024-09-20 10:24:26,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=868960.0, ans=0.125 2024-09-20 10:24:32,177 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 9.535e+01 1.078e+02 1.164e+02 4.744e+02, threshold=2.156e+02, percent-clipped=1.0 2024-09-20 10:24:35,436 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 10:24:36,537 INFO [train.py:1198] (1/2) Epoch 49, batch 50, loss[loss=0.2073, ctc_loss=0.09304, cr_loss=0.3166, attn_decoder_loss=0.213, over 29402.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1098, cr_loss=0.352, attn_decoder_loss=0.2383, over 1268831.92 frames. ], batch size: 70, lr: 2.25e-03, grad_scale: 16.0 2024-09-20 10:24:55,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=869040.0, ans=0.125 2024-09-20 10:25:11,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=869080.0, ans=0.1 2024-09-20 10:25:19,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=869080.0, ans=0.0 2024-09-20 10:25:19,787 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.10 vs. limit=15.0 2024-09-20 10:25:41,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=869160.0, ans=0.125 2024-09-20 10:25:42,057 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.67 vs. limit=10.0 2024-09-20 10:25:53,941 INFO [train.py:1198] (1/2) Epoch 49, batch 100, loss[loss=0.2269, ctc_loss=0.1141, cr_loss=0.3541, attn_decoder_loss=0.2315, over 29530.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1101, cr_loss=0.3515, attn_decoder_loss=0.2397, over 2253439.42 frames. 
], batch size: 76, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 10:25:57,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=869200.0, ans=0.125 2024-09-20 10:26:06,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=869200.0, ans=0.125 2024-09-20 10:26:26,261 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=4.76 vs. limit=15.0 2024-09-20 10:26:35,041 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.99 vs. limit=10.0 2024-09-20 10:27:05,479 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.667e+01 9.247e+01 9.821e+01 1.649e+02, threshold=1.849e+02, percent-clipped=0.0 2024-09-20 10:27:08,509 INFO [train.py:1198] (1/2) Epoch 49, batch 150, loss[loss=0.2114, ctc_loss=0.09767, cr_loss=0.3313, attn_decoder_loss=0.2167, over 29462.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1078, cr_loss=0.3458, attn_decoder_loss=0.2373, over 3047938.53 frames. ], batch size: 70, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 10:27:08,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=869400.0, ans=0.04949747468305833 2024-09-20 10:27:51,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=869480.0, ans=0.0 2024-09-20 10:28:09,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=869560.0, ans=0.125 2024-09-20 10:28:26,275 INFO [train.py:1198] (1/2) Epoch 49, batch 200, loss[loss=0.2548, ctc_loss=0.1249, cr_loss=0.3995, attn_decoder_loss=0.2603, over 27549.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1076, cr_loss=0.3459, attn_decoder_loss=0.2365, over 3661870.17 frames. ], batch size: 125, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 10:28:29,099 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.31 vs. limit=15.0 2024-09-20 10:29:05,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=869680.0, ans=0.125 2024-09-20 10:29:37,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=869760.0, ans=0.125 2024-09-20 10:29:40,854 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.391e+01 8.622e+01 9.248e+01 9.651e+01 1.394e+02, threshold=1.850e+02, percent-clipped=0.0 2024-09-20 10:29:41,950 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=22.5 2024-09-20 10:29:43,770 INFO [train.py:1198] (1/2) Epoch 49, batch 250, loss[loss=0.2404, ctc_loss=0.1079, cr_loss=0.3544, attn_decoder_loss=0.2472, over 29271.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1071, cr_loss=0.3449, attn_decoder_loss=0.2361, over 4144737.95 frames. 
], batch size: 100, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 10:29:48,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=869800.0, ans=0.125 2024-09-20 10:30:08,539 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.45 vs. limit=15.0 2024-09-20 10:30:17,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=869880.0, ans=0.0 2024-09-20 10:30:17,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=869880.0, ans=0.2 2024-09-20 10:30:22,247 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.09 vs. limit=15.0 2024-09-20 10:30:23,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=869880.0, ans=0.0 2024-09-20 10:30:29,793 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.64 vs. limit=15.0 2024-09-20 10:30:30,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=869920.0, ans=0.1 2024-09-20 10:30:33,985 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 10:30:53,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=869960.0, ans=0.125 2024-09-20 10:30:55,203 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.97 vs. limit=22.5 2024-09-20 10:30:55,457 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=4.92 vs. limit=12.0 2024-09-20 10:30:59,202 INFO [train.py:1198] (1/2) Epoch 49, batch 300, loss[loss=0.2527, ctc_loss=0.1287, cr_loss=0.3835, attn_decoder_loss=0.258, over 29515.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.107, cr_loss=0.3446, attn_decoder_loss=0.2361, over 4511746.19 frames. ], batch size: 92, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 10:31:17,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=870040.0, ans=0.95 2024-09-20 10:31:51,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=870120.0, ans=0.125 2024-09-20 10:32:10,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=870160.0, ans=0.0 2024-09-20 10:32:11,389 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.06 vs. limit=15.0 2024-09-20 10:32:13,610 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.601e+01 9.011e+01 9.321e+01 1.888e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-20 10:32:16,512 INFO [train.py:1198] (1/2) Epoch 49, batch 350, loss[loss=0.2088, ctc_loss=0.08572, cr_loss=0.309, attn_decoder_loss=0.2156, over 29317.00 frames. 
], tot_loss[loss=0.2305, ctc_loss=0.1071, cr_loss=0.3452, attn_decoder_loss=0.2365, over 4797009.94 frames. ], batch size: 71, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 10:32:18,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=870200.0, ans=0.125 2024-09-20 10:32:22,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=870200.0, ans=0.125 2024-09-20 10:32:25,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=870200.0, ans=0.2 2024-09-20 10:32:25,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=870200.0, ans=10.0 2024-09-20 10:32:25,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=870200.0, ans=0.125 2024-09-20 10:32:34,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=870240.0, ans=0.125 2024-09-20 10:32:42,816 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.41 vs. limit=22.5 2024-09-20 10:32:49,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=870280.0, ans=0.025 2024-09-20 10:33:19,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=870360.0, ans=0.125 2024-09-20 10:33:31,555 INFO [train.py:1198] (1/2) Epoch 49, batch 400, loss[loss=0.2315, ctc_loss=0.1098, cr_loss=0.3566, attn_decoder_loss=0.2371, over 29695.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1069, cr_loss=0.3446, attn_decoder_loss=0.2361, over 5026235.43 frames. ], batch size: 82, lr: 2.25e-03, grad_scale: 16.0 2024-09-20 10:33:49,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=870440.0, ans=0.125 2024-09-20 10:33:50,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=870440.0, ans=0.1 2024-09-20 10:33:53,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=870440.0, ans=0.0 2024-09-20 10:33:54,421 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.69 vs. limit=22.5 2024-09-20 10:33:59,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=870440.0, ans=0.125 2024-09-20 10:34:23,124 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.11 vs. 
limit=22.5 2024-09-20 10:34:25,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=870520.0, ans=0.0 2024-09-20 10:34:27,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=870520.0, ans=0.0 2024-09-20 10:34:37,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=870560.0, ans=0.125 2024-09-20 10:34:45,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=870560.0, ans=0.1 2024-09-20 10:34:46,411 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.552e+01 9.241e+01 9.788e+01 2.728e+02, threshold=1.848e+02, percent-clipped=1.0 2024-09-20 10:34:49,354 INFO [train.py:1198] (1/2) Epoch 49, batch 450, loss[loss=0.2276, ctc_loss=0.1012, cr_loss=0.3331, attn_decoder_loss=0.2343, over 29698.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1066, cr_loss=0.3443, attn_decoder_loss=0.2359, over 5188865.69 frames. ], batch size: 83, lr: 2.25e-03, grad_scale: 16.0 2024-09-20 10:34:59,062 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.31 vs. limit=15.0 2024-09-20 10:35:26,305 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.87 vs. limit=15.0 2024-09-20 10:35:57,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=870760.0, ans=15.0 2024-09-20 10:36:07,278 INFO [train.py:1198] (1/2) Epoch 49, batch 500, loss[loss=0.2414, ctc_loss=0.1121, cr_loss=0.3551, attn_decoder_loss=0.2479, over 29424.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1066, cr_loss=0.3443, attn_decoder_loss=0.2356, over 5331867.56 frames. ], batch size: 94, lr: 2.25e-03, grad_scale: 16.0 2024-09-20 10:36:12,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=870800.0, ans=0.2 2024-09-20 10:36:17,109 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.10 vs. 
limit=15.0 2024-09-20 10:37:01,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=870920.0, ans=0.0 2024-09-20 10:37:08,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=870960.0, ans=10.0 2024-09-20 10:37:18,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=870960.0, ans=0.09899494936611666 2024-09-20 10:37:18,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=870960.0, ans=0.125 2024-09-20 10:37:19,725 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.778e+01 8.745e+01 9.074e+01 9.621e+01 1.472e+02, threshold=1.815e+02, percent-clipped=0.0 2024-09-20 10:37:20,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=870960.0, ans=0.0 2024-09-20 10:37:25,056 INFO [train.py:1198] (1/2) Epoch 49, batch 550, loss[loss=0.2451, ctc_loss=0.1174, cr_loss=0.3687, attn_decoder_loss=0.2511, over 28792.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1068, cr_loss=0.3448, attn_decoder_loss=0.2357, over 5424374.95 frames. ], batch size: 104, lr: 2.25e-03, grad_scale: 16.0 2024-09-20 10:37:31,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=871000.0, ans=0.0 2024-09-20 10:38:07,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=871080.0, ans=0.0 2024-09-20 10:38:11,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=871120.0, ans=0.125 2024-09-20 10:38:40,827 INFO [train.py:1198] (1/2) Epoch 49, batch 600, loss[loss=0.2323, ctc_loss=0.09819, cr_loss=0.3164, attn_decoder_loss=0.2402, over 29322.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1072, cr_loss=0.3458, attn_decoder_loss=0.236, over 5508554.25 frames. 
2024-09-20 10:38:45,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=871200.0, ans=0.1
2024-09-20 10:38:50,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=871200.0, ans=0.0
2024-09-20 10:38:50,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=871200.0, ans=0.2
2024-09-20 10:38:54,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=871240.0, ans=0.1
2024-09-20 10:39:00,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=871240.0, ans=0.125
2024-09-20 10:39:18,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=871280.0, ans=0.125
2024-09-20 10:39:41,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=871360.0, ans=0.025
2024-09-20 10:39:54,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=871360.0, ans=0.125
2024-09-20 10:39:55,327 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.840e+01 8.578e+01 9.036e+01 9.635e+01 5.589e+02, threshold=1.807e+02, percent-clipped=2.0
2024-09-20 10:39:57,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=871400.0, ans=0.0
2024-09-20 10:39:58,321 INFO [train.py:1198] (1/2) Epoch 49, batch 650, loss[loss=0.2291, ctc_loss=0.104, cr_loss=0.3344, attn_decoder_loss=0.2355, over 29762.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1062, cr_loss=0.3432, attn_decoder_loss=0.2352, over 5585897.11 frames. ], batch size: 81, lr: 2.25e-03, grad_scale: 8.0
2024-09-20 10:40:15,599 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.28 vs. limit=15.0
2024-09-20 10:40:32,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=871480.0, ans=0.025
2024-09-20 10:40:50,150 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-20 10:41:10,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=871560.0, ans=0.2
2024-09-20 10:41:10,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=871560.0, ans=0.125
2024-09-20 10:41:13,707 INFO [train.py:1198] (1/2) Epoch 49, batch 700, loss[loss=0.2184, ctc_loss=0.09866, cr_loss=0.3274, attn_decoder_loss=0.2244, over 29551.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1067, cr_loss=0.3442, attn_decoder_loss=0.236, over 5637421.97 frames. ], batch size: 76, lr: 2.25e-03, grad_scale: 8.0
2024-09-20 10:41:17,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=871600.0, ans=0.0
2024-09-20 10:41:22,986 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.69 vs. limit=15.0
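[Editor's note] The Whitening entries report a per-module statistic of the activations against a ceiling (metric=... vs. limit=...). The sketch below computes one plausible whiteness statistic, the mean squared eigenvalue of the channel covariance over the squared mean eigenvalue, which is 1.0 for perfectly white features and grows as the spectrum becomes lopsided; this particular statistic is an assumption, not necessarily the metric scaling.py computes.

    # Sketch of a whitening diagnostic in the spirit of the log entries above.
    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels); channels are split into groups,
        # mirroring the num_groups/num_channels fields in the log.
        num_frames, num_channels = x.shape
        xg = x.reshape(num_frames, num_groups, num_channels // num_groups)
        metrics = []
        for g in range(num_groups):
            feats = xg[:, g, :]
            feats = feats - feats.mean(dim=0)
            cov = (feats.T @ feats) / num_frames
            eigs = torch.linalg.eigvalsh(cov)
            # >= 1.0, with equality iff all eigenvalues are equal ("white")
            metrics.append(((eigs ** 2).mean() / eigs.mean() ** 2).item())
        return max(metrics)

    x = torch.randn(1000, 512)
    m = whitening_metric(x)  # near 1.0 for white noise; logged vs. a limit

Most metrics in this section sit comfortably below their limits, which is what a healthy late-epoch run should look like.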
2024-09-20 10:41:23,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=871600.0, ans=0.125
2024-09-20 10:41:23,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=871600.0, ans=0.1
2024-09-20 10:41:26,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=871600.0, ans=0.0
2024-09-20 10:41:31,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=871640.0, ans=0.125
2024-09-20 10:41:31,904 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.87 vs. limit=15.0
2024-09-20 10:41:42,463 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.72 vs. limit=15.0
2024-09-20 10:42:20,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=871760.0, ans=0.2
2024-09-20 10:42:25,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=871760.0, ans=0.2
2024-09-20 10:42:29,492 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.733e+01 8.695e+01 9.527e+01 1.020e+02 1.538e+02, threshold=1.905e+02, percent-clipped=0.0
2024-09-20 10:42:31,055 INFO [train.py:1198] (1/2) Epoch 49, batch 750, loss[loss=0.2309, ctc_loss=0.1035, cr_loss=0.3308, attn_decoder_loss=0.2377, over 29707.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1065, cr_loss=0.3439, attn_decoder_loss=0.2357, over 5675847.03 frames. ], batch size: 82, lr: 2.25e-03, grad_scale: 8.0
2024-09-20 10:42:49,767 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.94 vs. limit=15.0
2024-09-20 10:43:00,482 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.27 vs. limit=6.0
2024-09-20 10:43:35,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=871960.0, ans=0.0
2024-09-20 10:43:37,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=871960.0, ans=0.0
2024-09-20 10:43:48,830 INFO [train.py:1198] (1/2) Epoch 49, batch 800, loss[loss=0.2165, ctc_loss=0.1016, cr_loss=0.3259, attn_decoder_loss=0.222, over 29597.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1067, cr_loss=0.3442, attn_decoder_loss=0.2358, over 5704978.26 frames. ], batch size: 73, lr: 2.25e-03, grad_scale: 16.0
2024-09-20 10:44:02,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=872040.0, ans=0.0
2024-09-20 10:44:02,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=872040.0, ans=0.025
2024-09-20 10:44:13,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=872040.0, ans=0.0
2024-09-20 10:44:14,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=872040.0, ans=0.2
2024-09-20 10:44:14,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=872040.0, ans=0.0
2024-09-20 10:44:20,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=872080.0, ans=0.025
2024-09-20 10:44:32,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=872120.0, ans=0.0
2024-09-20 10:44:40,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=872120.0, ans=0.0
2024-09-20 10:44:43,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=872120.0, ans=0.125
2024-09-20 10:44:59,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=872160.0, ans=0.0
2024-09-20 10:45:03,401 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.964e+01 8.646e+01 9.267e+01 9.884e+01 3.056e+02, threshold=1.853e+02, percent-clipped=1.0
2024-09-20 10:45:03,423 INFO [train.py:1198] (1/2) Epoch 49, batch 850, loss[loss=0.2401, ctc_loss=0.1097, cr_loss=0.3462, attn_decoder_loss=0.2469, over 29713.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1063, cr_loss=0.3433, attn_decoder_loss=0.2355, over 5735300.79 frames. ], batch size: 89, lr: 2.25e-03, grad_scale: 8.0
2024-09-20 10:45:08,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.46 vs. limit=15.0
2024-09-20 10:45:12,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=872200.0, ans=0.5
2024-09-20 10:45:17,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=872200.0, ans=0.07
2024-09-20 10:45:25,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.01 vs. limit=15.0
2024-09-20 10:45:28,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=872240.0, ans=0.0
2024-09-20 10:45:41,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=872280.0, ans=0.125
2024-09-20 10:45:56,320 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.50 vs. limit=15.0
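[Editor's note] Each train.py line reports a per-batch loss next to its ctc_loss, cr_loss and attn_decoder_loss components. The logged numbers are consistent with a fixed linear combination of roughly 0.1*ctc_loss + 0.9*attn_decoder_loss + 0.02*cr_loss: for batch 850 above, 0.1*0.1097 + 0.9*0.2469 + 0.02*0.3462 = 0.2401, matching the logged loss. The sketch below encodes that inferred combination; the scales are read off the numbers, not from the training code.

    # Sketch of the loss combination implied by the train.py lines above.
    def combine_losses(ctc_loss: float, attn_decoder_loss: float, cr_loss: float,
                       ctc_scale: float = 0.1,
                       attn_decoder_scale: float = 0.9,
                       cr_scale: float = 0.02) -> float:
        return (ctc_scale * ctc_loss
                + attn_decoder_scale * attn_decoder_loss
                + cr_scale * cr_loss)

    # Check against batch 850 above: 0.1097, 0.2469, 0.3462 -> 0.2401
    assert abs(combine_losses(0.1097, 0.2469, 0.3462) - 0.2401) < 5e-4

The small cr_loss weight matches its role as a consistency-regularization term on top of the dominant attention-decoder objective.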
2024-09-20 10:46:04,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=872360.0, ans=0.025
2024-09-20 10:46:11,231 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.99 vs. limit=15.0
2024-09-20 10:46:11,282 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.91 vs. limit=15.0
2024-09-20 10:46:21,060 INFO [train.py:1198] (1/2) Epoch 49, batch 900, loss[loss=0.217, ctc_loss=0.1001, cr_loss=0.3272, attn_decoder_loss=0.2227, over 29614.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1066, cr_loss=0.3442, attn_decoder_loss=0.2361, over 5741074.54 frames. ], batch size: 73, lr: 2.25e-03, grad_scale: 8.0
2024-09-20 10:46:27,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=872400.0, ans=0.5
2024-09-20 10:46:33,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=872400.0, ans=0.125
2024-09-20 10:46:33,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=872400.0, ans=0.125
2024-09-20 10:47:02,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=872480.0, ans=0.5
2024-09-20 10:47:08,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=872520.0, ans=0.125
2024-09-20 10:47:24,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=872560.0, ans=0.2
2024-09-20 10:47:29,451 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.47 vs. limit=15.0
2024-09-20 10:47:38,399 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 8.666e+01 9.208e+01 1.007e+02 2.481e+02, threshold=1.842e+02, percent-clipped=1.0
2024-09-20 10:47:38,430 INFO [train.py:1198] (1/2) Epoch 49, batch 950, loss[loss=0.214, ctc_loss=0.09318, cr_loss=0.2995, attn_decoder_loss=0.2207, over 29506.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1068, cr_loss=0.3445, attn_decoder_loss=0.2363, over 5743444.39 frames. ], batch size: 74, lr: 2.25e-03, grad_scale: 8.0
2024-09-20 10:47:43,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=872600.0, ans=0.0
2024-09-20 10:48:01,911 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0
2024-09-20 10:48:17,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=872680.0, ans=0.0
2024-09-20 10:48:20,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=872680.0, ans=0.125
2024-09-20 10:48:25,684 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.28 vs. limit=15.0
2024-09-20 10:48:34,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=872720.0, ans=0.125
2024-09-20 10:48:41,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=872760.0, ans=0.125
2024-09-20 10:48:43,672 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.86 vs. limit=15.0
2024-09-20 10:48:49,647 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.68 vs. limit=15.0
2024-09-20 10:48:50,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=872760.0, ans=0.125
2024-09-20 10:48:53,490 INFO [train.py:1198] (1/2) Epoch 49, batch 1000, loss[loss=0.2319, ctc_loss=0.1095, cr_loss=0.3602, attn_decoder_loss=0.2375, over 29475.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1076, cr_loss=0.3464, attn_decoder_loss=0.237, over 5737418.51 frames. ], batch size: 77, lr: 2.25e-03, grad_scale: 8.0
2024-09-20 10:48:55,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=872800.0, ans=0.125
2024-09-20 10:49:11,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=872840.0, ans=0.125
2024-09-20 10:49:51,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=872920.0, ans=0.04949747468305833
2024-09-20 10:50:06,998 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.42 vs. limit=15.0
2024-09-20 10:50:10,720 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 8.566e+01 9.140e+01 9.673e+01 2.370e+02, threshold=1.828e+02, percent-clipped=1.0
2024-09-20 10:50:10,742 INFO [train.py:1198] (1/2) Epoch 49, batch 1050, loss[loss=0.2438, ctc_loss=0.1196, cr_loss=0.3692, attn_decoder_loss=0.2494, over 29693.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1075, cr_loss=0.3463, attn_decoder_loss=0.2365, over 5744295.94 frames. ], batch size: 85, lr: 2.25e-03, grad_scale: 8.0
2024-09-20 10:50:17,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=873000.0, ans=0.125
2024-09-20 10:50:52,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=873080.0, ans=0.1
2024-09-20 10:50:57,186 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.92 vs. limit=22.5
2024-09-20 10:50:58,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=873120.0, ans=0.125
2024-09-20 10:50:59,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=873120.0, ans=0.0
2024-09-20 10:51:08,884 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.66 vs. limit=15.0
2024-09-20 10:51:09,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=873160.0, ans=0.125
2024-09-20 10:51:10,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=873160.0, ans=0.07
2024-09-20 10:51:22,745 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.76 vs. limit=15.0
2024-09-20 10:51:26,494 INFO [train.py:1198] (1/2) Epoch 49, batch 1100, loss[loss=0.2245, ctc_loss=0.1081, cr_loss=0.341, attn_decoder_loss=0.2298, over 29439.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1075, cr_loss=0.3461, attn_decoder_loss=0.2362, over 5757191.06 frames. ], batch size: 78, lr: 2.25e-03, grad_scale: 8.0
2024-09-20 10:51:43,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=873240.0, ans=10.0
2024-09-20 10:52:02,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=873280.0, ans=0.125
2024-09-20 10:52:21,285 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.83 vs. limit=10.0
2024-09-20 10:52:32,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=873360.0, ans=0.125
2024-09-20 10:52:41,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=873360.0, ans=0.125
2024-09-20 10:52:44,484 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.640e+01 8.549e+01 9.114e+01 9.620e+01 1.410e+02, threshold=1.823e+02, percent-clipped=0.0
2024-09-20 10:52:44,505 INFO [train.py:1198] (1/2) Epoch 49, batch 1150, loss[loss=0.2258, ctc_loss=0.1018, cr_loss=0.3411, attn_decoder_loss=0.232, over 29458.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1075, cr_loss=0.3461, attn_decoder_loss=0.2362, over 5756315.79 frames. ], batch size: 78, lr: 2.25e-03, grad_scale: 8.0
2024-09-20 10:54:02,477 INFO [train.py:1198] (1/2) Epoch 49, batch 1200, loss[loss=0.2277, ctc_loss=0.1003, cr_loss=0.3151, attn_decoder_loss=0.2349, over 29680.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1076, cr_loss=0.3464, attn_decoder_loss=0.2366, over 5747880.39 frames. ], batch size: 85, lr: 2.25e-03, grad_scale: 16.0
2024-09-20 10:54:10,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=873600.0, ans=0.2
2024-09-20 10:54:22,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=873640.0, ans=0.0
2024-09-20 10:54:32,029 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.77 vs. limit=15.0
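[Editor's note] The per-batch loss[...] block is the current batch only, while tot_loss[..., over N frames] is a frame-weighted running average: the frame count grows steadily across the epoch in the lines above (5188865.69 -> 5331867.56 -> ... -> 5757191.06). A sketch of such bookkeeping, assuming plain accumulation; train.py's actual tracker may decay old batches rather than accumulate forever.

    # Sketch of a frame-weighted running average like tot_loss above.
    class RunningLoss:
        def __init__(self):
            self.weighted_sum = 0.0
            self.frames = 0.0

        def update(self, loss: float, num_frames: float) -> None:
            self.weighted_sum += loss * num_frames
            self.frames += num_frames

        @property
        def average(self) -> float:
            return self.weighted_sum / max(self.frames, 1.0)

    tracker = RunningLoss()
    tracker.update(0.2276, 29698.0)  # batch 450's per-batch numbers, from above
    print(f"tot_loss[loss={tracker.average:.4}, over {tracker.frames:.2f} frames.]")

Weighting by frames rather than by batch keeps long utterances from being under-counted in the reported average.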
2024-09-20 10:54:37,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=873680.0, ans=0.2
2024-09-20 10:54:39,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=873680.0, ans=0.125
2024-09-20 10:55:00,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=873720.0, ans=0.125
2024-09-20 10:55:18,093 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.698e+01 8.765e+01 9.274e+01 9.697e+01 1.334e+02, threshold=1.855e+02, percent-clipped=0.0
2024-09-20 10:55:18,119 INFO [train.py:1198] (1/2) Epoch 49, batch 1250, loss[loss=0.2459, ctc_loss=0.1103, cr_loss=0.3616, attn_decoder_loss=0.2529, over 29589.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1082, cr_loss=0.3481, attn_decoder_loss=0.2372, over 5775149.76 frames. ], batch size: 92, lr: 2.25e-03, grad_scale: 16.0
2024-09-20 10:55:36,613 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.01 vs. limit=15.0
2024-09-20 10:55:40,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=873840.0, ans=0.025
2024-09-20 10:55:43,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=873840.0, ans=0.1
2024-09-20 10:55:47,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=873840.0, ans=0.1
2024-09-20 10:55:48,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=15.0
2024-09-20 10:56:07,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=873920.0, ans=0.1
2024-09-20 10:56:21,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=873960.0, ans=0.025
2024-09-20 10:56:35,756 INFO [train.py:1198] (1/2) Epoch 49, batch 1300, loss[loss=0.239, ctc_loss=0.1119, cr_loss=0.3601, attn_decoder_loss=0.2451, over 28127.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1077, cr_loss=0.3465, attn_decoder_loss=0.2365, over 5780383.19 frames. ], batch size: 111, lr: 2.25e-03, grad_scale: 16.0
2024-09-20 10:56:36,122 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 10:56:48,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=874000.0, ans=0.0
2024-09-20 10:57:07,518 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.03 vs. limit=10.0
2024-09-20 10:57:09,321 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.93 vs. limit=15.0
2024-09-20 10:57:28,777 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.50 vs. limit=15.0
2024-09-20 10:57:35,715 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 10:57:36,464 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.06 vs. limit=6.0
2024-09-20 10:57:44,801 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 10:57:53,496 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.289e+01 8.411e+01 9.030e+01 9.662e+01 1.974e+02, threshold=1.806e+02, percent-clipped=1.0
2024-09-20 10:57:53,517 INFO [train.py:1198] (1/2) Epoch 49, batch 1350, loss[loss=0.2322, ctc_loss=0.1107, cr_loss=0.3638, attn_decoder_loss=0.2376, over 29754.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1074, cr_loss=0.3462, attn_decoder_loss=0.2363, over 5796609.28 frames. ], batch size: 81, lr: 2.25e-03, grad_scale: 16.0
2024-09-20 10:58:02,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=874200.0, ans=0.05
2024-09-20 10:58:06,351 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.93 vs. limit=10.0
2024-09-20 10:58:08,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=874240.0, ans=0.0
2024-09-20 10:58:23,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=874280.0, ans=0.125
2024-09-20 10:58:48,028 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.03 vs. limit=12.0
2024-09-20 10:58:54,282 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.53 vs. limit=15.0
2024-09-20 10:59:09,072 INFO [train.py:1198] (1/2) Epoch 49, batch 1400, loss[loss=0.2084, ctc_loss=0.09308, cr_loss=0.3239, attn_decoder_loss=0.214, over 29581.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.107, cr_loss=0.3454, attn_decoder_loss=0.2359, over 5808647.48 frames. ], batch size: 69, lr: 2.25e-03, grad_scale: 16.0
2024-09-20 10:59:47,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=874480.0, ans=0.09899494936611666
2024-09-20 10:59:59,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=874520.0, ans=0.05
2024-09-20 11:00:00,383 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.91 vs. limit=12.0
2024-09-20 11:00:17,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=874560.0, ans=0.2
2024-09-20 11:00:19,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=874560.0, ans=0.125
2024-09-20 11:00:23,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=874560.0, ans=0.125
2024-09-20 11:00:25,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=874600.0, ans=0.0
2024-09-20 11:00:26,309 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.391e+01 8.532e+01 9.313e+01 9.693e+01 1.325e+02, threshold=1.863e+02, percent-clipped=0.0
2024-09-20 11:00:26,330 INFO [train.py:1198] (1/2) Epoch 49, batch 1450, loss[loss=0.2458, ctc_loss=0.1208, cr_loss=0.3685, attn_decoder_loss=0.2515, over 29447.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1071, cr_loss=0.3452, attn_decoder_loss=0.2363, over 5804893.50 frames. ], batch size: 94, lr: 2.25e-03, grad_scale: 16.0
2024-09-20 11:01:00,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=874680.0, ans=0.0
2024-09-20 11:01:03,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=874680.0, ans=0.09899494936611666
2024-09-20 11:01:09,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=874680.0, ans=0.125
2024-09-20 11:01:18,443 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-20 11:01:19,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=874720.0, ans=0.0
2024-09-20 11:01:43,691 INFO [train.py:1198] (1/2) Epoch 49, batch 1500, loss[loss=0.2347, ctc_loss=0.1067, cr_loss=0.3495, attn_decoder_loss=0.2411, over 29615.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.107, cr_loss=0.3455, attn_decoder_loss=0.2365, over 5806298.56 frames. ], batch size: 86, lr: 2.25e-03, grad_scale: 8.0
2024-09-20 11:02:07,361 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0
2024-09-20 11:02:18,169 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.32 vs. limit=15.0
2024-09-20 11:02:35,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=874920.0, ans=0.125
2024-09-20 11:02:44,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=874960.0, ans=0.025
2024-09-20 11:02:59,720 INFO [train.py:1198] (1/2) Epoch 49, batch 1550, loss[loss=0.2449, ctc_loss=0.1136, cr_loss=0.3632, attn_decoder_loss=0.2514, over 29507.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1075, cr_loss=0.3461, attn_decoder_loss=0.2368, over 5783431.93 frames. ], batch size: 90, lr: 2.25e-03, grad_scale: 8.0
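[Editor's note] The grad_scale field in the train.py lines jumps between 8.0, 16.0 and 32.0 in this section. That pattern matches PyTorch AMP's GradScaler, which halves its scale after a batch whose gradients contain inf/nan and doubles it after a run of good batches. Below is a sketch of an fp16 step under that assumption; the model and optimizer are stand-ins, not the actual training setup.

    # Sketch of an AMP step consistent with the fluctuating grad_scale
    # values above. Requires a CUDA device.
    import torch

    model = torch.nn.Linear(80, 500).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.045)
    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

    def train_step(features, targets):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = torch.nn.functional.cross_entropy(model(features), targets)
        scaler.scale(loss).backward()  # backward on the scaled loss
        scaler.step(optimizer)         # unscales grads, skips step on inf/nan
        scaler.update()                # halve on overflow / grow periodically
        return loss.item()

scaler.get_scale() at any point would return the value the log reports as grad_scale.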
2024-09-20 11:03:00,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.43 vs. limit=10.0
2024-09-20 11:03:01,249 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.616e+01 8.780e+01 9.221e+01 9.714e+01 1.731e+02, threshold=1.844e+02, percent-clipped=0.0
2024-09-20 11:03:22,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=875040.0, ans=22.5
2024-09-20 11:03:26,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=875040.0, ans=0.0
2024-09-20 11:03:35,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=875080.0, ans=0.125
2024-09-20 11:04:01,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=875160.0, ans=0.1
2024-09-20 11:04:17,871 INFO [train.py:1198] (1/2) Epoch 49, batch 1600, loss[loss=0.2377, ctc_loss=0.1053, cr_loss=0.3327, attn_decoder_loss=0.245, over 29644.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1076, cr_loss=0.3466, attn_decoder_loss=0.2366, over 5765812.00 frames. ], batch size: 85, lr: 2.25e-03, grad_scale: 16.0
2024-09-20 11:04:23,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=875200.0, ans=0.5
2024-09-20 11:04:47,960 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.49 vs. limit=22.5
2024-09-20 11:05:35,284 INFO [train.py:1198] (1/2) Epoch 49, batch 1650, loss[loss=0.2419, ctc_loss=0.1059, cr_loss=0.3473, attn_decoder_loss=0.2492, over 29709.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1071, cr_loss=0.3458, attn_decoder_loss=0.2364, over 5760646.21 frames. ], batch size: 89, lr: 2.25e-03, grad_scale: 16.0
2024-09-20 11:05:36,811 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 8.731e+01 9.375e+01 1.033e+02 4.600e+02, threshold=1.875e+02, percent-clipped=3.0
2024-09-20 11:05:37,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=875400.0, ans=0.125
2024-09-20 11:05:49,693 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.40 vs. limit=6.0
2024-09-20 11:05:52,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=875440.0, ans=0.125
2024-09-20 11:06:22,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=875520.0, ans=0.125
2024-09-20 11:06:25,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=875520.0, ans=0.0
2024-09-20 11:06:28,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=875520.0, ans=0.0
2024-09-20 11:06:30,451 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.31 vs. limit=15.0
2024-09-20 11:06:45,236 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.68 vs. limit=15.0
2024-09-20 11:06:50,363 INFO [train.py:1198] (1/2) Epoch 49, batch 1700, loss[loss=0.201, ctc_loss=0.0845, cr_loss=0.2969, attn_decoder_loss=0.2074, over 29570.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1069, cr_loss=0.346, attn_decoder_loss=0.2363, over 5781583.73 frames. ], batch size: 69, lr: 2.25e-03, grad_scale: 16.0
2024-09-20 11:06:52,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=875600.0, ans=0.2
2024-09-20 11:07:07,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=875640.0, ans=0.125
2024-09-20 11:07:11,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=875640.0, ans=0.0
2024-09-20 11:07:19,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=875640.0, ans=0.0
2024-09-20 11:07:33,552 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-20 11:07:50,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=875720.0, ans=0.125
2024-09-20 11:07:53,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=875760.0, ans=0.0
2024-09-20 11:07:57,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=875760.0, ans=0.0
2024-09-20 11:08:03,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=875760.0, ans=0.125
2024-09-20 11:08:07,654 INFO [train.py:1198] (1/2) Epoch 49, batch 1750, loss[loss=0.2118, ctc_loss=0.09515, cr_loss=0.3283, attn_decoder_loss=0.2175, over 29372.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1067, cr_loss=0.3453, attn_decoder_loss=0.2359, over 5789465.51 frames. ], batch size: 67, lr: 2.25e-03, grad_scale: 8.0
2024-09-20 11:08:08,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=875800.0, ans=0.125
2024-09-20 11:08:10,635 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.685e+01 8.566e+01 9.020e+01 9.576e+01 1.474e+02, threshold=1.804e+02, percent-clipped=0.0
2024-09-20 11:08:15,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=875800.0, ans=0.125
2024-09-20 11:08:34,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=875840.0, ans=0.0
2024-09-20 11:08:35,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=875840.0, ans=0.0
2024-09-20 11:08:38,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=875880.0, ans=0.1
2024-09-20 11:08:44,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=875880.0, ans=0.1
2024-09-20 11:08:45,072 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.05 vs. limit=10.0
2024-09-20 11:08:45,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=875880.0, ans=0.025
2024-09-20 11:08:57,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=875920.0, ans=0.125
2024-09-20 11:09:03,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=875920.0, ans=0.09899494936611666
2024-09-20 11:09:11,793 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.59 vs. limit=15.0
2024-09-20 11:09:11,823 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.93 vs. limit=22.5
2024-09-20 11:09:17,943 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.38 vs. limit=12.0
2024-09-20 11:09:24,847 INFO [train.py:1198] (1/2) Epoch 49, batch 1800, loss[loss=0.2269, ctc_loss=0.103, cr_loss=0.3405, attn_decoder_loss=0.2331, over 29670.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1066, cr_loss=0.3453, attn_decoder_loss=0.236, over 5790763.84 frames. ], batch size: 83, lr: 2.25e-03, grad_scale: 8.0
2024-09-20 11:09:34,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=876000.0, ans=0.2
2024-09-20 11:09:54,124 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 11:10:40,174 INFO [train.py:1198] (1/2) Epoch 49, batch 1850, loss[loss=0.2391, ctc_loss=0.1046, cr_loss=0.3262, attn_decoder_loss=0.2468, over 29621.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.107, cr_loss=0.3464, attn_decoder_loss=0.2362, over 5796276.57 frames. ], batch size: 86, lr: 2.24e-03, grad_scale: 8.0
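[Editor's note] Many ScheduledFloat entries above carry balancer parameters (prob=0.125, min_positive=0.025, min_abs=0.5, and so on). These read as per-channel activation constraints checked on a random fraction of batches. The sketch below only measures and reports such violations; the real Balancer modifies gradients to push activations back into range, and the exact constraint set here is an assumption.

    # Sketch of what the balancer parameters above could constrain: with
    # probability `prob`, compute per-channel statistics and count channels
    # whose positive fraction or mean |activation| is below the bounds.
    import random
    import torch

    def balancer_report(x: torch.Tensor, prob: float = 0.125,
                        min_positive: float = 0.05, min_abs: float = 0.2):
        if random.random() > prob:  # only inspect a fraction of batches
            return None
        pos_frac = (x > 0).float().mean(dim=0)  # per-channel positive fraction
        mean_abs = x.abs().mean(dim=0)          # per-channel mean |activation|
        return {
            "low_positive": int((pos_frac < min_positive).sum()),
            "low_abs": int((mean_abs < min_abs).sum()),
        }

    report = balancer_report(torch.randn(200, 512))  # None ~87.5% of the time

The WithLoss entries (loss-sum=0.000e+00) are a related per-module diagnostic; a zero sum simply means the auxiliary penalty attached to those attention weights is currently inactive.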
2024-09-20 11:10:43,132 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.600e+01 9.055e+01 9.654e+01 2.900e+02, threshold=1.811e+02, percent-clipped=2.0
2024-09-20 11:10:43,913 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.51 vs. limit=15.0
2024-09-20 11:11:21,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=876280.0, ans=0.125
2024-09-20 11:11:30,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=876320.0, ans=0.0
2024-09-20 11:11:56,598 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=22.5
2024-09-20 11:11:56,966 INFO [train.py:1198] (1/2) Epoch 49, batch 1900, loss[loss=0.2452, ctc_loss=0.1136, cr_loss=0.3679, attn_decoder_loss=0.2517, over 29697.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1073, cr_loss=0.3465, attn_decoder_loss=0.2368, over 5804275.90 frames. ], batch size: 89, lr: 2.24e-03, grad_scale: 8.0
2024-09-20 11:12:04,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=876400.0, ans=0.1
2024-09-20 11:12:17,899 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.82 vs. limit=22.5
2024-09-20 11:12:23,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=876440.0, ans=0.1
2024-09-20 11:12:46,758 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=6.82 vs. limit=15.0
2024-09-20 11:12:47,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=876520.0, ans=0.0
2024-09-20 11:13:06,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.37 vs. limit=15.0
2024-09-20 11:13:14,820 INFO [train.py:1198] (1/2) Epoch 49, batch 1950, loss[loss=0.2337, ctc_loss=0.1238, cr_loss=0.3726, attn_decoder_loss=0.2377, over 29471.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.108, cr_loss=0.3478, attn_decoder_loss=0.238, over 5819180.08 frames. ], batch size: 78, lr: 2.24e-03, grad_scale: 8.0
2024-09-20 11:13:17,850 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 8.737e+01 9.338e+01 9.931e+01 1.218e+02, threshold=1.868e+02, percent-clipped=0.0
2024-09-20 11:13:32,024 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.37 vs. limit=22.5
2024-09-20 11:13:45,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=876680.0, ans=0.125
2024-09-20 11:13:46,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=876680.0, ans=0.125
2024-09-20 11:14:06,284 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 11:14:24,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=876760.0, ans=0.125
2024-09-20 11:14:30,277 INFO [train.py:1198] (1/2) Epoch 49, batch 2000, loss[loss=0.204, ctc_loss=0.0896, cr_loss=0.3147, attn_decoder_loss=0.2097, over 29399.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1082, cr_loss=0.3483, attn_decoder_loss=0.2382, over 5796678.06 frames. ], batch size: 67, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:14:34,159 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0
2024-09-20 11:14:53,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=876840.0, ans=0.0
2024-09-20 11:14:59,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=876880.0, ans=0.0
2024-09-20 11:15:47,891 INFO [train.py:1198] (1/2) Epoch 49, batch 2050, loss[loss=0.2072, ctc_loss=0.08921, cr_loss=0.297, attn_decoder_loss=0.2137, over 29423.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1077, cr_loss=0.3471, attn_decoder_loss=0.2372, over 5789602.60 frames. ], batch size: 70, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:15:50,913 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.506e+01 8.585e+01 9.358e+01 1.005e+02 5.300e+02, threshold=1.872e+02, percent-clipped=1.0
2024-09-20 11:15:53,948 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.31 vs. limit=15.0
2024-09-20 11:16:00,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=877000.0, ans=0.125
2024-09-20 11:16:04,552 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.65 vs. limit=15.0
2024-09-20 11:16:14,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=877040.0, ans=0.125
2024-09-20 11:16:25,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=877080.0, ans=0.2
2024-09-20 11:16:31,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=877080.0, ans=0.125
2024-09-20 11:16:59,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=877160.0, ans=15.0
2024-09-20 11:17:05,477 INFO [train.py:1198] (1/2) Epoch 49, batch 2100, loss[loss=0.2348, ctc_loss=0.1139, cr_loss=0.3764, attn_decoder_loss=0.2399, over 29780.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1076, cr_loss=0.3464, attn_decoder_loss=0.2369, over 5801960.17 frames. ], batch size: 81, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:17:10,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=877200.0, ans=0.2
2024-09-20 11:17:24,230 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.82 vs. limit=15.0
2024-09-20 11:17:40,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=877280.0, ans=0.125
2024-09-20 11:17:45,015 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0
2024-09-20 11:17:46,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=877280.0, ans=0.09899494936611666
2024-09-20 11:17:58,266 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-20 11:18:09,460 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=4.97 vs. limit=15.0
2024-09-20 11:18:20,199 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0
2024-09-20 11:18:20,695 INFO [train.py:1198] (1/2) Epoch 49, batch 2150, loss[loss=0.2282, ctc_loss=0.1064, cr_loss=0.3592, attn_decoder_loss=0.2338, over 29453.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1073, cr_loss=0.3461, attn_decoder_loss=0.2364, over 5816855.09 frames. ], batch size: 78, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:18:23,754 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.478e+01 8.920e+01 9.429e+01 1.261e+02, threshold=1.784e+02, percent-clipped=0.0
2024-09-20 11:18:24,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=877400.0, ans=0.035
2024-09-20 11:19:09,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=877520.0, ans=0.025
2024-09-20 11:19:20,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=877520.0, ans=0.125
2024-09-20 11:19:29,575 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-20 11:19:33,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=877560.0, ans=0.0
2024-09-20 11:19:38,778 INFO [train.py:1198] (1/2) Epoch 49, batch 2200, loss[loss=0.2324, ctc_loss=0.105, cr_loss=0.3387, attn_decoder_loss=0.239, over 29627.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1072, cr_loss=0.3462, attn_decoder_loss=0.2365, over 5812010.08 frames. ], batch size: 86, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:19:56,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=877640.0, ans=0.2
2024-09-20 11:20:09,024 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.63 vs. limit=15.0
2024-09-20 11:20:19,232 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=15.0
2024-09-20 11:20:35,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=877720.0, ans=0.1
2024-09-20 11:20:48,315 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0
2024-09-20 11:20:53,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=877760.0, ans=0.125
2024-09-20 11:20:56,325 INFO [train.py:1198] (1/2) Epoch 49, batch 2250, loss[loss=0.2435, ctc_loss=0.1197, cr_loss=0.3868, attn_decoder_loss=0.2486, over 29739.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1068, cr_loss=0.3451, attn_decoder_loss=0.2363, over 5811607.96 frames. ], batch size: 82, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:20:59,106 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.496e+01 8.798e+01 9.192e+01 9.899e+01 1.510e+02, threshold=1.838e+02, percent-clipped=0.0
2024-09-20 11:21:07,760 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.91 vs. limit=15.0
2024-09-20 11:21:19,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=877840.0, ans=0.1
2024-09-20 11:21:25,040 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-20 11:21:34,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=877880.0, ans=0.125
2024-09-20 11:21:35,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=877880.0, ans=0.1
2024-09-20 11:21:36,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=877880.0, ans=0.125
2024-09-20 11:21:46,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=877920.0, ans=0.125
2024-09-20 11:22:05,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=877960.0, ans=0.125
2024-09-20 11:22:08,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=877960.0, ans=0.125
2024-09-20 11:22:11,329 INFO [train.py:1198] (1/2) Epoch 49, batch 2300, loss[loss=0.2123, ctc_loss=0.09865, cr_loss=0.3151, attn_decoder_loss=0.218, over 29344.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1058, cr_loss=0.3429, attn_decoder_loss=0.2352, over 5797729.91 frames. ], batch size: 71, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:22:21,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=878000.0, ans=0.125
2024-09-20 11:22:41,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=878080.0, ans=0.125
2024-09-20 11:22:55,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=878120.0, ans=0.0
2024-09-20 11:23:01,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=878120.0, ans=0.125
2024-09-20 11:23:10,613 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.52 vs. limit=15.0
2024-09-20 11:23:15,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=878160.0, ans=0.125
2024-09-20 11:23:25,951 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.95 vs. limit=15.0
2024-09-20 11:23:29,171 INFO [train.py:1198] (1/2) Epoch 49, batch 2350, loss[loss=0.2394, ctc_loss=0.1171, cr_loss=0.3783, attn_decoder_loss=0.2446, over 29702.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1061, cr_loss=0.3437, attn_decoder_loss=0.2355, over 5804039.25 frames. ], batch size: 83, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:23:32,110 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.643e+01 8.491e+01 9.028e+01 9.631e+01 3.047e+02, threshold=1.806e+02, percent-clipped=1.0
2024-09-20 11:24:01,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=878280.0, ans=0.1
2024-09-20 11:24:04,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=878280.0, ans=0.125
2024-09-20 11:24:04,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=878280.0, ans=0.125
2024-09-20 11:24:06,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=878280.0, ans=0.125
2024-09-20 11:24:10,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=878280.0, ans=0.125
2024-09-20 11:24:47,382 INFO [train.py:1198] (1/2) Epoch 49, batch 2400, loss[loss=0.2174, ctc_loss=0.1055, cr_loss=0.3309, attn_decoder_loss=0.2225, over 29520.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1068, cr_loss=0.3453, attn_decoder_loss=0.236, over 5808095.53 frames. ], batch size: 76, lr: 2.24e-03, grad_scale: 32.0
2024-09-20 11:25:06,476 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.88 vs. limit=15.0
2024-09-20 11:25:10,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=878440.0, ans=0.125
2024-09-20 11:25:17,374 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.94 vs. limit=15.0
2024-09-20 11:25:40,086 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=15.0
2024-09-20 11:25:43,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=878520.0, ans=0.1
2024-09-20 11:26:02,941 INFO [train.py:1198] (1/2) Epoch 49, batch 2450, loss[loss=0.2319, ctc_loss=0.1123, cr_loss=0.3607, attn_decoder_loss=0.2372, over 29686.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1074, cr_loss=0.3468, attn_decoder_loss=0.2368, over 5783518.54 frames. ], batch size: 82, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:26:07,322 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 8.775e+01 9.341e+01 9.851e+01 1.765e+02, threshold=1.868e+02, percent-clipped=0.0
2024-09-20 11:26:11,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=878600.0, ans=0.1
2024-09-20 11:26:13,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=878600.0, ans=0.0
2024-09-20 11:26:28,943 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.69 vs. limit=15.0
2024-09-20 11:26:34,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=878680.0, ans=0.2
2024-09-20 11:26:48,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=878720.0, ans=0.125
2024-09-20 11:27:02,651 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.36 vs. limit=22.5
2024-09-20 11:27:14,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=878760.0, ans=0.0
2024-09-20 11:27:19,940 INFO [train.py:1198] (1/2) Epoch 49, batch 2500, loss[loss=0.2374, ctc_loss=0.108, cr_loss=0.3401, attn_decoder_loss=0.2443, over 29626.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1074, cr_loss=0.3472, attn_decoder_loss=0.2368, over 5793881.12 frames. ], batch size: 86, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:27:27,716 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-20 11:27:34,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=878800.0, ans=0.2
2024-09-20 11:27:50,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=878880.0, ans=0.0
2024-09-20 11:28:28,099 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.06 vs. limit=10.0
2024-09-20 11:28:33,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=878960.0, ans=0.125
2024-09-20 11:28:37,629 INFO [train.py:1198] (1/2) Epoch 49, batch 2550, loss[loss=0.2041, ctc_loss=0.08591, cr_loss=0.3093, attn_decoder_loss=0.2104, over 29378.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1077, cr_loss=0.3476, attn_decoder_loss=0.2368, over 5796757.26 frames. ], batch size: 67, lr: 2.24e-03, grad_scale: 16.0
], batch size: 67, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:28:42,018 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.025e+01 8.758e+01 9.202e+01 9.559e+01 1.179e+02, threshold=1.840e+02, percent-clipped=0.0 2024-09-20 11:29:04,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=879040.0, ans=0.125 2024-09-20 11:29:10,037 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.30 vs. limit=6.0 2024-09-20 11:29:12,693 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.48 vs. limit=10.0 2024-09-20 11:29:30,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=879120.0, ans=0.1 2024-09-20 11:29:35,908 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.36 vs. limit=15.0 2024-09-20 11:29:50,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=879160.0, ans=0.1 2024-09-20 11:29:53,824 INFO [train.py:1198] (1/2) Epoch 49, batch 2600, loss[loss=0.2322, ctc_loss=0.111, cr_loss=0.359, attn_decoder_loss=0.2377, over 29456.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1079, cr_loss=0.3477, attn_decoder_loss=0.2373, over 5793667.65 frames. ], batch size: 78, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:30:08,981 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 11:30:19,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=879240.0, ans=0.125 2024-09-20 11:30:48,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=879320.0, ans=0.125 2024-09-20 11:30:59,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=879360.0, ans=0.2 2024-09-20 11:31:02,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=879360.0, ans=0.1 2024-09-20 11:31:10,896 INFO [train.py:1198] (1/2) Epoch 49, batch 2650, loss[loss=0.2405, ctc_loss=0.1181, cr_loss=0.3763, attn_decoder_loss=0.2457, over 29234.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1078, cr_loss=0.3477, attn_decoder_loss=0.2374, over 5799289.61 frames. 
], batch size: 100, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:31:15,430 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 8.630e+01 9.011e+01 9.615e+01 2.139e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-20 11:31:20,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=879400.0, ans=0.07 2024-09-20 11:31:31,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=879440.0, ans=0.0 2024-09-20 11:31:34,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=879440.0, ans=0.125 2024-09-20 11:31:44,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=879480.0, ans=0.125 2024-09-20 11:31:52,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=879480.0, ans=0.125 2024-09-20 11:31:54,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=879480.0, ans=0.125 2024-09-20 11:32:00,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=879520.0, ans=0.125 2024-09-20 11:32:11,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=879560.0, ans=0.1 2024-09-20 11:32:12,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=879560.0, ans=0.0 2024-09-20 11:32:27,636 INFO [train.py:1198] (1/2) Epoch 49, batch 2700, loss[loss=0.2466, ctc_loss=0.119, cr_loss=0.3788, attn_decoder_loss=0.2524, over 29517.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1079, cr_loss=0.3475, attn_decoder_loss=0.2376, over 5795243.03 frames. ], batch size: 87, lr: 2.24e-03, grad_scale: 8.0 2024-09-20 11:32:28,305 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2024-09-20 11:32:37,395 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.79 vs. limit=15.0 2024-09-20 11:32:44,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=879640.0, ans=0.2 2024-09-20 11:32:44,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=879640.0, ans=0.125 2024-09-20 11:32:55,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=879640.0, ans=0.125 2024-09-20 11:33:43,447 INFO [train.py:1198] (1/2) Epoch 49, batch 2750, loss[loss=0.2153, ctc_loss=0.09901, cr_loss=0.3229, attn_decoder_loss=0.221, over 29508.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.107, cr_loss=0.3453, attn_decoder_loss=0.2364, over 5794718.60 frames. 
], batch size: 75, lr: 2.24e-03, grad_scale: 8.0 2024-09-20 11:33:49,655 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.773e+01 9.217e+01 9.860e+01 5.240e+02, threshold=1.843e+02, percent-clipped=1.0 2024-09-20 11:34:02,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=879840.0, ans=0.125 2024-09-20 11:34:06,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=879840.0, ans=0.0 2024-09-20 11:34:06,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=879840.0, ans=0.125 2024-09-20 11:34:10,104 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.53 vs. limit=12.0 2024-09-20 11:34:17,177 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 11:34:23,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=879880.0, ans=0.125 2024-09-20 11:34:27,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=879920.0, ans=0.125 2024-09-20 11:34:38,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=879920.0, ans=0.025 2024-09-20 11:34:57,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=879960.0, ans=0.125 2024-09-20 11:35:09,257 INFO [train.py:1198] (1/2) Epoch 49, batch 2800, loss[loss=0.2482, ctc_loss=0.1364, cr_loss=0.375, attn_decoder_loss=0.2523, over 20086.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1076, cr_loss=0.3465, attn_decoder_loss=0.2368, over 5776416.77 frames. ], batch size: 210, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:35:31,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=880040.0, ans=0.0 2024-09-20 11:35:32,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=880040.0, ans=0.125 2024-09-20 11:35:33,420 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.02 vs. limit=15.0 2024-09-20 11:35:46,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=880080.0, ans=0.125 2024-09-20 11:36:02,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=880120.0, ans=0.125 2024-09-20 11:36:16,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=880160.0, ans=0.1 2024-09-20 11:36:26,657 INFO [train.py:1198] (1/2) Epoch 49, batch 2850, loss[loss=0.2239, ctc_loss=0.1037, cr_loss=0.3395, attn_decoder_loss=0.2297, over 29524.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1083, cr_loss=0.3478, attn_decoder_loss=0.2375, over 5762188.66 frames. 
], batch size: 77, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:36:32,610 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 8.815e+01 9.180e+01 9.751e+01 2.075e+02, threshold=1.836e+02, percent-clipped=1.0 2024-09-20 11:36:45,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=880240.0, ans=0.125 2024-09-20 11:36:48,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=880240.0, ans=0.125 2024-09-20 11:37:07,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=880280.0, ans=0.125 2024-09-20 11:37:19,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=880320.0, ans=0.025 2024-09-20 11:37:29,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=880360.0, ans=0.2 2024-09-20 11:37:42,425 INFO [train.py:1198] (1/2) Epoch 49, batch 2900, loss[loss=0.2194, ctc_loss=0.09385, cr_loss=0.3165, attn_decoder_loss=0.2263, over 29424.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1091, cr_loss=0.3498, attn_decoder_loss=0.2384, over 5787678.24 frames. ], batch size: 79, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:37:44,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=880400.0, ans=0.2 2024-09-20 11:38:01,389 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.47 vs. limit=22.5 2024-09-20 11:38:16,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_abs, batch_count=880480.0, ans=0.5 2024-09-20 11:38:22,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=880480.0, ans=0.125 2024-09-20 11:38:30,333 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.46 vs. limit=15.0 2024-09-20 11:38:35,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=880520.0, ans=0.125 2024-09-20 11:38:36,723 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.45 vs. limit=15.0 2024-09-20 11:38:44,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=880560.0, ans=0.0 2024-09-20 11:38:51,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=880560.0, ans=0.125 2024-09-20 11:39:02,487 INFO [train.py:1198] (1/2) Epoch 49, batch 2950, loss[loss=0.2332, ctc_loss=0.1094, cr_loss=0.3661, attn_decoder_loss=0.2388, over 29526.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1078, cr_loss=0.3469, attn_decoder_loss=0.2372, over 5783159.42 frames. 
], batch size: 75, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:39:08,386 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 8.608e+01 9.182e+01 9.827e+01 1.689e+02, threshold=1.836e+02, percent-clipped=0.0 2024-09-20 11:39:16,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=880640.0, ans=15.0 2024-09-20 11:39:21,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=880640.0, ans=0.0 2024-09-20 11:40:18,499 INFO [train.py:1198] (1/2) Epoch 49, batch 3000, loss[loss=0.2289, ctc_loss=0.0982, cr_loss=0.3212, attn_decoder_loss=0.2363, over 29767.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1075, cr_loss=0.3462, attn_decoder_loss=0.2368, over 5783868.65 frames. ], batch size: 81, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:40:18,500 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 11:40:24,652 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.4.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([4.6914, 4.6440, 4.2960, 4.6053], device='cuda:1') 2024-09-20 11:40:36,855 INFO [train.py:1230] (1/2) Epoch 49, validation: loss=0.2126, ctc_loss=0.03669, cr_loss=6.618e-15, attn_decoder_loss=0.2322, over 944034.00 frames. 2024-09-20 11:40:36,855 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-20 11:40:47,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=880800.0, ans=0.025 2024-09-20 11:40:48,575 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.15 vs. limit=15.0 2024-09-20 11:41:13,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=880880.0, ans=0.2 2024-09-20 11:41:14,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=880880.0, ans=0.125 2024-09-20 11:41:28,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=880920.0, ans=0.125 2024-09-20 11:41:28,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=880920.0, ans=0.125 2024-09-20 11:41:31,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=880920.0, ans=0.0 2024-09-20 11:41:37,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=880960.0, ans=0.125 2024-09-20 11:41:43,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=880960.0, ans=0.2 2024-09-20 11:41:47,118 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.96 vs. limit=22.5 2024-09-20 11:41:51,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=881000.0, ans=0.0 2024-09-20 11:41:52,486 INFO [train.py:1198] (1/2) Epoch 49, batch 3050, loss[loss=0.2134, ctc_loss=0.09553, cr_loss=0.3236, attn_decoder_loss=0.2193, over 29538.00 frames. 
], tot_loss[loss=0.2313, ctc_loss=0.1079, cr_loss=0.3464, attn_decoder_loss=0.2373, over 5778679.04 frames. ], batch size: 76, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:41:54,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=881000.0, ans=0.125 2024-09-20 11:41:58,509 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 8.535e+01 9.070e+01 9.568e+01 1.381e+02, threshold=1.814e+02, percent-clipped=0.0 2024-09-20 11:43:11,829 INFO [train.py:1198] (1/2) Epoch 49, batch 3100, loss[loss=0.2439, ctc_loss=0.1156, cr_loss=0.3545, attn_decoder_loss=0.2502, over 29293.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1074, cr_loss=0.3451, attn_decoder_loss=0.2365, over 5778009.30 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:43:43,818 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 11:43:57,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=881320.0, ans=0.07 2024-09-20 11:43:57,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=881320.0, ans=0.125 2024-09-20 11:44:06,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=881320.0, ans=0.125 2024-09-20 11:44:27,457 INFO [train.py:1198] (1/2) Epoch 49, batch 3150, loss[loss=0.2504, ctc_loss=0.1225, cr_loss=0.387, attn_decoder_loss=0.256, over 29011.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1076, cr_loss=0.3456, attn_decoder_loss=0.2368, over 5784236.03 frames. ], batch size: 104, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:44:33,450 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.545e+01 8.538e+01 9.268e+01 9.767e+01 2.524e+02, threshold=1.854e+02, percent-clipped=1.0 2024-09-20 11:44:39,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=881400.0, ans=0.125 2024-09-20 11:44:56,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=881480.0, ans=0.125 2024-09-20 11:45:05,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=881480.0, ans=0.125 2024-09-20 11:45:11,906 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.43 vs. limit=15.0 2024-09-20 11:45:20,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=881520.0, ans=0.125 2024-09-20 11:45:27,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=881560.0, ans=0.025 2024-09-20 11:45:27,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=881560.0, ans=0.125 2024-09-20 11:45:42,744 INFO [train.py:1198] (1/2) Epoch 49, batch 3200, loss[loss=0.227, ctc_loss=0.1043, cr_loss=0.3503, attn_decoder_loss=0.2328, over 29425.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1073, cr_loss=0.3452, attn_decoder_loss=0.2364, over 5793925.63 frames. 
], batch size: 79, lr: 2.24e-03, grad_scale: 32.0 2024-09-20 11:45:47,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=881600.0, ans=0.125 2024-09-20 11:45:49,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=881600.0, ans=0.0 2024-09-20 11:46:04,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=881640.0, ans=0.1 2024-09-20 11:46:10,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=881640.0, ans=0.025 2024-09-20 11:46:21,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=881680.0, ans=0.1 2024-09-20 11:46:53,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=881760.0, ans=0.1 2024-09-20 11:46:54,380 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.10 vs. limit=15.0 2024-09-20 11:47:02,653 INFO [train.py:1198] (1/2) Epoch 49, batch 3250, loss[loss=0.2359, ctc_loss=0.1118, cr_loss=0.3548, attn_decoder_loss=0.2418, over 29710.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1073, cr_loss=0.3451, attn_decoder_loss=0.2369, over 5799826.72 frames. ], batch size: 84, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:47:10,204 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.742e+01 9.266e+01 9.794e+01 1.259e+02, threshold=1.853e+02, percent-clipped=0.0 2024-09-20 11:47:24,730 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.67 vs. limit=10.0 2024-09-20 11:47:35,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=881880.0, ans=0.0 2024-09-20 11:47:50,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=881920.0, ans=0.2 2024-09-20 11:47:55,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=881920.0, ans=10.0 2024-09-20 11:48:04,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.03 vs. limit=15.0 2024-09-20 11:48:17,509 INFO [train.py:1198] (1/2) Epoch 49, batch 3300, loss[loss=0.241, ctc_loss=0.1111, cr_loss=0.3482, attn_decoder_loss=0.2477, over 28294.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1066, cr_loss=0.3439, attn_decoder_loss=0.2358, over 5796788.63 frames. ], batch size: 111, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:48:27,474 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.29 vs. 
limit=15.0 2024-09-20 11:48:48,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=882080.0, ans=0.125 2024-09-20 11:48:48,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=882080.0, ans=0.0 2024-09-20 11:48:52,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=882080.0, ans=0.0 2024-09-20 11:49:32,906 INFO [train.py:1198] (1/2) Epoch 49, batch 3350, loss[loss=0.2505, ctc_loss=0.1311, cr_loss=0.3949, attn_decoder_loss=0.255, over 28918.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1075, cr_loss=0.3457, attn_decoder_loss=0.2367, over 5773928.82 frames. ], batch size: 104, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:49:39,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=882200.0, ans=0.125 2024-09-20 11:49:39,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=882200.0, ans=0.125 2024-09-20 11:49:40,360 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 8.704e+01 9.230e+01 9.837e+01 1.570e+02, threshold=1.846e+02, percent-clipped=0.0 2024-09-20 11:49:44,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.40 vs. limit=12.0 2024-09-20 11:50:06,921 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.85 vs. limit=15.0 2024-09-20 11:50:07,163 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.69 vs. limit=12.0 2024-09-20 11:50:20,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=882320.0, ans=0.0 2024-09-20 11:50:36,735 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.32 vs. limit=22.5 2024-09-20 11:50:52,873 INFO [train.py:1198] (1/2) Epoch 49, batch 3400, loss[loss=0.2044, ctc_loss=0.08935, cr_loss=0.3071, attn_decoder_loss=0.2103, over 29345.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1074, cr_loss=0.3452, attn_decoder_loss=0.2364, over 5765903.62 frames. ], batch size: 67, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:51:05,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=882400.0, ans=0.025 2024-09-20 11:51:12,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=882440.0, ans=0.125 2024-09-20 11:51:13,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.58 vs. 
limit=15.0 2024-09-20 11:51:50,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=882520.0, ans=0.07 2024-09-20 11:52:05,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=882560.0, ans=0.125 2024-09-20 11:52:05,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=882560.0, ans=0.125 2024-09-20 11:52:08,170 INFO [train.py:1198] (1/2) Epoch 49, batch 3450, loss[loss=0.2383, ctc_loss=0.1047, cr_loss=0.3346, attn_decoder_loss=0.2457, over 28202.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1074, cr_loss=0.3455, attn_decoder_loss=0.2367, over 5772384.89 frames. ], batch size: 111, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:52:15,707 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.614e+01 8.667e+01 9.196e+01 9.628e+01 1.869e+02, threshold=1.839e+02, percent-clipped=1.0 2024-09-20 11:52:25,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=882640.0, ans=0.125 2024-09-20 11:52:39,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=882680.0, ans=0.025 2024-09-20 11:52:41,873 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=4.85 vs. limit=12.0 2024-09-20 11:52:49,375 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.92 vs. limit=15.0 2024-09-20 11:52:56,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=882720.0, ans=0.025 2024-09-20 11:52:57,001 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2024-09-20 11:53:19,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=882760.0, ans=0.2 2024-09-20 11:53:23,247 INFO [train.py:1198] (1/2) Epoch 49, batch 3500, loss[loss=0.2081, ctc_loss=0.08918, cr_loss=0.3044, attn_decoder_loss=0.2146, over 29341.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.107, cr_loss=0.3447, attn_decoder_loss=0.2359, over 5775290.92 frames. ], batch size: 71, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:53:43,742 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.13 vs. limit=15.0 2024-09-20 11:53:59,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=882880.0, ans=0.125 2024-09-20 11:54:05,715 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.87 vs. 
limit=15.0 2024-09-20 11:54:15,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=882920.0, ans=0.0 2024-09-20 11:54:15,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=882920.0, ans=0.125 2024-09-20 11:54:17,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=882920.0, ans=0.125 2024-09-20 11:54:17,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=882920.0, ans=0.0 2024-09-20 11:54:17,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=882920.0, ans=0.125 2024-09-20 11:54:20,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=882920.0, ans=0.0 2024-09-20 11:54:24,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=882960.0, ans=0.125 2024-09-20 11:54:39,742 INFO [train.py:1198] (1/2) Epoch 49, batch 3550, loss[loss=0.2414, ctc_loss=0.1141, cr_loss=0.3659, attn_decoder_loss=0.2474, over 29720.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1072, cr_loss=0.3455, attn_decoder_loss=0.236, over 5781682.57 frames. ], batch size: 89, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:54:47,104 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.606e+01 9.040e+01 9.689e+01 1.934e+02, threshold=1.808e+02, percent-clipped=1.0 2024-09-20 11:55:13,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer_ff2.min_abs, batch_count=883080.0, ans=0.1 2024-09-20 11:55:56,047 INFO [train.py:1198] (1/2) Epoch 49, batch 3600, loss[loss=0.2314, ctc_loss=0.1119, cr_loss=0.3589, attn_decoder_loss=0.2367, over 29495.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1071, cr_loss=0.345, attn_decoder_loss=0.2361, over 5791456.71 frames. ], batch size: 77, lr: 2.24e-03, grad_scale: 32.0 2024-09-20 11:56:20,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=883240.0, ans=0.04949747468305833 2024-09-20 11:56:22,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=883240.0, ans=15.0 2024-09-20 11:56:23,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=883240.0, ans=0.125 2024-09-20 11:56:39,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=883320.0, ans=0.125 2024-09-20 11:56:43,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=883320.0, ans=0.125 2024-09-20 11:56:46,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=883320.0, ans=0.025 2024-09-20 11:56:57,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=883360.0, ans=0.025 2024-09-20 11:57:10,212 INFO [train.py:1198] (1/2) Epoch 49, batch 3650, loss[loss=0.2413, ctc_loss=0.1235, cr_loss=0.3869, attn_decoder_loss=0.2458, over 29495.00 frames. 
], tot_loss[loss=0.2295, ctc_loss=0.1065, cr_loss=0.3441, attn_decoder_loss=0.2355, over 5794851.63 frames. ], batch size: 90, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:57:11,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=883400.0, ans=0.1 2024-09-20 11:57:14,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=883400.0, ans=0.125 2024-09-20 11:57:18,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=883400.0, ans=0.07 2024-09-20 11:57:19,164 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.791e+01 8.757e+01 9.183e+01 9.714e+01 2.760e+02, threshold=1.837e+02, percent-clipped=2.0 2024-09-20 11:57:22,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=883400.0, ans=0.0 2024-09-20 11:57:33,571 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=12.0 2024-09-20 11:57:40,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=883480.0, ans=0.125 2024-09-20 11:57:47,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=883480.0, ans=0.05 2024-09-20 11:58:10,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=883560.0, ans=0.125 2024-09-20 11:58:24,153 INFO [train.py:1198] (1/2) Epoch 49, batch 3700, loss[loss=0.2476, ctc_loss=0.1139, cr_loss=0.3583, attn_decoder_loss=0.2544, over 29696.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1068, cr_loss=0.3443, attn_decoder_loss=0.2359, over 5804041.64 frames. ], batch size: 84, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:58:33,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=883600.0, ans=0.2 2024-09-20 11:58:38,732 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.07 vs. limit=15.0 2024-09-20 11:58:51,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=883640.0, ans=0.125 2024-09-20 11:58:52,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=883680.0, ans=0.0 2024-09-20 11:59:31,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=883760.0, ans=0.125 2024-09-20 11:59:38,352 INFO [train.py:1198] (1/2) Epoch 49, batch 3750, loss[loss=0.2082, ctc_loss=0.09443, cr_loss=0.3119, attn_decoder_loss=0.214, over 29345.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1065, cr_loss=0.3438, attn_decoder_loss=0.2355, over 5806552.42 frames. 
], batch size: 67, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:59:38,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=883800.0, ans=0.0 2024-09-20 11:59:47,375 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.541e+01 8.555e+01 9.159e+01 9.903e+01 1.587e+02, threshold=1.832e+02, percent-clipped=0.0 2024-09-20 11:59:47,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=883800.0, ans=0.125 2024-09-20 12:00:07,492 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.81 vs. limit=15.0 2024-09-20 12:00:20,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=883880.0, ans=0.0 2024-09-20 12:00:21,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=883920.0, ans=0.0 2024-09-20 12:00:24,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=883920.0, ans=0.125 2024-09-20 12:00:24,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=883920.0, ans=0.125 2024-09-20 12:00:29,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=883920.0, ans=0.125 2024-09-20 12:00:35,773 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.02 vs. limit=15.0 2024-09-20 12:00:46,136 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.69 vs. limit=12.0 2024-09-20 12:00:47,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=883960.0, ans=0.125 2024-09-20 12:00:54,870 INFO [train.py:1198] (1/2) Epoch 49, batch 3800, loss[loss=0.245, ctc_loss=0.1125, cr_loss=0.3655, attn_decoder_loss=0.2516, over 29615.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1064, cr_loss=0.344, attn_decoder_loss=0.2355, over 5797248.34 frames. ], batch size: 86, lr: 2.23e-03, grad_scale: 16.0 2024-09-20 12:01:21,061 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.70 vs. limit=22.5 2024-09-20 12:02:10,428 INFO [train.py:1198] (1/2) Epoch 49, batch 3850, loss[loss=0.2436, ctc_loss=0.1141, cr_loss=0.3442, attn_decoder_loss=0.2504, over 29271.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1062, cr_loss=0.3435, attn_decoder_loss=0.2353, over 5812481.12 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 16.0 2024-09-20 12:02:13,004 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.02 vs. limit=22.5 2024-09-20 12:02:19,390 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.454e+01 8.680e+01 9.125e+01 9.699e+01 1.289e+02, threshold=1.825e+02, percent-clipped=0.0 2024-09-20 12:02:46,816 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.75 vs. 
limit=15.0 2024-09-20 12:03:01,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.min_positive, batch_count=884320.0, ans=0.05 2024-09-20 12:03:05,750 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:03:09,322 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.19 vs. limit=12.0 2024-09-20 12:03:24,873 INFO [train.py:1198] (1/2) Epoch 49, batch 3900, loss[loss=0.2298, ctc_loss=0.09819, cr_loss=0.3345, attn_decoder_loss=0.237, over 29607.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1066, cr_loss=0.3446, attn_decoder_loss=0.2359, over 5817249.13 frames. ], batch size: 86, lr: 2.23e-03, grad_scale: 16.0 2024-09-20 12:03:28,750 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.80 vs. limit=10.0 2024-09-20 12:03:41,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=884440.0, ans=0.025 2024-09-20 12:03:45,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=884440.0, ans=0.0 2024-09-20 12:03:57,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.95 vs. limit=10.0 2024-09-20 12:04:09,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=884520.0, ans=0.125 2024-09-20 12:04:09,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=884520.0, ans=0.125 2024-09-20 12:04:21,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=884520.0, ans=0.0 2024-09-20 12:04:21,788 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.14 vs. limit=22.5 2024-09-20 12:04:27,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=884560.0, ans=0.1 2024-09-20 12:04:32,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=884560.0, ans=0.1 2024-09-20 12:04:35,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=884560.0, ans=0.04949747468305833 2024-09-20 12:04:38,683 INFO [train.py:1198] (1/2) Epoch 49, batch 3950, loss[loss=0.2487, ctc_loss=0.1212, cr_loss=0.3738, attn_decoder_loss=0.2545, over 29487.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1068, cr_loss=0.3448, attn_decoder_loss=0.2364, over 5836220.31 frames. 
], batch size: 97, lr: 2.23e-03, grad_scale: 16.0 2024-09-20 12:04:46,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=884600.0, ans=0.1 2024-09-20 12:04:47,486 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.284e+01 8.739e+01 9.137e+01 9.584e+01 1.763e+02, threshold=1.827e+02, percent-clipped=0.0 2024-09-20 12:05:11,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=884680.0, ans=0.125 2024-09-20 12:05:13,154 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2024-09-20 12:05:36,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=884760.0, ans=0.125 2024-09-20 12:05:39,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=884760.0, ans=0.0 2024-09-20 12:05:50,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=884760.0, ans=0.125 2024-09-20 12:05:50,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=884760.0, ans=0.2 2024-09-20 12:05:53,640 INFO [train.py:1198] (1/2) Epoch 49, batch 4000, loss[loss=0.2147, ctc_loss=0.09272, cr_loss=0.3064, attn_decoder_loss=0.2214, over 29492.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1065, cr_loss=0.344, attn_decoder_loss=0.2362, over 5812743.84 frames. ], batch size: 74, lr: 2.23e-03, grad_scale: 32.0 2024-09-20 12:05:56,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=884800.0, ans=0.1 2024-09-20 12:05:56,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=884800.0, ans=0.2 2024-09-20 12:06:01,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=884800.0, ans=0.125 2024-09-20 12:06:25,314 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.01 vs. limit=15.0 2024-09-20 12:06:42,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=884920.0, ans=0.125 2024-09-20 12:06:53,625 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.49 vs. limit=22.5 2024-09-20 12:07:01,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=884960.0, ans=0.125 2024-09-20 12:07:08,771 INFO [train.py:1198] (1/2) Epoch 49, batch 4050, loss[loss=0.2539, ctc_loss=0.1303, cr_loss=0.3679, attn_decoder_loss=0.2595, over 20567.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1068, cr_loss=0.3443, attn_decoder_loss=0.2362, over 5795287.56 frames. 
], batch size: 210, lr: 2.23e-03, grad_scale: 16.0 2024-09-20 12:07:18,926 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.846e+01 8.931e+01 9.287e+01 9.798e+01 1.744e+02, threshold=1.857e+02, percent-clipped=0.0 2024-09-20 12:07:22,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=885040.0, ans=0.125 2024-09-20 12:07:23,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=885040.0, ans=0.0 2024-09-20 12:07:25,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=885040.0, ans=0.09899494936611666 2024-09-20 12:07:44,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=885080.0, ans=0.1 2024-09-20 12:08:02,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.53 vs. limit=15.0 2024-09-20 12:08:22,202 INFO [train.py:1198] (1/2) Epoch 49, batch 4100, loss[loss=0.2527, ctc_loss=0.1276, cr_loss=0.4093, attn_decoder_loss=0.2575, over 29508.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1073, cr_loss=0.3457, attn_decoder_loss=0.2366, over 5789950.79 frames. ], batch size: 90, lr: 2.23e-03, grad_scale: 16.0 2024-09-20 12:08:28,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=885200.0, ans=0.1 2024-09-20 12:08:41,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=885240.0, ans=0.0 2024-09-20 12:09:20,320 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.84 vs. limit=15.0 2024-09-20 12:09:22,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=885360.0, ans=0.125 2024-09-20 12:09:22,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=885360.0, ans=0.0 2024-09-20 12:09:22,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=885360.0, ans=0.125 2024-09-20 12:09:32,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=885360.0, ans=10.0 2024-09-20 12:09:35,735 INFO [train.py:1198] (1/2) Epoch 49, batch 4150, loss[loss=0.2286, ctc_loss=0.1151, cr_loss=0.3727, attn_decoder_loss=0.233, over 29488.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1075, cr_loss=0.3463, attn_decoder_loss=0.2365, over 5797230.93 frames. ], batch size: 77, lr: 2.23e-03, grad_scale: 16.0 2024-09-20 12:09:46,199 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.506e+01 8.769e+01 9.382e+01 9.981e+01 1.562e+02, threshold=1.876e+02, percent-clipped=0.0 2024-09-20 12:10:05,122 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:10:14,913 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.58 vs. 
limit=5.0 2024-09-20 12:10:21,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=885520.0, ans=0.0 2024-09-20 12:10:24,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=885520.0, ans=0.125 2024-09-20 12:10:52,059 INFO [train.py:1198] (1/2) Epoch 49, batch 4200, loss[loss=0.2597, ctc_loss=0.135, cr_loss=0.4146, attn_decoder_loss=0.2643, over 29503.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1078, cr_loss=0.3468, attn_decoder_loss=0.237, over 5798863.30 frames. ], batch size: 90, lr: 2.23e-03, grad_scale: 16.0 2024-09-20 12:10:52,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=885600.0, ans=0.2 2024-09-20 12:11:09,271 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.47 vs. limit=22.5 2024-09-20 12:11:12,331 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.79 vs. limit=15.0 2024-09-20 12:12:02,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=885760.0, ans=0.025 2024-09-20 12:12:02,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=885760.0, ans=0.1 2024-09-20 12:12:05,522 INFO [train.py:1198] (1/2) Epoch 49, batch 4250, loss[loss=0.2168, ctc_loss=0.09539, cr_loss=0.315, attn_decoder_loss=0.2232, over 29483.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1076, cr_loss=0.3462, attn_decoder_loss=0.2369, over 5804612.66 frames. ], batch size: 74, lr: 2.23e-03, grad_scale: 16.0 2024-09-20 12:12:15,831 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.509e+01 8.756e+01 9.233e+01 9.751e+01 2.001e+02, threshold=1.847e+02, percent-clipped=1.0 2024-09-20 12:12:17,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=885800.0, ans=0.1 2024-09-20 12:12:29,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=885840.0, ans=0.0 2024-09-20 12:12:32,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=885840.0, ans=0.125 2024-09-20 12:12:37,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.09 vs. limit=12.0 2024-09-20 12:12:54,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=885920.0, ans=0.125 2024-09-20 12:13:05,493 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.49 vs. limit=15.0 2024-09-20 12:13:13,442 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:13:19,049 INFO [train.py:1198] (1/2) Epoch 49, batch 4300, loss[loss=0.2431, ctc_loss=0.1059, cr_loss=0.3338, attn_decoder_loss=0.251, over 29540.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1072, cr_loss=0.3456, attn_decoder_loss=0.2368, over 5794689.68 frames. 
], batch size: 87, lr: 2.23e-03, grad_scale: 16.0 2024-09-20 12:13:24,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=886000.0, ans=0.025 2024-09-20 12:13:43,149 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.80 vs. limit=15.0 2024-09-20 12:14:00,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=886080.0, ans=0.0 2024-09-20 12:14:10,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=886120.0, ans=0.125 2024-09-20 12:14:15,018 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.71 vs. limit=15.0 2024-09-20 12:14:18,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=886160.0, ans=0.1 2024-09-20 12:14:24,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=886160.0, ans=0.125 2024-09-20 12:14:34,870 INFO [train.py:1198] (1/2) Epoch 49, batch 4350, loss[loss=0.2501, ctc_loss=0.1247, cr_loss=0.3814, attn_decoder_loss=0.2555, over 29510.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1097, cr_loss=0.3511, attn_decoder_loss=0.2399, over 5797428.30 frames. ], batch size: 97, lr: 2.23e-03, grad_scale: 16.0 2024-09-20 12:14:36,960 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.85 vs. limit=22.5 2024-09-20 12:14:45,120 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.682e+01 9.096e+01 9.498e+01 1.000e+02 1.959e+02, threshold=1.900e+02, percent-clipped=0.0 2024-09-20 12:14:49,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=886240.0, ans=0.95 2024-09-20 12:15:13,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=886280.0, ans=0.125 2024-09-20 12:15:21,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=886320.0, ans=0.0 2024-09-20 12:15:48,111 INFO [train.py:1198] (1/2) Epoch 49, batch 4400, loss[loss=0.2438, ctc_loss=0.1168, cr_loss=0.3743, attn_decoder_loss=0.2495, over 27231.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1105, cr_loss=0.3527, attn_decoder_loss=0.2415, over 5766722.36 frames. ], batch size: 124, lr: 2.23e-03, grad_scale: 32.0 2024-09-20 12:15:58,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=886400.0, ans=0.125 2024-09-20 12:15:59,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.56 vs. 
2024-09-20 12:15:59,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.56 vs. limit=15.0
2024-09-20 12:16:06,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=886440.0, ans=0.125
2024-09-20 12:16:06,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=886440.0, ans=0.0
2024-09-20 12:16:08,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=886440.0, ans=0.025
2024-09-20 12:16:19,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=886480.0, ans=0.125
2024-09-20 12:16:27,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=886480.0, ans=0.0
2024-09-20 12:16:32,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=886520.0, ans=15.0
2024-09-20 12:16:39,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=886520.0, ans=0.125
2024-09-20 12:16:51,832 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.46 vs. limit=15.0
2024-09-20 12:16:57,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=886560.0, ans=0.125
2024-09-20 12:17:03,040 INFO [train.py:1198] (1/2) Epoch 49, batch 4450, loss[loss=0.2534, ctc_loss=0.1311, cr_loss=0.3826, attn_decoder_loss=0.2584, over 20335.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1138, cr_loss=0.3583, attn_decoder_loss=0.2436, over 5572818.06 frames. ], batch size: 210, lr: 2.23e-03, grad_scale: 16.0
2024-09-20 12:17:03,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=886600.0, ans=0.0
2024-09-20 12:17:05,353 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.60 vs. limit=22.5
2024-09-20 12:17:12,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.66 vs. limit=15.0
2024-09-20 12:17:16,288 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.166e+01 9.203e+01 9.654e+01 1.067e+02 3.742e+02, threshold=1.931e+02, percent-clipped=2.0
2024-09-20 12:17:16,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=886640.0, ans=0.07
2024-09-20 12:17:20,362 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.35 vs. limit=15.0
2024-09-20 12:17:25,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=886640.0, ans=0.125
2024-09-20 12:17:30,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=886640.0, ans=0.025
2024-09-20 12:17:43,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=886680.0, ans=0.0
2024-09-20 12:17:49,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=886720.0, ans=0.1
2024-09-20 12:17:52,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=886720.0, ans=0.125
2024-09-20 12:18:03,819 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.33 vs. limit=15.0
2024-09-20 12:18:17,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=886800.0, ans=0.125
2024-09-20 12:18:18,103 INFO [train.py:1198] (1/2) Epoch 49, batch 4500, loss[loss=0.2371, ctc_loss=0.1208, cr_loss=0.348, attn_decoder_loss=0.2422, over 19928.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1161, cr_loss=0.3601, attn_decoder_loss=0.2451, over 5232513.16 frames. ], batch size: 209, lr: 2.23e-03, grad_scale: 8.0
2024-09-20 12:18:19,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=886800.0, ans=0.125
2024-09-20 12:18:22,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=886800.0, ans=0.05
2024-09-20 12:18:34,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=886840.0, ans=0.1
2024-09-20 12:18:37,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=886840.0, ans=0.0
2024-09-20 12:18:48,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=886880.0, ans=0.125
2024-09-20 12:19:45,528 INFO [train.py:1198] (1/2) Epoch 50, batch 0, loss[loss=0.2109, ctc_loss=0.08504, cr_loss=0.2994, attn_decoder_loss=0.2182, over 29586.00 frames. ], tot_loss[loss=0.2109, ctc_loss=0.08504, cr_loss=0.2994, attn_decoder_loss=0.2182, over 29586.00 frames. ], batch size: 73, lr: 2.21e-03, grad_scale: 16.0
2024-09-20 12:19:45,528 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-20 12:20:03,818 INFO [train.py:1230] (1/2) Epoch 50, validation: loss=0.2133, ctc_loss=0.03558, cr_loss=6.519e-15, attn_decoder_loss=0.2331, over 944034.00 frames.
2024-09-20 12:20:03,818 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB
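At the first batch of the epoch the log switches briefly to validation (train.py:1221/1230) and then reports peak GPU memory (train.py:1231). A hedged sketch of that pattern; compute_loss and the loader are placeholders, while torch.no_grad and torch.cuda.max_memory_allocated are the real PyTorch APIs:

    import torch

    @torch.no_grad()
    def compute_validation_loss(model, valid_loader):
        model.eval()
        tot, frames = 0.0, 0.0
        for batch in valid_loader:               # placeholder loader
            loss, n = model.compute_loss(batch)  # placeholder method returning (loss, num_frames)
            tot += loss.item() * n
            frames += n
        model.train()
        return tot / max(frames, 1.0)

    # Peak memory on the current device, as in "Maximum memory allocated so far is 52672MB".
    peak_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)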
2024-09-20 12:20:15,061 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.79 vs. limit=12.0
2024-09-20 12:20:42,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=886980.0, ans=0.0
2024-09-20 12:20:57,071 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.187e+01 1.009e+02 1.098e+02 1.200e+02 1.487e+02, threshold=2.197e+02, percent-clipped=0.0
2024-09-20 12:21:21,356 INFO [train.py:1198] (1/2) Epoch 50, batch 50, loss[loss=0.2116, ctc_loss=0.09968, cr_loss=0.3318, attn_decoder_loss=0.2166, over 29421.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1089, cr_loss=0.3492, attn_decoder_loss=0.2374, over 1268398.17 frames. ], batch size: 70, lr: 2.21e-03, grad_scale: 16.0
2024-09-20 12:21:51,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=887180.0, ans=0.1
2024-09-20 12:22:02,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=887180.0, ans=0.125
2024-09-20 12:22:02,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=887180.0, ans=0.125
2024-09-20 12:22:04,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=887180.0, ans=0.025
2024-09-20 12:22:13,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=887220.0, ans=0.2
2024-09-20 12:22:19,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=887220.0, ans=0.125
2024-09-20 12:22:33,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=887260.0, ans=0.025
2024-09-20 12:22:34,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=887260.0, ans=0.2
2024-09-20 12:22:37,263 INFO [train.py:1198] (1/2) Epoch 50, batch 100, loss[loss=0.2257, ctc_loss=0.1055, cr_loss=0.3335, attn_decoder_loss=0.2316, over 29539.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1092, cr_loss=0.3496, attn_decoder_loss=0.2389, over 2251911.23 frames. ], batch size: 76, lr: 2.21e-03, grad_scale: 16.0
2024-09-20 12:23:13,075 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.69 vs. limit=12.0
2024-09-20 12:23:29,880 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.360e+01 8.808e+01 9.273e+01 9.833e+01 1.804e+02, threshold=1.855e+02, percent-clipped=0.0
2024-09-20 12:23:33,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=887420.0, ans=0.0
2024-09-20 12:23:53,822 INFO [train.py:1198] (1/2) Epoch 50, batch 150, loss[loss=0.2127, ctc_loss=0.09989, cr_loss=0.3192, attn_decoder_loss=0.2181, over 29452.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1078, cr_loss=0.3464, attn_decoder_loss=0.2373, over 3046561.36 frames. ], batch size: 70, lr: 2.21e-03, grad_scale: 16.0
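The WARNING lines from optim.py:487 report five quantiles (min, 25%, median, 75%, max) of recent gradient norms, a clipping threshold, and the share of recent batches that were clipped; in the lines above the threshold sits at almost exactly Clipping_scale times the median (e.g. 9.498e+01 -> 1.900e+02). A sketch of how such diagnostics could be produced; the median-scaling rule is an inference from the logged numbers, not confirmed from the icefall source:

    from collections import deque
    import torch

    norm_history = deque(maxlen=128)  # recent per-batch gradient norms

    def clip_with_diagnostics(params, clipping_scale=2.0):
        params = list(params)
        # max_norm=inf measures the total norm without clipping anything
        total = torch.nn.utils.clip_grad_norm_(params, max_norm=float("inf"))
        norm_history.append(total.item())
        t = sorted(norm_history)
        quartiles = [t[int(p * (len(t) - 1))] for p in (0.0, 0.25, 0.5, 0.75, 1.0)]
        threshold = clipping_scale * quartiles[2]  # assumed rule: scale the median
        clipped = 100.0 * sum(n > threshold for n in t) / len(t)
        torch.nn.utils.clip_grad_norm_(params, max_norm=threshold)
        return quartiles, threshold, clipped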
2024-09-20 12:24:13,134 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.88 vs. limit=15.0
2024-09-20 12:24:38,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=887580.0, ans=0.125
2024-09-20 12:24:47,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. limit=6.0
2024-09-20 12:24:49,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=887620.0, ans=0.1
2024-09-20 12:24:52,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=887620.0, ans=0.1
2024-09-20 12:24:59,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=887660.0, ans=0.2
2024-09-20 12:25:06,339 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.19 vs. limit=22.5
2024-09-20 12:25:08,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=887660.0, ans=0.125
2024-09-20 12:25:10,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=887700.0, ans=0.125
2024-09-20 12:25:11,629 INFO [train.py:1198] (1/2) Epoch 50, batch 200, loss[loss=0.2387, ctc_loss=0.1134, cr_loss=0.3605, attn_decoder_loss=0.2447, over 27492.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1071, cr_loss=0.3448, attn_decoder_loss=0.2366, over 3658918.25 frames. ], batch size: 125, lr: 2.21e-03, grad_scale: 16.0
2024-09-20 12:25:22,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=887700.0, ans=0.125
2024-09-20 12:26:04,373 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.975e+01 8.480e+01 9.009e+01 9.638e+01 2.120e+02, threshold=1.802e+02, percent-clipped=1.0
2024-09-20 12:26:16,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=887860.0, ans=0.0
2024-09-20 12:26:24,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=887860.0, ans=0.0
2024-09-20 12:26:26,836 INFO [train.py:1198] (1/2) Epoch 50, batch 250, loss[loss=0.2394, ctc_loss=0.113, cr_loss=0.3545, attn_decoder_loss=0.2456, over 29232.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1066, cr_loss=0.3445, attn_decoder_loss=0.2363, over 4140898.17 frames. ], batch size: 100, lr: 2.21e-03, grad_scale: 8.0
2024-09-20 12:26:27,716 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.77 vs. limit=15.0
2024-09-20 12:26:41,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=887900.0, ans=0.2
2024-09-20 12:27:27,871 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.00 vs. limit=12.0
2024-09-20 12:27:39,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=888060.0, ans=0.125
2024-09-20 12:27:43,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=888100.0, ans=0.2
2024-09-20 12:27:45,107 INFO [train.py:1198] (1/2) Epoch 50, batch 300, loss[loss=0.2485, ctc_loss=0.1271, cr_loss=0.3988, attn_decoder_loss=0.2532, over 29532.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1065, cr_loss=0.3446, attn_decoder_loss=0.2358, over 4509877.16 frames. ], batch size: 92, lr: 2.21e-03, grad_scale: 8.0
2024-09-20 12:28:25,834 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.83 vs. limit=15.0
2024-09-20 12:28:32,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=888220.0, ans=0.1
2024-09-20 12:28:35,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=888220.0, ans=0.125
2024-09-20 12:28:40,141 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.872e+01 8.884e+01 9.251e+01 9.818e+01 2.212e+02, threshold=1.850e+02, percent-clipped=1.0
2024-09-20 12:29:02,945 INFO [train.py:1198] (1/2) Epoch 50, batch 350, loss[loss=0.1976, ctc_loss=0.08119, cr_loss=0.2778, attn_decoder_loss=0.2044, over 29310.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1065, cr_loss=0.3445, attn_decoder_loss=0.236, over 4795040.10 frames. ], batch size: 71, lr: 2.21e-03, grad_scale: 8.0
2024-09-20 12:29:10,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=888300.0, ans=0.0
2024-09-20 12:29:19,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=888340.0, ans=0.015
2024-09-20 12:29:56,529 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.14 vs. limit=15.0
2024-09-20 12:30:15,144 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
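Most of the scaling.py:214 traffic is ScheduledFloat values: module hyperparameters (dropout probabilities, skip rates, balancer limits) whose current value `ans` is a function of `batch_count`. A minimal piecewise-linear schedule in that spirit (a sketch only; icefall's ScheduledFloat has more machinery):

    def scheduled_float(batch_count, points):
        """points: [(batch, value), ...] sorted by batch; linear in between, clamped at the ends."""
        b0, v0 = points[0]
        if batch_count <= b0:
            return v0
        for b1, v1 in points[1:]:
            if batch_count <= b1:
                return v0 + (v1 - v0) * (batch_count - b0) / (b1 - b0)
            b0, v0 = b1, v1
        return v0

    # e.g. a skip rate decaying from 0.3 to 0.0 over the first 20k batches (illustrative numbers):
    scheduled_float(886440.0, [(0.0, 0.3), (20000.0, 0.0)])  # -> 0.0, long past the ramp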
2024-09-20 12:30:17,905 INFO [train.py:1198] (1/2) Epoch 50, batch 400, loss[loss=0.231, ctc_loss=0.1068, cr_loss=0.3386, attn_decoder_loss=0.2373, over 29719.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1063, cr_loss=0.3435, attn_decoder_loss=0.2359, over 5025736.44 frames. ], batch size: 82, lr: 2.21e-03, grad_scale: 16.0
2024-09-20 12:30:28,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=888500.0, ans=0.0
2024-09-20 12:30:34,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=888540.0, ans=0.1
2024-09-20 12:31:05,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=888620.0, ans=0.125
2024-09-20 12:31:10,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=888620.0, ans=0.125
2024-09-20 12:31:14,780 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.622e+01 9.023e+01 9.604e+01 1.265e+02, threshold=1.805e+02, percent-clipped=0.0
2024-09-20 12:31:18,298 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-20 12:31:30,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=888660.0, ans=0.0
2024-09-20 12:31:35,879 INFO [train.py:1198] (1/2) Epoch 50, batch 450, loss[loss=0.2462, ctc_loss=0.1265, cr_loss=0.3855, attn_decoder_loss=0.2509, over 29698.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1066, cr_loss=0.344, attn_decoder_loss=0.2362, over 5187566.19 frames. ], batch size: 83, lr: 2.21e-03, grad_scale: 8.0
2024-09-20 12:31:36,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=888700.0, ans=0.0
2024-09-20 12:31:51,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=888740.0, ans=0.1
2024-09-20 12:32:07,123 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.04 vs. limit=22.5
2024-09-20 12:32:26,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=888820.0, ans=15.0
2024-09-20 12:32:30,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=888820.0, ans=0.125
2024-09-20 12:32:31,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=888820.0, ans=0.125
2024-09-20 12:32:54,119 INFO [train.py:1198] (1/2) Epoch 50, batch 500, loss[loss=0.2457, ctc_loss=0.1174, cr_loss=0.3837, attn_decoder_loss=0.2514, over 29396.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1059, cr_loss=0.3422, attn_decoder_loss=0.2354, over 5330610.55 frames. ], batch size: 94, lr: 2.21e-03, grad_scale: 8.0
2024-09-20 12:33:15,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=888940.0, ans=0.2
2024-09-20 12:33:45,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=889020.0, ans=0.04949747468305833
2024-09-20 12:33:48,200 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 8.714e+01 9.199e+01 9.608e+01 6.151e+02, threshold=1.840e+02, percent-clipped=1.0
2024-09-20 12:33:50,327 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0
2024-09-20 12:34:09,227 INFO [train.py:1198] (1/2) Epoch 50, batch 550, loss[loss=0.2368, ctc_loss=0.1088, cr_loss=0.3534, attn_decoder_loss=0.2431, over 28826.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1061, cr_loss=0.343, attn_decoder_loss=0.2353, over 5423417.14 frames. ], batch size: 104, lr: 2.21e-03, grad_scale: 8.0
2024-09-20 12:34:09,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=889100.0, ans=0.2
2024-09-20 12:34:12,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=889100.0, ans=0.0
2024-09-20 12:34:43,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=889180.0, ans=0.1
2024-09-20 12:34:51,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=889180.0, ans=0.0
2024-09-20 12:34:57,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=889220.0, ans=0.125
2024-09-20 12:35:27,185 INFO [train.py:1198] (1/2) Epoch 50, batch 600, loss[loss=0.2427, ctc_loss=0.1157, cr_loss=0.3654, attn_decoder_loss=0.2487, over 29286.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1064, cr_loss=0.3436, attn_decoder_loss=0.2354, over 5510733.57 frames. ], batch size: 100, lr: 2.21e-03, grad_scale: 8.0
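The grad_scale field in the train.py:1198 lines moves in powers of two (8.0, 16.0, 32.0 above), the signature of dynamic loss scaling in float16 mixed-precision training. The standard torch.cuda.amp pattern that produces this behavior looks roughly like the following (model/optimizer are placeholders; whether train.py wires it exactly this way is an assumption):

    import torch

    scaler = torch.cuda.amp.GradScaler()   # grows/shrinks the scale by powers of two

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(batch)            # placeholder forward returning a scalar loss
        scaler.scale(loss).backward()      # backward through the scaled loss
        scaler.step(optimizer)             # skipped internally if inf/nan gradients are found
        scaler.update()                    # halves the scale on overflow, slowly grows it otherwise
        return scaler.get_scale()          # the value logged as grad_scale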
2024-09-20 12:35:29,413 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.40 vs. limit=15.0
2024-09-20 12:35:31,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=889300.0, ans=0.035
2024-09-20 12:35:35,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=889300.0, ans=0.125
2024-09-20 12:35:41,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=889340.0, ans=0.0
2024-09-20 12:35:47,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=889340.0, ans=0.0
2024-09-20 12:36:13,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=889420.0, ans=0.0
2024-09-20 12:36:17,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=889420.0, ans=0.0
2024-09-20 12:36:19,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=889420.0, ans=0.0
2024-09-20 12:36:23,302 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.988e+01 8.691e+01 9.098e+01 9.563e+01 1.951e+02, threshold=1.820e+02, percent-clipped=1.0
2024-09-20 12:36:38,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=889460.0, ans=0.1
2024-09-20 12:36:44,425 INFO [train.py:1198] (1/2) Epoch 50, batch 650, loss[loss=0.2294, ctc_loss=0.1071, cr_loss=0.344, attn_decoder_loss=0.2354, over 29742.00 frames. ], tot_loss[loss=0.2288, ctc_loss=0.1057, cr_loss=0.3419, attn_decoder_loss=0.2349, over 5587391.51 frames. ], batch size: 81, lr: 2.21e-03, grad_scale: 8.0
2024-09-20 12:36:57,467 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.53 vs. limit=10.0
2024-09-20 12:37:07,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=889540.0, ans=0.125
2024-09-20 12:37:57,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=889660.0, ans=0.2
2024-09-20 12:38:00,250 INFO [train.py:1198] (1/2) Epoch 50, batch 700, loss[loss=0.2246, ctc_loss=0.1086, cr_loss=0.3393, attn_decoder_loss=0.23, over 29537.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1065, cr_loss=0.3442, attn_decoder_loss=0.2358, over 5637237.92 frames. ], batch size: 76, lr: 2.21e-03, grad_scale: 8.0
2024-09-20 12:38:37,757 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.85 vs. limit=15.0
2024-09-20 12:38:47,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=889820.0, ans=0.0
2024-09-20 12:38:56,771 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.634e+01 9.067e+01 9.623e+01 1.303e+02, threshold=1.813e+02, percent-clipped=0.0
2024-09-20 12:39:15,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=889860.0, ans=0.0
2024-09-20 12:39:15,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=889860.0, ans=0.125
2024-09-20 12:39:17,907 INFO [train.py:1198] (1/2) Epoch 50, batch 750, loss[loss=0.2313, ctc_loss=0.1015, cr_loss=0.3353, attn_decoder_loss=0.2383, over 29729.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1067, cr_loss=0.3447, attn_decoder_loss=0.2355, over 5676360.45 frames. ], batch size: 82, lr: 2.20e-03, grad_scale: 8.0
2024-09-20 12:39:21,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=889900.0, ans=0.09899494936611666
2024-09-20 12:39:22,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=889900.0, ans=0.125
2024-09-20 12:39:30,773 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.37 vs. limit=6.0
2024-09-20 12:39:42,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=889940.0, ans=0.125
2024-09-20 12:40:07,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=890020.0, ans=0.125
2024-09-20 12:40:22,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=890060.0, ans=0.125
2024-09-20 12:40:35,478 INFO [train.py:1198] (1/2) Epoch 50, batch 800, loss[loss=0.2187, ctc_loss=0.09792, cr_loss=0.3284, attn_decoder_loss=0.2248, over 29623.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1069, cr_loss=0.3454, attn_decoder_loss=0.2357, over 5707210.77 frames. ], batch size: 73, lr: 2.20e-03, grad_scale: 16.0
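Every loss line carries a cr_loss term next to ctc_loss and attn_decoder_loss, consistent with consistency-regularized CTC: the same utterance is encoded under two differently time-masked views and their CTC posteriors are pulled together. A rough sketch of such a symmetric consistency term (an assumed formulation for illustration, not the exact icefall code):

    import torch.nn.functional as F

    def cr_consistency_loss(log_probs_a, log_probs_b):
        """Symmetric KL between CTC log-posteriors of two augmented views, shape (T, N, V)."""
        kl_ab = F.kl_div(log_probs_a, log_probs_b.exp(), reduction="batchmean")
        kl_ba = F.kl_div(log_probs_b, log_probs_a.exp(), reduction="batchmean")
        return 0.5 * (kl_ab + kl_ba)

The near-zero validation value earlier in the log (cr_loss=6.519e-15) fits this reading: with no masking applied at evaluation time the two views coincide and the term collapses.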
2024-09-20 12:40:37,820 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=4.88 vs. limit=12.0
2024-09-20 12:40:50,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=890140.0, ans=0.1
2024-09-20 12:40:53,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=890140.0, ans=0.125
2024-09-20 12:41:02,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=890140.0, ans=0.0
2024-09-20 12:41:02,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=890140.0, ans=0.1
2024-09-20 12:41:10,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=890180.0, ans=0.125
2024-09-20 12:41:29,703 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.785e+01 9.269e+01 9.766e+01 2.898e+02, threshold=1.854e+02, percent-clipped=1.0
2024-09-20 12:41:34,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=890260.0, ans=0.125
2024-09-20 12:41:35,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0
2024-09-20 12:41:47,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=890260.0, ans=22.5
2024-09-20 12:41:50,577 INFO [train.py:1198] (1/2) Epoch 50, batch 850, loss[loss=0.2392, ctc_loss=0.1084, cr_loss=0.3374, attn_decoder_loss=0.2463, over 29707.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1064, cr_loss=0.3442, attn_decoder_loss=0.2354, over 5736666.21 frames. ], batch size: 89, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 12:41:52,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=890300.0, ans=0.125
2024-09-20 12:42:01,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=890300.0, ans=0.125
2024-09-20 12:42:02,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=890300.0, ans=0.125
2024-09-20 12:42:14,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=890340.0, ans=0.1
2024-09-20 12:42:31,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=890380.0, ans=0.125
2024-09-20 12:42:33,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=890380.0, ans=0.125
2024-09-20 12:42:41,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0
2024-09-20 12:42:53,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=890460.0, ans=0.2
2024-09-20 12:43:07,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=890500.0, ans=0.0
2024-09-20 12:43:08,693 INFO [train.py:1198] (1/2) Epoch 50, batch 900, loss[loss=0.216, ctc_loss=0.09374, cr_loss=0.3178, attn_decoder_loss=0.2225, over 29595.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1067, cr_loss=0.345, attn_decoder_loss=0.2359, over 5740522.55 frames. ], batch size: 73, lr: 2.20e-03, grad_scale: 8.0
2024-09-20 12:43:14,043 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.70 vs. limit=6.0
2024-09-20 12:43:14,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=890500.0, ans=0.0
2024-09-20 12:43:33,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=890540.0, ans=0.025
2024-09-20 12:43:53,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=890580.0, ans=0.025
2024-09-20 12:44:00,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=890620.0, ans=0.0
2024-09-20 12:44:02,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=890620.0, ans=0.125
2024-09-20 12:44:03,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=890620.0, ans=0.125
2024-09-20 12:44:06,530 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.711e+01 8.704e+01 9.222e+01 9.610e+01 2.090e+02, threshold=1.844e+02, percent-clipped=2.0
2024-09-20 12:44:07,316 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.38 vs. limit=15.0
2024-09-20 12:44:15,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=890660.0, ans=0.2
2024-09-20 12:44:18,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=890660.0, ans=0.125
2024-09-20 12:44:25,915 INFO [train.py:1198] (1/2) Epoch 50, batch 950, loss[loss=0.2221, ctc_loss=0.09853, cr_loss=0.3298, attn_decoder_loss=0.2285, over 29512.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1065, cr_loss=0.3444, attn_decoder_loss=0.2359, over 5742446.27 frames. ], batch size: 74, lr: 2.20e-03, grad_scale: 8.0
2024-09-20 12:45:06,138 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.47 vs. limit=15.0
2024-09-20 12:45:21,181 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.53 vs. limit=12.0
2024-09-20 12:45:22,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=890820.0, ans=0.2
2024-09-20 12:45:27,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=890860.0, ans=0.0
2024-09-20 12:45:32,552 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 12:45:34,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=890860.0, ans=0.125
2024-09-20 12:45:41,062 INFO [train.py:1198] (1/2) Epoch 50, batch 1000, loss[loss=0.2196, ctc_loss=0.09543, cr_loss=0.3222, attn_decoder_loss=0.2262, over 29504.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1069, cr_loss=0.3451, attn_decoder_loss=0.2363, over 5736194.64 frames. ], batch size: 77, lr: 2.20e-03, grad_scale: 8.0
2024-09-20 12:45:52,605 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.66 vs. limit=15.0
2024-09-20 12:45:56,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=890940.0, ans=0.125
2024-09-20 12:45:59,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=890940.0, ans=0.125
2024-09-20 12:46:00,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=890940.0, ans=0.125
2024-09-20 12:46:03,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=890940.0, ans=0.125
2024-09-20 12:46:09,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=890980.0, ans=0.0
2024-09-20 12:46:24,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=891020.0, ans=0.125
2024-09-20 12:46:31,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=891020.0, ans=0.025
2024-09-20 12:46:34,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=891020.0, ans=0.0
2024-09-20 12:46:38,847 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 8.665e+01 9.186e+01 9.812e+01 1.609e+02, threshold=1.837e+02, percent-clipped=0.0
2024-09-20 12:46:45,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=891060.0, ans=0.0
2024-09-20 12:46:51,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=891060.0, ans=0.1
2024-09-20 12:46:58,385 INFO [train.py:1198] (1/2) Epoch 50, batch 1050, loss[loss=0.2398, ctc_loss=0.1116, cr_loss=0.365, attn_decoder_loss=0.246, over 29665.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1068, cr_loss=0.3449, attn_decoder_loss=0.2359, over 5744188.96 frames. ], batch size: 85, lr: 2.20e-03, grad_scale: 8.0
2024-09-20 12:47:28,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=891180.0, ans=0.95
2024-09-20 12:47:33,910 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.25 vs. limit=15.0
2024-09-20 12:47:40,932 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0
2024-09-20 12:47:48,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=891220.0, ans=0.0
2024-09-20 12:47:52,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=891220.0, ans=0.1
2024-09-20 12:47:55,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=891220.0, ans=0.125
2024-09-20 12:48:16,829 INFO [train.py:1198] (1/2) Epoch 50, batch 1100, loss[loss=0.2181, ctc_loss=0.09623, cr_loss=0.3301, attn_decoder_loss=0.2243, over 29444.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1067, cr_loss=0.3446, attn_decoder_loss=0.2358, over 5756822.08 frames. ], batch size: 78, lr: 2.20e-03, grad_scale: 8.0
2024-09-20 12:48:20,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=891300.0, ans=0.125
2024-09-20 12:48:38,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=891340.0, ans=0.125
2024-09-20 12:49:12,783 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.273e+01 8.673e+01 9.174e+01 9.700e+01 1.224e+02, threshold=1.835e+02, percent-clipped=0.0
2024-09-20 12:49:20,615 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-20 12:49:32,518 INFO [train.py:1198] (1/2) Epoch 50, batch 1150, loss[loss=0.2224, ctc_loss=0.09797, cr_loss=0.3232, attn_decoder_loss=0.2291, over 29440.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1068, cr_loss=0.3446, attn_decoder_loss=0.2358, over 5754658.89 frames. ], batch size: 78, lr: 2.20e-03, grad_scale: 8.0
2024-09-20 12:49:43,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=891500.0, ans=0.0
2024-09-20 12:49:49,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=891540.0, ans=0.125
2024-09-20 12:49:49,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=891540.0, ans=0.025
2024-09-20 12:50:14,458 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.14 vs. limit=22.5
2024-09-20 12:50:20,157 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.02 vs. limit=15.0
2024-09-20 12:50:27,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=891620.0, ans=0.0
2024-09-20 12:50:45,072 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.87 vs. limit=10.0
2024-09-20 12:50:50,194 INFO [train.py:1198] (1/2) Epoch 50, batch 1200, loss[loss=0.2436, ctc_loss=0.1107, cr_loss=0.3677, attn_decoder_loss=0.2502, over 29670.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1069, cr_loss=0.3445, attn_decoder_loss=0.2364, over 5746321.68 frames. ], batch size: 85, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 12:51:08,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=891740.0, ans=0.0
2024-09-20 12:51:48,358 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.827e+01 9.334e+01 1.012e+02 2.490e+02, threshold=1.867e+02, percent-clipped=1.0
2024-09-20 12:51:49,400 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.75 vs. limit=10.0
2024-09-20 12:52:07,783 INFO [train.py:1198] (1/2) Epoch 50, batch 1250, loss[loss=0.2555, ctc_loss=0.1306, cr_loss=0.4071, attn_decoder_loss=0.2603, over 29527.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1072, cr_loss=0.3455, attn_decoder_loss=0.2369, over 5774223.82 frames. ], batch size: 92, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 12:52:21,005 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.06 vs. limit=15.0
2024-09-20 12:52:21,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=891940.0, ans=0.125
2024-09-20 12:52:38,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=891980.0, ans=0.0
2024-09-20 12:52:52,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=892020.0, ans=0.125
2024-09-20 12:53:16,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=892060.0, ans=0.0
2024-09-20 12:53:24,085 INFO [train.py:1198] (1/2) Epoch 50, batch 1300, loss[loss=0.2487, ctc_loss=0.1203, cr_loss=0.3835, attn_decoder_loss=0.2544, over 28240.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1067, cr_loss=0.3443, attn_decoder_loss=0.2365, over 5778031.43 frames. ], batch size: 111, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 12:53:27,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=892100.0, ans=0.0
2024-09-20 12:53:49,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=892140.0, ans=0.125
2024-09-20 12:53:53,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=892180.0, ans=0.125
2024-09-20 12:54:14,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=892220.0, ans=0.125
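The learning rate in these lines decays very slowly within the epoch (2.21e-03 at its start, 2.20e-03 by now), as expected from a schedule that is smooth in both batch and epoch counts. icefall's Eden scheduler has roughly this shape; treat the formula below as a paraphrase from memory rather than a verified copy:

    def eden_lr(base_lr, batch, epoch, lr_batches, lr_epochs):
        # Each factor decays like x ** -0.5 once batch >> lr_batches, epoch >> lr_epochs.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor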
2024-09-20 12:54:17,859 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=15.0
2024-09-20 12:54:19,787 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 8.576e+01 8.998e+01 9.559e+01 1.394e+02, threshold=1.800e+02, percent-clipped=0.0
2024-09-20 12:54:24,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=892260.0, ans=0.0
2024-09-20 12:54:32,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=892260.0, ans=0.0
2024-09-20 12:54:38,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=892260.0, ans=0.2
2024-09-20 12:54:41,688 INFO [train.py:1198] (1/2) Epoch 50, batch 1350, loss[loss=0.2304, ctc_loss=0.1102, cr_loss=0.3632, attn_decoder_loss=0.2356, over 29774.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1062, cr_loss=0.3434, attn_decoder_loss=0.2359, over 5793990.34 frames. ], batch size: 81, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 12:54:43,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=892300.0, ans=0.0
2024-09-20 12:54:50,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=892300.0, ans=0.2
2024-09-20 12:54:59,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=892340.0, ans=0.02
2024-09-20 12:55:07,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=3.88 vs. limit=12.0
2024-09-20 12:55:08,980 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.48 vs. limit=15.0
2024-09-20 12:55:16,382 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0
2024-09-20 12:55:37,860 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.92 vs. limit=12.0
2024-09-20 12:55:50,254 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.42 vs. limit=15.0
2024-09-20 12:55:58,283 INFO [train.py:1198] (1/2) Epoch 50, batch 1400, loss[loss=0.1999, ctc_loss=0.08437, cr_loss=0.2972, attn_decoder_loss=0.2062, over 29575.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1063, cr_loss=0.344, attn_decoder_loss=0.2358, over 5805634.30 frames. ], batch size: 69, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 12:56:06,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=892500.0, ans=0.0
2024-09-20 12:56:06,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=892500.0, ans=0.2
2024-09-20 12:56:11,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.44 vs. limit=15.0
2024-09-20 12:56:13,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=892540.0, ans=0.125
2024-09-20 12:56:35,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=892580.0, ans=0.0
2024-09-20 12:56:46,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=892620.0, ans=0.0
2024-09-20 12:56:53,533 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.789e+01 8.988e+01 9.426e+01 9.888e+01 1.632e+02, threshold=1.885e+02, percent-clipped=0.0
2024-09-20 12:57:07,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=892660.0, ans=0.125
2024-09-20 12:57:12,715 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.64 vs. limit=6.0
2024-09-20 12:57:13,308 INFO [train.py:1198] (1/2) Epoch 50, batch 1450, loss[loss=0.2483, ctc_loss=0.1227, cr_loss=0.3799, attn_decoder_loss=0.2538, over 29479.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1062, cr_loss=0.3432, attn_decoder_loss=0.2359, over 5803828.39 frames. ], batch size: 94, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 12:57:26,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=892740.0, ans=0.1
2024-09-20 12:57:32,490 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.56 vs. limit=15.0
2024-09-20 12:57:33,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=892740.0, ans=0.1
2024-09-20 12:58:30,652 INFO [train.py:1198] (1/2) Epoch 50, batch 1500, loss[loss=0.2461, ctc_loss=0.1151, cr_loss=0.3628, attn_decoder_loss=0.2526, over 29619.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1065, cr_loss=0.3437, attn_decoder_loss=0.2364, over 5804130.64 frames. ], batch size: 86, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 12:58:35,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=892900.0, ans=0.0
2024-09-20 12:58:47,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=892940.0, ans=0.0
2024-09-20 12:59:02,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=892980.0, ans=0.125
2024-09-20 12:59:21,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=893020.0, ans=0.2
2024-09-20 12:59:24,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=893020.0, ans=0.125
2024-09-20 12:59:28,004 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.68 vs. limit=15.0
2024-09-20 12:59:28,762 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 8.732e+01 9.209e+01 9.748e+01 2.356e+02, threshold=1.842e+02, percent-clipped=2.0
2024-09-20 12:59:31,286 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.76 vs. limit=15.0
2024-09-20 12:59:48,130 INFO [train.py:1198] (1/2) Epoch 50, batch 1550, loss[loss=0.2467, ctc_loss=0.122, cr_loss=0.3834, attn_decoder_loss=0.252, over 29508.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.107, cr_loss=0.3447, attn_decoder_loss=0.2364, over 5780762.42 frames. ], batch size: 90, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 13:00:19,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=893180.0, ans=0.125
2024-09-20 13:00:22,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=893180.0, ans=15.0
2024-09-20 13:00:31,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=893220.0, ans=0.2
2024-09-20 13:00:34,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=893220.0, ans=0.125
2024-09-20 13:00:37,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=893220.0, ans=0.2
2024-09-20 13:00:40,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=893220.0, ans=0.125
2024-09-20 13:00:44,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.64 vs. limit=22.5
2024-09-20 13:00:45,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=893220.0, ans=0.125
2024-09-20 13:00:48,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=893260.0, ans=0.0
2024-09-20 13:01:02,875 INFO [train.py:1198] (1/2) Epoch 50, batch 1600, loss[loss=0.2388, ctc_loss=0.1158, cr_loss=0.3668, attn_decoder_loss=0.2444, over 29683.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1071, cr_loss=0.3449, attn_decoder_loss=0.2362, over 5762811.96 frames. ], batch size: 85, lr: 2.20e-03, grad_scale: 32.0
2024-09-20 13:01:08,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.47 vs. limit=22.5
2024-09-20 13:01:12,566 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.72 vs. limit=15.0
2024-09-20 13:01:20,312 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.57 vs. limit=12.0
2024-09-20 13:01:24,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=893340.0, ans=0.025
2024-09-20 13:01:27,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=893340.0, ans=0.125
2024-09-20 13:01:35,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=893380.0, ans=0.125
2024-09-20 13:01:54,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=893420.0, ans=0.0
2024-09-20 13:01:59,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=893420.0, ans=0.1
2024-09-20 13:02:00,387 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.886e+01 8.710e+01 9.146e+01 9.958e+01 1.437e+02, threshold=1.829e+02, percent-clipped=0.0
2024-09-20 13:02:00,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=893420.0, ans=0.125
2024-09-20 13:02:02,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=893460.0, ans=0.1
2024-09-20 13:02:02,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=893460.0, ans=0.0
2024-09-20 13:02:20,551 INFO [train.py:1198] (1/2) Epoch 50, batch 1650, loss[loss=0.2372, ctc_loss=0.1121, cr_loss=0.342, attn_decoder_loss=0.2435, over 29716.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1071, cr_loss=0.3445, attn_decoder_loss=0.2362, over 5758301.10 frames. ], batch size: 89, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 13:02:35,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=893540.0, ans=0.125
2024-09-20 13:02:43,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=893540.0, ans=0.125
2024-09-20 13:02:45,704 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.21 vs. limit=15.0
2024-09-20 13:03:10,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=893620.0, ans=0.125
2024-09-20 13:03:14,856 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.03 vs. limit=15.0
2024-09-20 13:03:19,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=893620.0, ans=0.2
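The scaling.py:1024 Whitening lines compare a per-module statistic against a limit (metric=3.57 vs. limit=12.0 and so on); the metric is small when a module's channel covariance is close to a multiple of the identity and grows as the activations become anisotropic. One way to compute such a score (an assumed definition for illustration; the icefall Whiten module may define it differently):

    import torch

    def whitening_metric(x, num_groups=1):
        """x: (frames, channels). ~1.0 for white features, larger as the covariance gets lopsided."""
        scores = []
        for g in x.chunk(num_groups, dim=1):  # per-group, matching num_groups=... in the log
            g = g - g.mean(dim=0, keepdim=True)
            cov = (g.T @ g) / g.shape[0]
            eig = torch.linalg.eigvalsh(cov)
            scores.append((eig ** 2).mean() / eig.mean().clamp(min=1e-20) ** 2)
        return torch.stack(scores).mean()

    whitening_metric(torch.randn(1000, 512))  # modestly above 1.0 from sampling noise alone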
2024-09-20 13:03:37,897 INFO [train.py:1198] (1/2) Epoch 50, batch 1700, loss[loss=0.2003, ctc_loss=0.08652, cr_loss=0.2985, attn_decoder_loss=0.2064, over 29520.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1064, cr_loss=0.3431, attn_decoder_loss=0.2359, over 5780676.27 frames. ], batch size: 69, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 13:03:42,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=893700.0, ans=0.125
2024-09-20 13:03:47,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=893700.0, ans=0.125
2024-09-20 13:03:56,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=893740.0, ans=0.125
2024-09-20 13:03:56,687 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.60 vs. limit=22.5
2024-09-20 13:03:57,789 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-20 13:04:05,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=893740.0, ans=0.0
2024-09-20 13:04:08,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=893780.0, ans=0.09899494936611666
2024-09-20 13:04:35,201 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.624e+01 8.662e+01 9.260e+01 9.838e+01 1.206e+02, threshold=1.852e+02, percent-clipped=0.0
2024-09-20 13:04:47,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=893860.0, ans=0.0
2024-09-20 13:04:53,292 INFO [train.py:1198] (1/2) Epoch 50, batch 1750, loss[loss=0.2123, ctc_loss=0.09845, cr_loss=0.3248, attn_decoder_loss=0.2177, over 29313.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1064, cr_loss=0.3432, attn_decoder_loss=0.2357, over 5789086.97 frames. ], batch size: 67, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 13:05:02,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=893900.0, ans=0.09899494936611666
2024-09-20 13:05:05,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=893900.0, ans=0.125
2024-09-20 13:05:07,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=893940.0, ans=0.2
2024-09-20 13:05:49,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=894020.0, ans=0.125
2024-09-20 13:05:53,098 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.95 vs. limit=22.5
2024-09-20 13:06:05,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=894060.0, ans=0.0
2024-09-20 13:06:08,367 INFO [train.py:1198] (1/2) Epoch 50, batch 1800, loss[loss=0.2478, ctc_loss=0.125, cr_loss=0.3994, attn_decoder_loss=0.2525, over 29680.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1065, cr_loss=0.3431, attn_decoder_loss=0.2359, over 5790843.07 frames. ], batch size: 83, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 13:06:27,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=894140.0, ans=0.0
2024-09-20 13:06:38,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=894140.0, ans=0.07
2024-09-20 13:06:38,743 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0
2024-09-20 13:06:45,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=894180.0, ans=0.125
2024-09-20 13:06:46,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=894180.0, ans=0.025
2024-09-20 13:06:56,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=894220.0, ans=0.125
2024-09-20 13:07:10,020 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.815e+01 8.701e+01 9.171e+01 9.771e+01 2.069e+02, threshold=1.834e+02, percent-clipped=1.0
2024-09-20 13:07:17,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=894260.0, ans=10.0
2024-09-20 13:07:28,091 INFO [train.py:1198] (1/2) Epoch 50, batch 1850, loss[loss=0.2362, ctc_loss=0.1117, cr_loss=0.3513, attn_decoder_loss=0.2422, over 29627.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1066, cr_loss=0.3438, attn_decoder_loss=0.236, over 5794939.07 frames. ], batch size: 86, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 13:07:35,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=894300.0, ans=0.0
2024-09-20 13:07:40,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=894300.0, ans=0.2
2024-09-20 13:07:47,247 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.45 vs. limit=15.0
2024-09-20 13:08:19,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=894420.0, ans=0.125
2024-09-20 13:08:21,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=894420.0, ans=0.025
2024-09-20 13:08:31,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=894460.0, ans=0.2
2024-09-20 13:08:37,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=894460.0, ans=0.0
2024-09-20 13:08:43,293 INFO [train.py:1198] (1/2) Epoch 50, batch 1900, loss[loss=0.2492, ctc_loss=0.1139, cr_loss=0.3674, attn_decoder_loss=0.2561, over 29725.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.107, cr_loss=0.3445, attn_decoder_loss=0.2366, over 5803582.92 frames. ], batch size: 89, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 13:08:53,447 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.89 vs. limit=12.0
2024-09-20 13:09:06,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=894540.0, ans=0.0
2024-09-20 13:09:42,595 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.927e+01 9.438e+01 9.973e+01 1.317e+02, threshold=1.888e+02, percent-clipped=0.0
2024-09-20 13:09:59,261 INFO [train.py:1198] (1/2) Epoch 50, batch 1950, loss[loss=0.2266, ctc_loss=0.09945, cr_loss=0.3131, attn_decoder_loss=0.2338, over 29431.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1073, cr_loss=0.3455, attn_decoder_loss=0.2376, over 5818338.28 frames. ], batch size: 78, lr: 2.20e-03, grad_scale: 8.0
2024-09-20 13:10:12,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=894700.0, ans=0.125
2024-09-20 13:10:13,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=894700.0, ans=0.0
2024-09-20 13:10:15,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=894740.0, ans=0.125
2024-09-20 13:10:31,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=894780.0, ans=0.125
2024-09-20 13:10:33,471 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.60 vs. limit=15.0
2024-09-20 13:11:18,240 INFO [train.py:1198] (1/2) Epoch 50, batch 2000, loss[loss=0.2029, ctc_loss=0.08552, cr_loss=0.2998, attn_decoder_loss=0.2093, over 29345.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1072, cr_loss=0.3449, attn_decoder_loss=0.2378, over 5795504.20 frames. ], batch size: 67, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 13:11:32,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=894940.0, ans=0.2
2024-09-20 13:11:44,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=894940.0, ans=0.125
2024-09-20 13:11:47,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=894980.0, ans=0.0
2024-09-20 13:12:00,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=894980.0, ans=0.125
2024-09-20 13:12:17,047 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.737e+01 8.718e+01 9.150e+01 9.752e+01 2.823e+02, threshold=1.830e+02, percent-clipped=2.0
2024-09-20 13:12:25,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=895060.0, ans=0.125
2024-09-20 13:12:34,060 INFO [train.py:1198] (1/2) Epoch 50, batch 2050, loss[loss=0.2103, ctc_loss=0.09076, cr_loss=0.3094, attn_decoder_loss=0.2167, over 29424.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1071, cr_loss=0.345, attn_decoder_loss=0.237, over 5789036.33 frames. ], batch size: 70, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 13:12:37,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=895100.0, ans=0.05
2024-09-20 13:12:46,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=895100.0, ans=0.1
2024-09-20 13:13:02,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=895180.0, ans=0.1
2024-09-20 13:13:09,832 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.40 vs. limit=15.0
2024-09-20 13:13:19,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=895220.0, ans=0.05
2024-09-20 13:13:29,405 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.69 vs. limit=10.0
2024-09-20 13:13:48,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=895300.0, ans=0.125
2024-09-20 13:13:48,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=895300.0, ans=0.1
2024-09-20 13:13:49,477 INFO [train.py:1198] (1/2) Epoch 50, batch 2100, loss[loss=0.239, ctc_loss=0.1169, cr_loss=0.3788, attn_decoder_loss=0.2442, over 29753.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1067, cr_loss=0.3448, attn_decoder_loss=0.2365, over 5799419.00 frames. ], batch size: 81, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 13:13:59,303 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.10 vs. limit=22.5
2024-09-20 13:14:05,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=895340.0, ans=0.2
2024-09-20 13:14:11,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=895340.0, ans=0.1
2024-09-20 13:14:47,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=895420.0, ans=0.125
2024-09-20 13:14:51,866 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.597e+01 8.617e+01 9.072e+01 9.604e+01 1.170e+02, threshold=1.814e+02, percent-clipped=0.0
2024-09-20 13:14:55,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=895460.0, ans=0.035
2024-09-20 13:14:59,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=895460.0, ans=0.2
2024-09-20 13:15:01,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=895460.0, ans=0.0
2024-09-20 13:15:08,556 INFO [train.py:1198] (1/2) Epoch 50, batch 2150, loss[loss=0.2204, ctc_loss=0.1004, cr_loss=0.3231, attn_decoder_loss=0.2266, over 29436.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1062, cr_loss=0.3443, attn_decoder_loss=0.2358, over 5814439.31 frames.
], batch size: 78, lr: 2.20e-03, grad_scale: 16.0 2024-09-20 13:15:24,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=895540.0, ans=0.05 2024-09-20 13:15:31,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=895540.0, ans=0.125 2024-09-20 13:15:47,326 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2024-09-20 13:15:48,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=895580.0, ans=0.1 2024-09-20 13:15:58,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=895620.0, ans=0.5 2024-09-20 13:16:23,845 INFO [train.py:1198] (1/2) Epoch 50, batch 2200, loss[loss=0.235, ctc_loss=0.1077, cr_loss=0.3442, attn_decoder_loss=0.2415, over 29633.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1063, cr_loss=0.3447, attn_decoder_loss=0.2359, over 5811811.52 frames. ], batch size: 86, lr: 2.20e-03, grad_scale: 8.0 2024-09-20 13:16:26,304 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.81 vs. limit=15.0 2024-09-20 13:16:31,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=895700.0, ans=0.125 2024-09-20 13:16:37,694 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 13:16:44,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=895740.0, ans=0.125 2024-09-20 13:16:47,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=895740.0, ans=0.2 2024-09-20 13:16:48,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=895740.0, ans=0.2 2024-09-20 13:17:15,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=895820.0, ans=0.0 2024-09-20 13:17:23,883 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.920e+01 8.609e+01 9.131e+01 9.597e+01 2.793e+02, threshold=1.826e+02, percent-clipped=2.0 2024-09-20 13:17:38,980 INFO [train.py:1198] (1/2) Epoch 50, batch 2250, loss[loss=0.2321, ctc_loss=0.1058, cr_loss=0.3493, attn_decoder_loss=0.2383, over 29684.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1057, cr_loss=0.3429, attn_decoder_loss=0.2356, over 5810113.11 frames. 
], batch size: 82, lr: 2.20e-03, grad_scale: 8.0 2024-09-20 13:18:02,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=895940.0, ans=0.0 2024-09-20 13:18:06,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=895940.0, ans=0.025 2024-09-20 13:18:26,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=895980.0, ans=0.0 2024-09-20 13:18:26,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=895980.0, ans=0.125 2024-09-20 13:18:40,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=896020.0, ans=0.2 2024-09-20 13:18:53,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=896060.0, ans=0.125 2024-09-20 13:18:56,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=896060.0, ans=0.125 2024-09-20 13:19:02,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=896060.0, ans=0.125 2024-09-20 13:19:05,562 INFO [train.py:1198] (1/2) Epoch 50, batch 2300, loss[loss=0.2091, ctc_loss=0.0924, cr_loss=0.3129, attn_decoder_loss=0.2152, over 29328.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.1052, cr_loss=0.3413, attn_decoder_loss=0.2348, over 5796602.72 frames. ], batch size: 71, lr: 2.20e-03, grad_scale: 8.0 2024-09-20 13:19:07,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=896100.0, ans=0.125 2024-09-20 13:19:35,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=896180.0, ans=0.1 2024-09-20 13:20:05,618 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.458e+01 8.477e+01 9.129e+01 9.785e+01 2.320e+02, threshold=1.826e+02, percent-clipped=1.0 2024-09-20 13:20:12,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=896260.0, ans=0.1 2024-09-20 13:20:20,686 INFO [train.py:1198] (1/2) Epoch 50, batch 2350, loss[loss=0.2471, ctc_loss=0.1179, cr_loss=0.3635, attn_decoder_loss=0.2534, over 29703.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1057, cr_loss=0.3424, attn_decoder_loss=0.2352, over 5802686.35 frames. ], batch size: 83, lr: 2.20e-03, grad_scale: 8.0 2024-09-20 13:20:29,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=896300.0, ans=0.0 2024-09-20 13:20:47,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=896340.0, ans=0.0 2024-09-20 13:20:51,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=896380.0, ans=0.0 2024-09-20 13:21:09,764 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.48 vs. 
limit=15.0 2024-09-20 13:21:16,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=896420.0, ans=0.2 2024-09-20 13:21:22,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=896460.0, ans=0.125 2024-09-20 13:21:36,031 INFO [train.py:1198] (1/2) Epoch 50, batch 2400, loss[loss=0.22, ctc_loss=0.1046, cr_loss=0.3546, attn_decoder_loss=0.225, over 29542.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1061, cr_loss=0.3436, attn_decoder_loss=0.2356, over 5807470.68 frames. ], batch size: 76, lr: 2.20e-03, grad_scale: 16.0 2024-09-20 13:21:55,896 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 13:22:05,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.03 vs. limit=15.0 2024-09-20 13:22:09,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=896580.0, ans=0.125 2024-09-20 13:22:11,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=896580.0, ans=0.125 2024-09-20 13:22:32,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=896620.0, ans=0.0 2024-09-20 13:22:34,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=896620.0, ans=0.125 2024-09-20 13:22:35,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=896620.0, ans=0.125 2024-09-20 13:22:40,131 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.900e+01 8.885e+01 9.266e+01 9.766e+01 1.218e+02, threshold=1.853e+02, percent-clipped=0.0 2024-09-20 13:22:49,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=896660.0, ans=0.09899494936611666 2024-09-20 13:22:55,252 INFO [train.py:1198] (1/2) Epoch 50, batch 2450, loss[loss=0.2279, ctc_loss=0.1052, cr_loss=0.3492, attn_decoder_loss=0.2337, over 29708.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1069, cr_loss=0.3451, attn_decoder_loss=0.2364, over 5784967.42 frames. ], batch size: 82, lr: 2.20e-03, grad_scale: 16.0 2024-09-20 13:23:15,444 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.63 vs. limit=15.0 2024-09-20 13:23:40,948 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 13:23:43,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=896820.0, ans=0.1 2024-09-20 13:23:48,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=896820.0, ans=0.0 2024-09-20 13:24:03,294 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 13:24:10,521 INFO [train.py:1198] (1/2) Epoch 50, batch 2500, loss[loss=0.236, ctc_loss=0.107, cr_loss=0.3452, attn_decoder_loss=0.2426, over 29615.00 frames. 
], tot_loss[loss=0.23, ctc_loss=0.1067, cr_loss=0.3449, attn_decoder_loss=0.2361, over 5794318.60 frames. ], batch size: 86, lr: 2.20e-03, grad_scale: 16.0 2024-09-20 13:24:16,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=896900.0, ans=0.2 2024-09-20 13:24:28,205 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.81 vs. limit=15.0 2024-09-20 13:24:34,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=896940.0, ans=0.0 2024-09-20 13:24:35,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=896940.0, ans=0.125 2024-09-20 13:25:10,798 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.343e+01 8.696e+01 9.094e+01 9.539e+01 5.829e+02, threshold=1.819e+02, percent-clipped=1.0 2024-09-20 13:25:14,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=897060.0, ans=0.1 2024-09-20 13:25:18,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=897060.0, ans=0.125 2024-09-20 13:25:22,040 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.65 vs. limit=15.0 2024-09-20 13:25:23,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=897060.0, ans=0.0 2024-09-20 13:25:25,444 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.87 vs. limit=15.0 2024-09-20 13:25:25,884 INFO [train.py:1198] (1/2) Epoch 50, batch 2550, loss[loss=0.2112, ctc_loss=0.09228, cr_loss=0.3222, attn_decoder_loss=0.2173, over 29385.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1065, cr_loss=0.344, attn_decoder_loss=0.2361, over 5798608.96 frames. ], batch size: 67, lr: 2.20e-03, grad_scale: 16.0 2024-09-20 13:25:42,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=897140.0, ans=0.1 2024-09-20 13:25:45,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=897140.0, ans=0.0 2024-09-20 13:25:46,253 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.44 vs. limit=15.0 2024-09-20 13:26:21,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=897220.0, ans=0.1 2024-09-20 13:26:26,561 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-09-20 13:26:27,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=897220.0, ans=0.125 2024-09-20 13:26:38,916 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. 
limit=6.0 2024-09-20 13:26:39,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=897260.0, ans=0.125 2024-09-20 13:26:45,536 INFO [train.py:1198] (1/2) Epoch 50, batch 2600, loss[loss=0.2333, ctc_loss=0.117, cr_loss=0.3627, attn_decoder_loss=0.2382, over 29452.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1068, cr_loss=0.3448, attn_decoder_loss=0.2366, over 5796425.55 frames. ], batch size: 78, lr: 2.20e-03, grad_scale: 16.0 2024-09-20 13:26:56,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=897300.0, ans=0.1 2024-09-20 13:26:56,634 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.80 vs. limit=15.0 2024-09-20 13:27:14,767 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.94 vs. limit=15.0 2024-09-20 13:27:45,538 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.691e+01 8.589e+01 9.158e+01 9.861e+01 1.661e+02, threshold=1.832e+02, percent-clipped=0.0 2024-09-20 13:27:57,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=897460.0, ans=0.2 2024-09-20 13:28:00,333 INFO [train.py:1198] (1/2) Epoch 50, batch 2650, loss[loss=0.2431, ctc_loss=0.1134, cr_loss=0.3506, attn_decoder_loss=0.2497, over 29219.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1071, cr_loss=0.3454, attn_decoder_loss=0.2369, over 5802040.58 frames. ], batch size: 100, lr: 2.20e-03, grad_scale: 16.0 2024-09-20 13:28:17,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=897540.0, ans=0.125 2024-09-20 13:28:18,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=897540.0, ans=0.025 2024-09-20 13:28:46,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=897620.0, ans=0.125 2024-09-20 13:28:49,289 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.86 vs. limit=12.0 2024-09-20 13:28:50,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=897620.0, ans=0.0 2024-09-20 13:29:01,499 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.01 vs. limit=10.0 2024-09-20 13:29:03,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=897660.0, ans=0.125 2024-09-20 13:29:13,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=897660.0, ans=0.125 2024-09-20 13:29:15,759 INFO [train.py:1198] (1/2) Epoch 50, batch 2700, loss[loss=0.2273, ctc_loss=0.09269, cr_loss=0.3177, attn_decoder_loss=0.2352, over 29525.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.107, cr_loss=0.3457, attn_decoder_loss=0.2372, over 5796442.33 frames. 
], batch size: 87, lr: 2.20e-03, grad_scale: 16.0 2024-09-20 13:29:32,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=897740.0, ans=0.025 2024-09-20 13:29:34,057 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 13:29:34,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=897740.0, ans=0.125 2024-09-20 13:29:51,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=897780.0, ans=0.1 2024-09-20 13:30:06,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=897820.0, ans=0.1 2024-09-20 13:30:11,723 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.70 vs. limit=15.0 2024-09-20 13:30:19,833 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.670e+01 9.155e+01 9.600e+01 1.586e+02, threshold=1.831e+02, percent-clipped=0.0 2024-09-20 13:30:30,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=897860.0, ans=0.09899494936611666 2024-09-20 13:30:32,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=897860.0, ans=0.1 2024-09-20 13:30:35,020 INFO [train.py:1198] (1/2) Epoch 50, batch 2750, loss[loss=0.2289, ctc_loss=0.1124, cr_loss=0.3753, attn_decoder_loss=0.2335, over 29518.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1063, cr_loss=0.3441, attn_decoder_loss=0.2362, over 5795311.98 frames. ], batch size: 75, lr: 2.19e-03, grad_scale: 16.0 2024-09-20 13:30:37,520 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.22 vs. limit=6.0 2024-09-20 13:30:38,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=897900.0, ans=0.125 2024-09-20 13:30:38,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=897900.0, ans=0.1 2024-09-20 13:31:05,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=897980.0, ans=0.1 2024-09-20 13:31:15,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=897980.0, ans=0.125 2024-09-20 13:31:23,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=898020.0, ans=0.125 2024-09-20 13:31:23,938 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.17 vs. limit=15.0 2024-09-20 13:31:26,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=898020.0, ans=0.125 2024-09-20 13:31:40,923 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.14 vs. 
limit=15.0 2024-09-20 13:31:50,611 INFO [train.py:1198] (1/2) Epoch 50, batch 2800, loss[loss=0.2437, ctc_loss=0.1264, cr_loss=0.3597, attn_decoder_loss=0.2487, over 20059.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1069, cr_loss=0.3448, attn_decoder_loss=0.2366, over 5776612.07 frames. ], batch size: 210, lr: 2.19e-03, grad_scale: 32.0 2024-09-20 13:31:50,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=898100.0, ans=0.125 2024-09-20 13:32:07,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=898140.0, ans=0.0 2024-09-20 13:32:46,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=898220.0, ans=0.1 2024-09-20 13:32:50,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=898260.0, ans=0.0 2024-09-20 13:32:53,361 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.485e+01 8.751e+01 9.267e+01 9.840e+01 2.500e+02, threshold=1.853e+02, percent-clipped=1.0 2024-09-20 13:32:57,543 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-09-20 13:33:05,416 INFO [train.py:1198] (1/2) Epoch 50, batch 2850, loss[loss=0.2351, ctc_loss=0.1117, cr_loss=0.365, attn_decoder_loss=0.2407, over 29485.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1074, cr_loss=0.3461, attn_decoder_loss=0.2372, over 5763173.90 frames. ], batch size: 77, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:33:28,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=898340.0, ans=0.0 2024-09-20 13:33:30,684 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.08 vs. limit=15.0 2024-09-20 13:33:38,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=898380.0, ans=0.0 2024-09-20 13:33:39,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=898380.0, ans=0.125 2024-09-20 13:34:08,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=898460.0, ans=0.125 2024-09-20 13:34:11,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=898460.0, ans=0.125 2024-09-20 13:34:13,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=898460.0, ans=0.1 2024-09-20 13:34:23,631 INFO [train.py:1198] (1/2) Epoch 50, batch 2900, loss[loss=0.2198, ctc_loss=0.09121, cr_loss=0.3081, attn_decoder_loss=0.2272, over 29409.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1079, cr_loss=0.3476, attn_decoder_loss=0.2381, over 5788084.39 frames. 
], batch size: 79, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:34:31,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=898500.0, ans=0.125 2024-09-20 13:34:40,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=898540.0, ans=0.0 2024-09-20 13:34:57,930 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.70 vs. limit=15.0 2024-09-20 13:35:01,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=898580.0, ans=0.125 2024-09-20 13:35:16,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=898620.0, ans=0.1 2024-09-20 13:35:21,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=898620.0, ans=0.1 2024-09-20 13:35:26,849 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.844e+01 8.699e+01 9.096e+01 9.563e+01 1.472e+02, threshold=1.819e+02, percent-clipped=0.0 2024-09-20 13:35:30,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=898660.0, ans=0.1 2024-09-20 13:35:33,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=898660.0, ans=0.0 2024-09-20 13:35:38,792 INFO [train.py:1198] (1/2) Epoch 50, batch 2950, loss[loss=0.2229, ctc_loss=0.1064, cr_loss=0.3487, attn_decoder_loss=0.2281, over 29521.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.107, cr_loss=0.3452, attn_decoder_loss=0.2367, over 5783032.30 frames. ], batch size: 75, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:35:43,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=898700.0, ans=10.0 2024-09-20 13:36:22,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=898820.0, ans=0.125 2024-09-20 13:36:28,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=898820.0, ans=0.07 2024-09-20 13:36:30,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=898820.0, ans=0.125 2024-09-20 13:36:54,238 INFO [train.py:1198] (1/2) Epoch 50, batch 3000, loss[loss=0.2347, ctc_loss=0.1122, cr_loss=0.3508, attn_decoder_loss=0.2406, over 29746.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1072, cr_loss=0.346, attn_decoder_loss=0.2368, over 5783219.13 frames. ], batch size: 81, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:36:54,238 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 13:37:12,393 INFO [train.py:1230] (1/2) Epoch 50, validation: loss=0.213, ctc_loss=0.03629, cr_loss=7.081e-15, attn_decoder_loss=0.2326, over 944034.00 frames. 
2024-09-20 13:37:12,394 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 52672MB 2024-09-20 13:37:33,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=898940.0, ans=0.025 2024-09-20 13:37:43,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=898940.0, ans=0.125 2024-09-20 13:38:03,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=899020.0, ans=0.125 2024-09-20 13:38:04,976 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 13:38:13,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=899020.0, ans=0.0 2024-09-20 13:38:19,549 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.783e+01 8.906e+01 9.324e+01 9.722e+01 1.754e+02, threshold=1.865e+02, percent-clipped=0.0 2024-09-20 13:38:31,731 INFO [train.py:1198] (1/2) Epoch 50, batch 3050, loss[loss=0.2152, ctc_loss=0.09753, cr_loss=0.3168, attn_decoder_loss=0.2213, over 29502.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1074, cr_loss=0.3467, attn_decoder_loss=0.2371, over 5776516.46 frames. ], batch size: 76, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:38:45,883 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 13:39:04,106 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.73 vs. limit=10.0 2024-09-20 13:39:05,428 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.69 vs. limit=15.0 2024-09-20 13:39:10,032 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 13:39:23,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=899220.0, ans=0.025 2024-09-20 13:39:43,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=899260.0, ans=0.0 2024-09-20 13:39:44,532 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 13:39:47,288 INFO [train.py:1198] (1/2) Epoch 50, batch 3100, loss[loss=0.2501, ctc_loss=0.1209, cr_loss=0.3693, attn_decoder_loss=0.2562, over 29246.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1075, cr_loss=0.3468, attn_decoder_loss=0.237, over 5777475.21 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:39:56,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=899300.0, ans=0.125 2024-09-20 13:40:20,856 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.36 vs. 
limit=22.5 2024-09-20 13:40:23,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=899380.0, ans=0.0 2024-09-20 13:40:23,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=899380.0, ans=0.025 2024-09-20 13:40:27,166 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.01 vs. limit=12.0 2024-09-20 13:40:34,597 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.23 vs. limit=15.0 2024-09-20 13:40:46,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=899460.0, ans=0.125 2024-09-20 13:40:50,446 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.647e+01 8.844e+01 9.233e+01 9.806e+01 2.846e+02, threshold=1.847e+02, percent-clipped=1.0 2024-09-20 13:40:50,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=899460.0, ans=0.125 2024-09-20 13:40:52,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=899460.0, ans=0.125 2024-09-20 13:41:02,633 INFO [train.py:1198] (1/2) Epoch 50, batch 3150, loss[loss=0.2335, ctc_loss=0.1066, cr_loss=0.3424, attn_decoder_loss=0.24, over 28805.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1073, cr_loss=0.3461, attn_decoder_loss=0.2369, over 5783142.85 frames. ], batch size: 104, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:41:07,441 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.02 vs. limit=22.5 2024-09-20 13:41:17,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=899500.0, ans=0.1 2024-09-20 13:41:25,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=899540.0, ans=0.125 2024-09-20 13:41:41,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=899580.0, ans=0.1 2024-09-20 13:41:50,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=899620.0, ans=0.07 2024-09-20 13:42:07,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=899660.0, ans=0.125 2024-09-20 13:42:09,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=899660.0, ans=0.125 2024-09-20 13:42:13,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=899660.0, ans=0.0 2024-09-20 13:42:20,746 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 13:42:21,891 INFO [train.py:1198] (1/2) Epoch 50, batch 3200, loss[loss=0.2224, ctc_loss=0.0937, cr_loss=0.3261, attn_decoder_loss=0.2295, over 29405.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1066, cr_loss=0.3449, attn_decoder_loss=0.2362, over 5794232.10 frames. 
], batch size: 79, lr: 2.19e-03, grad_scale: 16.0 2024-09-20 13:42:24,205 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.02 vs. limit=12.0 2024-09-20 13:42:30,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=899700.0, ans=15.0 2024-09-20 13:42:38,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=899740.0, ans=0.0 2024-09-20 13:43:25,246 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 8.416e+01 9.001e+01 9.640e+01 1.386e+02, threshold=1.800e+02, percent-clipped=0.0 2024-09-20 13:43:37,444 INFO [train.py:1198] (1/2) Epoch 50, batch 3250, loss[loss=0.2298, ctc_loss=0.1052, cr_loss=0.3411, attn_decoder_loss=0.2361, over 29707.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1067, cr_loss=0.3456, attn_decoder_loss=0.2368, over 5800835.43 frames. ], batch size: 84, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:43:44,615 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.23 vs. limit=22.5 2024-09-20 13:44:26,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=900020.0, ans=10.0 2024-09-20 13:44:50,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=900060.0, ans=0.0 2024-09-20 13:44:53,041 INFO [train.py:1198] (1/2) Epoch 50, batch 3300, loss[loss=0.2424, ctc_loss=0.1108, cr_loss=0.3596, attn_decoder_loss=0.2491, over 28136.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1062, cr_loss=0.3443, attn_decoder_loss=0.2358, over 5796607.29 frames. ], batch size: 111, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:45:32,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=900180.0, ans=0.125 2024-09-20 13:45:50,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=900220.0, ans=0.125 2024-09-20 13:46:01,978 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.677e+01 8.683e+01 9.254e+01 9.837e+01 3.581e+02, threshold=1.851e+02, percent-clipped=1.0 2024-09-20 13:46:04,147 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.56 vs. limit=15.0 2024-09-20 13:46:05,780 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.54 vs. limit=15.0 2024-09-20 13:46:12,367 INFO [train.py:1198] (1/2) Epoch 50, batch 3350, loss[loss=0.2465, ctc_loss=0.1177, cr_loss=0.3799, attn_decoder_loss=0.2523, over 28773.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.107, cr_loss=0.3461, attn_decoder_loss=0.2365, over 5774300.11 frames. 
], batch size: 104, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:46:14,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=900300.0, ans=0.125 2024-09-20 13:46:41,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=900380.0, ans=0.95 2024-09-20 13:47:14,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=900460.0, ans=0.0 2024-09-20 13:47:27,585 INFO [train.py:1198] (1/2) Epoch 50, batch 3400, loss[loss=0.2169, ctc_loss=0.1075, cr_loss=0.3495, attn_decoder_loss=0.2213, over 29366.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1073, cr_loss=0.3465, attn_decoder_loss=0.2365, over 5767007.38 frames. ], batch size: 67, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:47:27,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=900500.0, ans=0.2 2024-09-20 13:47:33,175 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2024-09-20 13:47:33,215 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.44 vs. limit=15.0 2024-09-20 13:47:38,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=900500.0, ans=0.2 2024-09-20 13:47:44,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=900540.0, ans=0.1 2024-09-20 13:47:52,850 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.54 vs. limit=15.0 2024-09-20 13:48:14,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=900620.0, ans=0.0 2024-09-20 13:48:32,608 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.588e+01 8.696e+01 9.262e+01 1.002e+02 2.353e+02, threshold=1.852e+02, percent-clipped=1.0 2024-09-20 13:48:33,283 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.50 vs. limit=10.0 2024-09-20 13:48:40,827 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-20 13:48:44,939 INFO [train.py:1198] (1/2) Epoch 50, batch 3450, loss[loss=0.2425, ctc_loss=0.1139, cr_loss=0.3533, attn_decoder_loss=0.2489, over 28300.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1073, cr_loss=0.3463, attn_decoder_loss=0.2365, over 5775347.41 frames. 
], batch size: 111, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:48:45,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=900700.0, ans=0.125 2024-09-20 13:48:52,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=900700.0, ans=0.125 2024-09-20 13:48:54,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=900700.0, ans=0.0 2024-09-20 13:48:54,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=900700.0, ans=0.0 2024-09-20 13:48:55,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=900700.0, ans=0.0 2024-09-20 13:49:04,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=900740.0, ans=0.0 2024-09-20 13:49:07,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=900740.0, ans=0.2 2024-09-20 13:49:07,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=900740.0, ans=0.0 2024-09-20 13:49:12,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=900740.0, ans=0.1 2024-09-20 13:49:21,211 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.19 vs. limit=10.0 2024-09-20 13:49:23,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=900780.0, ans=0.1 2024-09-20 13:49:28,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=900780.0, ans=0.125 2024-09-20 13:49:47,204 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.54 vs. limit=22.5 2024-09-20 13:50:02,672 INFO [train.py:1198] (1/2) Epoch 50, batch 3500, loss[loss=0.2074, ctc_loss=0.08378, cr_loss=0.2904, attn_decoder_loss=0.2147, over 29332.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1069, cr_loss=0.3457, attn_decoder_loss=0.2361, over 5776600.59 frames. ], batch size: 71, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:50:02,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=900900.0, ans=0.125 2024-09-20 13:50:07,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=900900.0, ans=0.125 2024-09-20 13:50:10,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=900900.0, ans=0.2 2024-09-20 13:50:25,914 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.00 vs. 
limit=22.5 2024-09-20 13:50:43,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=900980.0, ans=0.0 2024-09-20 13:50:46,786 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.62 vs. limit=10.0 2024-09-20 13:51:06,275 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.056e+01 8.687e+01 9.132e+01 9.623e+01 1.623e+02, threshold=1.826e+02, percent-clipped=0.0 2024-09-20 13:51:06,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=901060.0, ans=0.025 2024-09-20 13:51:16,613 INFO [train.py:1198] (1/2) Epoch 50, batch 3550, loss[loss=0.2406, ctc_loss=0.1102, cr_loss=0.3582, attn_decoder_loss=0.2471, over 29700.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1069, cr_loss=0.3457, attn_decoder_loss=0.2362, over 5782676.54 frames. ], batch size: 89, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:51:24,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=901100.0, ans=0.125 2024-09-20 13:51:27,327 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-20 13:51:30,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=901140.0, ans=0.125 2024-09-20 13:51:46,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=901180.0, ans=0.0 2024-09-20 13:52:04,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.88 vs. limit=15.0 2024-09-20 13:52:16,590 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=12.0 2024-09-20 13:52:26,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=901260.0, ans=0.125 2024-09-20 13:52:30,409 INFO [train.py:1198] (1/2) Epoch 50, batch 3600, loss[loss=0.2353, ctc_loss=0.1193, cr_loss=0.3734, attn_decoder_loss=0.2399, over 29493.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1069, cr_loss=0.3455, attn_decoder_loss=0.2361, over 5791595.42 frames. ], batch size: 77, lr: 2.19e-03, grad_scale: 16.0 2024-09-20 13:52:40,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=901300.0, ans=0.125 2024-09-20 13:52:50,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=901340.0, ans=0.09899494936611666 2024-09-20 13:53:22,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=901420.0, ans=0.0 2024-09-20 13:53:34,114 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.501e+01 8.620e+01 9.031e+01 9.703e+01 1.754e+02, threshold=1.806e+02, percent-clipped=0.0 2024-09-20 13:53:44,526 INFO [train.py:1198] (1/2) Epoch 50, batch 3650, loss[loss=0.2566, ctc_loss=0.1408, cr_loss=0.4313, attn_decoder_loss=0.2599, over 29526.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1066, cr_loss=0.345, attn_decoder_loss=0.2358, over 5794825.92 frames. 
], batch size: 90, lr: 2.19e-03, grad_scale: 16.0 2024-09-20 13:53:59,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=901540.0, ans=0.04949747468305833 2024-09-20 13:54:04,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=901540.0, ans=0.0 2024-09-20 13:54:05,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=901540.0, ans=0.1 2024-09-20 13:54:05,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=901540.0, ans=0.125 2024-09-20 13:54:24,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=901580.0, ans=0.0 2024-09-20 13:54:38,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=901620.0, ans=0.09899494936611666 2024-09-20 13:54:52,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=901660.0, ans=0.1 2024-09-20 13:55:00,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=901660.0, ans=0.0 2024-09-20 13:55:02,720 INFO [train.py:1198] (1/2) Epoch 50, batch 3700, loss[loss=0.2485, ctc_loss=0.1235, cr_loss=0.3853, attn_decoder_loss=0.2538, over 29712.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1068, cr_loss=0.3456, attn_decoder_loss=0.2361, over 5804705.03 frames. ], batch size: 84, lr: 2.19e-03, grad_scale: 16.0 2024-09-20 13:55:05,118 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.66 vs. limit=15.0 2024-09-20 13:55:17,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=901740.0, ans=0.125 2024-09-20 13:55:32,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=901780.0, ans=0.1 2024-09-20 13:55:37,652 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.31 vs. limit=15.0 2024-09-20 13:55:41,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=901780.0, ans=0.125 2024-09-20 13:55:42,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=901780.0, ans=0.025 2024-09-20 13:56:01,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=901860.0, ans=0.125 2024-09-20 13:56:05,881 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.463e+01 8.521e+01 9.140e+01 9.589e+01 3.115e+02, threshold=1.828e+02, percent-clipped=1.0 2024-09-20 13:56:08,437 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.64 vs. limit=15.0 2024-09-20 13:56:11,479 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.78 vs. 
limit=15.0 2024-09-20 13:56:16,332 INFO [train.py:1198] (1/2) Epoch 50, batch 3750, loss[loss=0.2042, ctc_loss=0.08542, cr_loss=0.2882, attn_decoder_loss=0.211, over 29365.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1065, cr_loss=0.3452, attn_decoder_loss=0.2356, over 5807761.27 frames. ], batch size: 67, lr: 2.19e-03, grad_scale: 16.0 2024-09-20 13:56:25,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=901900.0, ans=0.0 2024-09-20 13:56:45,762 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.55 vs. limit=12.0 2024-09-20 13:56:59,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=902020.0, ans=0.0 2024-09-20 13:57:01,490 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 13:57:28,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=902060.0, ans=0.125 2024-09-20 13:57:30,714 INFO [train.py:1198] (1/2) Epoch 50, batch 3800, loss[loss=0.239, ctc_loss=0.1092, cr_loss=0.3501, attn_decoder_loss=0.2456, over 29650.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1066, cr_loss=0.3455, attn_decoder_loss=0.2356, over 5798101.08 frames. ], batch size: 86, lr: 2.19e-03, grad_scale: 16.0 2024-09-20 13:58:03,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=902180.0, ans=0.125 2024-09-20 13:58:14,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer_ff3.min_abs, batch_count=902220.0, ans=0.2 2024-09-20 13:58:14,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=902220.0, ans=0.04949747468305833 2024-09-20 13:58:14,566 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.00 vs. limit=15.0 2024-09-20 13:58:27,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=902220.0, ans=0.0 2024-09-20 13:58:34,452 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.410e+01 8.593e+01 9.127e+01 9.556e+01 1.815e+02, threshold=1.825e+02, percent-clipped=0.0 2024-09-20 13:58:34,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=902260.0, ans=0.1 2024-09-20 13:58:36,576 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.24 vs. limit=6.0 2024-09-20 13:58:38,330 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=15.0 2024-09-20 13:58:44,683 INFO [train.py:1198] (1/2) Epoch 50, batch 3850, loss[loss=0.2488, ctc_loss=0.1213, cr_loss=0.3692, attn_decoder_loss=0.2547, over 29213.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1063, cr_loss=0.3445, attn_decoder_loss=0.2354, over 5812017.56 frames. 
2024-09-20 13:58:57,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=902340.0, ans=0.015
2024-09-20 13:59:00,015 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.18 vs. limit=15.0
2024-09-20 13:59:25,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=902380.0, ans=0.025
2024-09-20 13:59:29,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=902420.0, ans=0.0
2024-09-20 13:59:39,093 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.78 vs. limit=15.0
2024-09-20 13:59:50,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=902460.0, ans=0.125
2024-09-20 14:00:00,220 INFO [train.py:1198] (1/2) Epoch 50, batch 3900, loss[loss=0.2513, ctc_loss=0.1192, cr_loss=0.3766, attn_decoder_loss=0.2576, over 29631.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1064, cr_loss=0.345, attn_decoder_loss=0.2359, over 5817225.26 frames. ], batch size: 86, lr: 2.19e-03, grad_scale: 16.0
2024-09-20 14:00:50,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=902620.0, ans=0.2
2024-09-20 14:01:00,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=902660.0, ans=0.125
2024-09-20 14:01:02,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=902660.0, ans=0.125
2024-09-20 14:01:05,030 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.795e+01 8.747e+01 9.246e+01 9.668e+01 1.412e+02, threshold=1.849e+02, percent-clipped=0.0
2024-09-20 14:01:08,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=22.5
2024-09-20 14:01:15,505 INFO [train.py:1198] (1/2) Epoch 50, batch 3950, loss[loss=0.241, ctc_loss=0.1132, cr_loss=0.3597, attn_decoder_loss=0.2472, over 29536.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1057, cr_loss=0.3433, attn_decoder_loss=0.2357, over 5836454.63 frames. ], batch size: 97, lr: 2.19e-03, grad_scale: 16.0
2024-09-20 14:01:21,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=902700.0, ans=0.0
2024-09-20 14:01:24,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=902700.0, ans=0.0
2024-09-20 14:01:33,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=902740.0, ans=0.125
2024-09-20 14:01:45,915 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.70 vs. limit=22.5
2024-09-20 14:01:56,367 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.69 vs. limit=15.0
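The scaling.py:1024 entries compare a per-module "whitening" metric against a limit. A hedged reading, modeled on the Whiten modules in icefall's zipformer/scaling.py (the exact computation there may differ): the metric measures the eigenvalue spread of the feature covariance per group, so it is 1.0 for perfectly white features and grows as the covariance becomes ill-conditioned.

    import torch

    # metric = mean(eigenvalue^2) / mean(eigenvalue)^2 of the covariance,
    # computed via traces so no eigendecomposition is needed.
    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        (n, c) = x.shape
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = torch.matmul(x.transpose(1, 2), x) / n  # per-group covariance
        num = (cov * cov).sum(dim=(1, 2)) / cov.shape[-1]           # trace(cov^2)/d
        den = torch.diagonal(cov, dim1=1, dim2=2).mean(dim=1) ** 2  # (trace(cov)/d)^2
        return (num / den).mean().item()

    x = torch.randn(1000, 512)  # near-white features
    print(whitening_metric(x))  # ~1.5 here, far below the logged limits of 6.0-22.5
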
2024-09-20 14:01:59,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=902820.0, ans=0.125
2024-09-20 14:02:10,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=902820.0, ans=0.125
2024-09-20 14:02:13,690 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.87 vs. limit=22.5
2024-09-20 14:02:28,914 INFO [train.py:1198] (1/2) Epoch 50, batch 4000, loss[loss=0.2181, ctc_loss=0.1051, cr_loss=0.3444, attn_decoder_loss=0.223, over 29542.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1061, cr_loss=0.3436, attn_decoder_loss=0.2358, over 5814467.49 frames. ], batch size: 74, lr: 2.19e-03, grad_scale: 32.0
2024-09-20 14:02:29,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=902900.0, ans=0.07
2024-09-20 14:02:32,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=902900.0, ans=0.125
2024-09-20 14:02:34,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=902900.0, ans=10.0
2024-09-20 14:02:42,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=902940.0, ans=0.125
2024-09-20 14:02:52,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=902940.0, ans=0.125
2024-09-20 14:03:10,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=902980.0, ans=0.125
2024-09-20 14:03:15,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=903020.0, ans=0.125
2024-09-20 14:03:26,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=903060.0, ans=0.05
2024-09-20 14:03:33,840 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.945e+01 8.787e+01 9.327e+01 9.838e+01 2.486e+02, threshold=1.865e+02, percent-clipped=3.0
2024-09-20 14:03:42,693 INFO [train.py:1198] (1/2) Epoch 50, batch 4050, loss[loss=0.2433, ctc_loss=0.1291, cr_loss=0.3628, attn_decoder_loss=0.248, over 19721.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1058, cr_loss=0.3433, attn_decoder_loss=0.2355, over 5797575.92 frames. ], batch size: 209, lr: 2.19e-03, grad_scale: 8.0
2024-09-20 14:03:59,552 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.38 vs. limit=15.0
2024-09-20 14:04:00,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=903140.0, ans=0.125
2024-09-20 14:04:02,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=903140.0, ans=15.0
2024-09-20 14:04:21,953 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=22.5
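The optim.py:487 warnings above are consistent with median-based gradient clipping: in every entry, threshold equals Clipping_scale (2.0) times the middle quartile, e.g. 2.0 x 9.327e+01 = 1.865e+02. The sketch below shows that mechanism under the assumption that a window of recent gradient norms is kept; icefall's actual optimizer differs in its details.

    from collections import deque
    import torch

    class MedianGradClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)  # recent total gradient norms
            self.clipped = 0
            self.seen = 0

        def clip_(self, params) -> float:
            grads = [p.grad for p in params if p.grad is not None]
            norm = torch.sqrt(sum((g ** 2).sum() for g in grads)).item()
            self.seen += 1
            if len(self.norms) >= 10:  # wait for some history first
                median = sorted(self.norms)[len(self.norms) // 2]
                threshold = self.clipping_scale * median
                if norm > threshold:   # scale gradients down to the threshold
                    self.clipped += 1
                    for g in grads:
                        g.mul_(threshold / norm)
            self.norms.append(norm)
            return norm

        def percent_clipped(self) -> float:  # cf. "percent-clipped" above
            return 100.0 * self.clipped / max(1, self.seen)
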
2024-09-20 14:04:25,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=903180.0, ans=0.1
2024-09-20 14:04:34,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=903220.0, ans=0.125
2024-09-20 14:04:37,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=903220.0, ans=0.125
2024-09-20 14:04:41,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=903260.0, ans=0.125
2024-09-20 14:04:57,480 INFO [train.py:1198] (1/2) Epoch 50, batch 4100, loss[loss=0.237, ctc_loss=0.1053, cr_loss=0.3333, attn_decoder_loss=0.2442, over 29526.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1061, cr_loss=0.344, attn_decoder_loss=0.2357, over 5793069.75 frames. ], batch size: 90, lr: 2.19e-03, grad_scale: 8.0
2024-09-20 14:05:07,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=903300.0, ans=0.125
2024-09-20 14:05:15,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=903340.0, ans=0.125
2024-09-20 14:05:19,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=903340.0, ans=0.1
2024-09-20 14:05:19,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=903340.0, ans=0.125
2024-09-20 14:05:20,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=903340.0, ans=0.125
2024-09-20 14:05:47,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=903420.0, ans=0.0
2024-09-20 14:06:03,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=903460.0, ans=0.1
2024-09-20 14:06:04,477 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 8.737e+01 9.246e+01 9.591e+01 2.033e+02, threshold=1.849e+02, percent-clipped=1.0
2024-09-20 14:06:11,996 INFO [train.py:1198] (1/2) Epoch 50, batch 4150, loss[loss=0.2189, ctc_loss=0.09807, cr_loss=0.3245, attn_decoder_loss=0.2251, over 29521.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1058, cr_loss=0.3431, attn_decoder_loss=0.2353, over 5798483.97 frames. ], batch size: 77, lr: 2.19e-03, grad_scale: 8.0
2024-09-20 14:06:13,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=903500.0, ans=0.125
2024-09-20 14:06:31,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=903540.0, ans=0.0
2024-09-20 14:06:35,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=903540.0, ans=0.07
2024-09-20 14:06:38,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=903540.0, ans=0.025
2024-09-20 14:06:40,743 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.72 vs. limit=15.0
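The tot_loss[...] figures above are frame-weighted running averages ("over 5.8M frames"), not simple means over batches. A minimal sketch of that bookkeeping follows; the exact averaging in icefall's metrics tracking is an assumption here.

    # Fold per-batch losses into a frame-weighted running average.
    class RunningLoss:
        def __init__(self):
            self.frames = 0.0
            self.weighted = 0.0

        def update(self, loss: float, num_frames: float) -> float:
            self.frames += num_frames
            self.weighted += loss * num_frames
            return self.weighted / self.frames  # current tot_loss

    tracker = RunningLoss()
    # e.g. the batch 4100 entry above: loss=0.237 over 29526.00 frames
    print(tracker.update(0.237, 29526.0))
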
2024-09-20 14:06:56,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=903620.0, ans=0.2
2024-09-20 14:07:19,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=903660.0, ans=0.0
2024-09-20 14:07:22,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=903660.0, ans=0.125
2024-09-20 14:07:25,109 INFO [train.py:1198] (1/2) Epoch 50, batch 4200, loss[loss=0.2514, ctc_loss=0.1289, cr_loss=0.4055, attn_decoder_loss=0.256, over 29502.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.106, cr_loss=0.3436, attn_decoder_loss=0.2357, over 5799548.73 frames. ], batch size: 90, lr: 2.19e-03, grad_scale: 8.0
2024-09-20 14:07:32,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=903700.0, ans=0.125
2024-09-20 14:07:37,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=903700.0, ans=0.125
2024-09-20 14:07:37,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=903700.0, ans=0.0
2024-09-20 14:07:40,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=903740.0, ans=0.07
2024-09-20 14:08:32,189 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.954e+01 8.657e+01 9.068e+01 9.554e+01 1.385e+02, threshold=1.814e+02, percent-clipped=0.0
2024-09-20 14:08:33,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=903860.0, ans=0.125
2024-09-20 14:08:37,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=903860.0, ans=0.0
2024-09-20 14:08:39,520 INFO [train.py:1198] (1/2) Epoch 50, batch 4250, loss[loss=0.2143, ctc_loss=0.08962, cr_loss=0.3017, attn_decoder_loss=0.2215, over 29538.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1054, cr_loss=0.3418, attn_decoder_loss=0.2355, over 5804661.45 frames. ], batch size: 74, lr: 2.19e-03, grad_scale: 8.0
2024-09-20 14:08:41,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=903900.0, ans=0.125
2024-09-20 14:08:52,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=903940.0, ans=0.2
2024-09-20 14:09:34,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=904020.0, ans=0.1
2024-09-20 14:09:53,370 INFO [train.py:1198] (1/2) Epoch 50, batch 4300, loss[loss=0.2414, ctc_loss=0.1107, cr_loss=0.3486, attn_decoder_loss=0.2481, over 29521.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1052, cr_loss=0.3414, attn_decoder_loss=0.2355, over 5794333.52 frames. ], batch size: 87, lr: 2.19e-03, grad_scale: 8.0
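The grad_scale values above (8.0, 16.0, 32.0) move between powers of two, which matches PyTorch-style dynamic loss scaling for float16 AMP: the scale doubles after a run of overflow-free steps and halves whenever an inf/nan gradient appears. A minimal sketch with the standard torch.cuda.amp API (requires a CUDA device; the model here is a stand-in, not the Zipformer):

    import torch

    model = torch.nn.Linear(80, 500).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.045)
    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

    x = torch.randn(8, 80, device="cuda")
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(x).square().mean()
    scaler.scale(loss).backward()  # backprop on loss * scale
    scaler.step(opt)               # unscales grads, skips the step on inf/nan
    scaler.update()                # grows or shrinks the scale
    print(scaler.get_scale())
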
2024-09-20 14:10:15,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=904140.0, ans=0.125
2024-09-20 14:10:23,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=904180.0, ans=0.125
2024-09-20 14:10:26,607 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=4.79 vs. limit=12.0
2024-09-20 14:10:43,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=904220.0, ans=0.125
2024-09-20 14:10:45,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=904220.0, ans=0.0
2024-09-20 14:10:55,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.81 vs. limit=15.0
2024-09-20 14:10:56,190 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.47 vs. limit=15.0
2024-09-20 14:10:59,382 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 7.761e+01 8.735e+01 9.239e+01 9.870e+01 1.478e+02, threshold=1.848e+02, percent-clipped=0.0
2024-09-20 14:11:03,433 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.67 vs. limit=22.5
2024-09-20 14:11:04,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=904260.0, ans=0.0
2024-09-20 14:11:06,797 INFO [train.py:1198] (1/2) Epoch 50, batch 4350, loss[loss=0.2495, ctc_loss=0.1226, cr_loss=0.3849, attn_decoder_loss=0.255, over 29515.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1077, cr_loss=0.3473, attn_decoder_loss=0.2388, over 5796767.78 frames. ], batch size: 97, lr: 2.19e-03, grad_scale: 8.0
2024-09-20 14:11:15,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=904300.0, ans=0.125
2024-09-20 14:11:22,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=904340.0, ans=0.0
2024-09-20 14:12:00,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=904420.0, ans=0.125
2024-09-20 14:12:11,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=904460.0, ans=0.125
2024-09-20 14:12:15,553 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-20 14:12:21,022 INFO [train.py:1198] (1/2) Epoch 50, batch 4400, loss[loss=0.2387, ctc_loss=0.1162, cr_loss=0.3593, attn_decoder_loss=0.2444, over 27346.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1089, cr_loss=0.3496, attn_decoder_loss=0.2409, over 5767156.46 frames. ], batch size: 124, lr: 2.19e-03, grad_scale: 16.0
2024-09-20 14:12:50,971 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.35 vs. limit=15.0
2024-09-20 14:12:51,025 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.69 vs. limit=15.0
2024-09-20 14:12:54,083 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.29 vs. limit=15.0
2024-09-20 14:13:22,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=904660.0, ans=0.025
2024-09-20 14:13:27,635 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.488e+01 9.232e+01 9.764e+01 1.027e+02 1.631e+02, threshold=1.953e+02, percent-clipped=0.0
2024-09-20 14:13:32,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=904660.0, ans=0.1
2024-09-20 14:13:35,034 INFO [train.py:1198] (1/2) Epoch 50, batch 4450, loss[loss=0.2512, ctc_loss=0.1283, cr_loss=0.3618, attn_decoder_loss=0.2568, over 19885.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1122, cr_loss=0.3555, attn_decoder_loss=0.2429, over 5577143.51 frames. ], batch size: 209, lr: 2.19e-03, grad_scale: 16.0
2024-09-20 14:13:47,692 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.80 vs. limit=15.0
2024-09-20 14:14:09,024 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.40 vs. limit=22.5
2024-09-20 14:14:26,075 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.64 vs. limit=15.0
2024-09-20 14:14:28,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=904820.0, ans=0.0
2024-09-20 14:14:38,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=904860.0, ans=0.125
2024-09-20 14:14:46,501 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-20 14:14:50,474 INFO [train.py:1198] (1/2) Epoch 50, batch 4500, loss[loss=0.2341, ctc_loss=0.12, cr_loss=0.3427, attn_decoder_loss=0.2392, over 20458.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1151, cr_loss=0.358, attn_decoder_loss=0.2446, over 5237912.42 frames. ], batch size: 209, lr: 2.19e-03, grad_scale: 16.0
2024-09-20 14:15:13,412 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.04 vs. limit=22.5
2024-09-20 14:15:17,977 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.84 vs. limit=22.5
2024-09-20 14:15:27,073 INFO [train.py:1496] (1/2) Done!